Cross-Source Foreign Keys
Discover and manage relationships between tables across different databases and data sources, enabling comprehensive data lineage and impact analysis.
📹 Discovering Cross-Database Relationships
Learn how the Foreign Key Discovery Agent identifies and links relationships between dispersed data sources within your organization.
Overview
The Foreign Key Discovery Agent automatically analyzes your data sources to identify potential relationships between tables across different databases. This is particularly valuable for organizations with data distributed across multiple systems like e-commerce platforms, warehouse management systems, CRMs, and ERPs.
How It Works
Automatic Discovery Process
- Team-Wide Analysis: The agent examines all data sources within a team
- Pattern Recognition: Identifies potential foreign keys based on:
- Column naming patterns
- Data type matching
- Metadata similarities
- Value overlap analysis
- Confidence Scoring: Each discovery is assigned a confidence score
- Data Validation: High-confidence matches undergo actual data validation
Discovery Criteria
The agent looks for:
- Matching Column Names: Similar naming conventions across systems
- Compatible Data Types: Columns with matching or compatible types
- Value Overlaps: Actual data values that exist in both columns
- Business Context: Common patterns like order IDs, customer codes, product SKUs
Viewing Discovered Relationships
Agent Dashboard
The Foreign Key Discovery Agent view displays:
- Total Discoveries: Number of potential relationships found
- Confidence Distribution: Breakdown by confidence levels
- Data Source Coverage: Which systems have relationships
Discovery Table
The discoveries table displays:
Column | Description |
---|---|
Source | Shows database name, schema.table.column, and data type |
Target | Shows database name, schema.table.column, and data type |
Type | Relationship type badge (one to one, one to many, etc.) |
Confidence | Score percentage with color-coded badge |
Method | Discovery method with icon |
Expand Icon | Chevron to expand/collapse row details |
Note: Clicking anywhere on a row expands it to show LLM analysis and validation results inline.
Filtering and Sorting
Sort Options
Sortable columns (click column header to toggle):
- Source: Sort by source column path
- Confidence: Sort by confidence score (ascending/descending)
Note: Sorting triggers a new data fetch with the selected sort order.
Filter Controls
- Search: Input field to search columns
- Confidence Range: Dual slider (0-100%, step: 5%)
- Discovery Method: Dropdown with options:
- All methods
- LLM Name Match
- Pattern Match
- Value Overlap
- Statistical Analysis
- Manual
- Relationship Type: Dropdown with options:
- All types
- One to One
- One to Many
- Many to One
- Many to Many
- Apply Filters: Button to apply filter selections
- Export CSV: Button to export discoveries
Reviewing Discoveries
Expandable Details
Click on any row in the discoveries table to expand inline and view:
LLM Analysis
- Reasoning: Why the AI believes these columns are related
- Business Context: Interpretation of the relationship's purpose
- Confidence Factors: What contributed to the confidence score
Validation Results
When data validation has been performed:
- Value Overlap: Percentage of matching values
- Matching Values: Count of values that match
- Source Distinct Count: Unique values in source column
- Target Distinct Count: Unique values in target column
- Sample Matching Values: Up to 5 example matches
Example Discoveries
Common cross-source relationships identified:
- Order References: Between e-commerce and warehouse systems
- Customer IDs: Across CRM and billing platforms
- Product Codes: Between inventory and sales systems
- Supplier Codes: Linking procurement and accounting
Managing Relationships
Managing Discovered Relationships
Discoveries can be expanded inline in the table by clicking on any row. This shows:
- LLM Analysis reasoning
- Validation Results (if available)
To take action on a discovery, you need to open the detail view which provides:
- Approve button - Creates the relationship in the database
- Reject button - Dismisses the discovery with optional feedback
- Feedback button - Toggle to add feedback/rejection reason
Manual Relationship Definition
Create relationships manually using the wizard:
Step 1: Select Source
- Click Define Relationship button
- Choose source data source
- Select source table
- Pick source column
Step 2: Select Target
- Choose target data source
- Select target table
- Pick target column
Step 3: Define Relationship
- Select relationship type:
- One to Many (default)
- One to One
- Many to One
- Many to Many
- Enter relationship name
- Add description (optional)
Step 4: Review & Create
- Review all selections
- Verify source and target details
- Confirm relationship configuration
- Click Create button
Database Constraints View
Created relationships appear in the Database Constraints section where you can:
- View All Relationships: Both discovered and manually defined
- Edit Relationships: Modify existing relationships
- Delete Relationships: Remove incorrect or outdated links
- Export Relationships: Download as CSV for documentation
Agent Settings
Configure the Foreign Key Discovery Agent behavior:
Configuration Options
Setting | Description | Default |
---|---|---|
Confidence Threshold | Minimum confidence score (slider) | 60% |
Include Low Confidence | Show discoveries below threshold | Off |
Cross Database Only | Only show cross-database relationships | On |
Compare with All Team Sources | Analyze all team data sources | On |
Max Column Pairs | Maximum pairs to analyze | 10,000 |
Max Tables per Source | Tables limit per data source | 100 |
Max Columns per Table | Column limit per table | 50 |
Max Sample Size | Rows to sample for validation | 10,000 |
Validate with Data | Perform data validation | On |
Enable Pattern Matching | Use pattern matching algorithm | On |
Enable LLM Analysis | Use AI for analysis | On |
Operation Timeout | Maximum processing time (seconds) | 300 |
Performance Tuning
- Large Databases: Increase time limits and sampling size
- Many Data Sources: Limit column pairs to focus on likely matches
- High Accuracy Needs: Raise minimum confidence threshold
Best Practices
Regular Discovery Runs
- Schedule agent runs after adding new data sources
- Re-run after schema changes
- Periodic validation of existing relationships
Review Process
- Start with high-confidence discoveries (80%+)
- Validate business logic of relationships
- Document accepted relationships
- Review low-confidence items for hidden connections
Data Governance
- Document why relationships were accepted or rejected
- Maintain relationship metadata
- Regular audits of cross-source dependencies
Use Cases
Data Lineage
- Track data flow across systems
- Understand downstream impacts
- Document system dependencies
Impact Analysis
- Assess changes across connected systems
- Plan migrations with full context
- Identify affected downstream processes
Business Intelligence
- Enable cross-system reporting
- Build unified data models
- Create comprehensive dashboards
Data Quality
- Identify inconsistencies across systems
- Validate referential integrity
- Monitor data synchronization
Troubleshooting
Common Issues
Issue | Cause | Solution |
---|---|---|
No discoveries found | Limited column name matches | Adjust minimum confidence threshold |
Too many false positives | Generic column names | Increase confidence threshold |
Missing obvious relationships | Different naming conventions | Use manual definition |
Slow discovery process | Large databases | Adjust time limits and sampling |
API Integration
Programmatically manage cross-source foreign keys:
- Query discovered relationships
- Accept or reject discoveries
- Create manual relationships
- Export relationship metadata