Cross-Source Foreign Keys

Discover and manage relationships between tables across different databases and data sources, enabling comprehensive data lineage and impact analysis.

VIDEO TUTORIAL⏱️ 4 mins

📹 Discovering Cross-Database Relationships

Learn how the Foreign Key Discovery Agent identifies and links relationships between dispersed data sources within your organization.

Overview

The Foreign Key Discovery Agent automatically analyzes your data sources to identify potential relationships between tables across different databases. This is particularly valuable for organizations with data distributed across multiple systems like e-commerce platforms, warehouse management systems, CRMs, and ERPs.

How It Works

Automatic Discovery Process

Team-Wide Analysis: The agent examines all data sources within a team
Pattern Recognition: Identifies potential foreign keys based on:
- Column naming patterns
- Data type matching
- Metadata similarities
- Value overlap analysis
Confidence Scoring: Each discovery is assigned a confidence score
Data Validation: High-confidence matches undergo actual data validation

Discovery Criteria

The agent looks for:

Matching Column Names: Similar naming conventions across systems
Compatible Data Types: Columns with matching or compatible types
Value Overlaps: Actual data values that exist in both columns
Business Context: Common patterns like order IDs, customer codes, product SKUs

Viewing Discovered Relationships

Agent Dashboard

The Foreign Key Discovery Agent view displays:

Total Discoveries: Number of potential relationships found
Confidence Distribution: Breakdown by confidence levels
Data Source Coverage: Which systems have relationships

Discovery Table

The discoveries table displays:

Column	Description
Source	Shows database name, schema.table.column, and data type
Target	Shows database name, schema.table.column, and data type
Type	Relationship type badge (one to one, one to many, etc.)
Confidence	Score percentage with color-coded badge
Method	Discovery method with icon
Expand Icon	Chevron to expand/collapse row details

Note: Clicking anywhere on a row expands it to show LLM analysis and validation results inline.

Filtering and Sorting

Sort Options

Sortable columns (click column header to toggle):

Source: Sort by source column path
Confidence: Sort by confidence score (ascending/descending)

Note: Sorting triggers a new data fetch with the selected sort order.

Filter Controls

Search: Input field to search columns
Confidence Range: Dual slider (0-100%, step: 5%)
Discovery Method: Dropdown with options:
- All methods
- LLM Name Match
- Pattern Match
- Value Overlap
- Statistical Analysis
- Manual
Relationship Type: Dropdown with options:
- All types
- One to One
- One to Many
- Many to One
- Many to Many
Apply Filters: Button to apply filter selections
Export CSV: Button to export discoveries

Reviewing Discoveries

Expandable Details

Click on any row in the discoveries table to expand inline and view:

LLM Analysis

Reasoning: Why the AI believes these columns are related
Business Context: Interpretation of the relationship's purpose
Confidence Factors: What contributed to the confidence score

Validation Results

When data validation has been performed:

Value Overlap: Percentage of matching values
Matching Values: Count of values that match
Source Distinct Count: Unique values in source column
Target Distinct Count: Unique values in target column
Sample Matching Values: Up to 5 example matches

Example Discoveries

Common cross-source relationships identified:

Order References: Between e-commerce and warehouse systems
Customer IDs: Across CRM and billing platforms
Product Codes: Between inventory and sales systems
Supplier Codes: Linking procurement and accounting

Managing Relationships

Managing Discovered Relationships

Discoveries can be expanded inline in the table by clicking on any row. This shows:

LLM Analysis reasoning
Validation Results (if available)

To take action on a discovery, you need to open the detail view which provides:

Approve button - Creates the relationship in the database
Reject button - Dismisses the discovery with optional feedback
Feedback button - Toggle to add feedback/rejection reason

Manual Relationship Definition

Create relationships manually using the wizard:

Step 1: Select Source

Click Define Relationship button
Choose source data source
Select source table
Pick source column

Step 2: Select Target

Choose target data source
Select target table
Pick target column

Step 3: Define Relationship

Select relationship type:
- One to Many (default)
- One to One
- Many to One
- Many to Many
Enter relationship name
Add description (optional)

Step 4: Review & Create

Review all selections
Verify source and target details
Confirm relationship configuration
Click Create button

Database Constraints View

Created relationships appear in the Database Constraints section where you can:

View All Relationships: Both discovered and manually defined
Edit Relationships: Modify existing relationships
Delete Relationships: Remove incorrect or outdated links
Export Relationships: Download as CSV for documentation

Agent Settings

Configure the Foreign Key Discovery Agent behavior:

Configuration Options

Setting	Description	Default
Confidence Threshold	Minimum confidence score (slider)	60%
Include Low Confidence	Show discoveries below threshold	Off
Cross Database Only	Only show cross-database relationships	On
Compare with All Team Sources	Analyze all team data sources	On
Max Column Pairs	Maximum pairs to analyze	10,000
Max Tables per Source	Tables limit per data source	100
Max Columns per Table	Column limit per table	50
Max Sample Size	Rows to sample for validation	10,000
Validate with Data	Perform data validation	On
Enable Pattern Matching	Use pattern matching algorithm	On
Enable LLM Analysis	Use AI for analysis	On
Operation Timeout	Maximum processing time (seconds)	300

Performance Tuning

Large Databases: Increase time limits and sampling size
Many Data Sources: Limit column pairs to focus on likely matches
High Accuracy Needs: Raise minimum confidence threshold

Best Practices

Regular Discovery Runs

Schedule agent runs after adding new data sources
Re-run after schema changes
Periodic validation of existing relationships

Review Process

Start with high-confidence discoveries (80%+)
Validate business logic of relationships
Document accepted relationships
Review low-confidence items for hidden connections

Data Governance

Document why relationships were accepted or rejected
Maintain relationship metadata
Regular audits of cross-source dependencies

Use Cases

Data Lineage

Track data flow across systems
Understand downstream impacts
Document system dependencies

Impact Analysis

Assess changes across connected systems
Plan migrations with full context
Identify affected downstream processes

Business Intelligence

Enable cross-system reporting
Build unified data models
Create comprehensive dashboards

Data Quality

Identify inconsistencies across systems
Validate referential integrity
Monitor data synchronization

Troubleshooting

Common Issues

Issue	Cause	Solution
No discoveries found	Limited column name matches	Adjust minimum confidence threshold
Too many false positives	Generic column names	Increase confidence threshold
Missing obvious relationships	Different naming conventions	Use manual definition
Slow discovery process	Large databases	Adjust time limits and sampling

API Integration

Programmatically manage cross-source foreign keys:

Query discovered relationships
Accept or reject discoveries
Create manual relationships
Export relationship metadata

📹 Discovering Cross-Database Relationships

Overview​

How It Works​

Automatic Discovery Process​

Discovery Criteria​

Viewing Discovered Relationships​

Agent Dashboard​

Discovery Table​

Filtering and Sorting​

Sort Options​

Filter Controls​

Reviewing Discoveries​

Expandable Details​

LLM Analysis​

Validation Results​

Example Discoveries​

Managing Relationships​

Managing Discovered Relationships​

Manual Relationship Definition​

Step 1: Select Source​

Step 2: Select Target​

Step 3: Define Relationship​

Step 4: Review & Create​

Database Constraints View​

Agent Settings​

Configuration Options​

Performance Tuning​

Best Practices​

Regular Discovery Runs​

Review Process​

Data Governance​

Use Cases​

Data Lineage​

Impact Analysis​

Business Intelligence​

Data Quality​

Troubleshooting​

Common Issues​

API Integration​

Related Topics​