Skip to main content

Cross-Source Foreign Keys

Discover and manage relationships between tables across different databases and data sources, enabling comprehensive data lineage and impact analysis.

VIDEO TUTORIAL⏱️ 4 mins

📹 Discovering Cross-Database Relationships

Learn how the Foreign Key Discovery Agent identifies and links relationships between dispersed data sources within your organization.

Overview

The Foreign Key Discovery Agent automatically analyzes your data sources to identify potential relationships between tables across different databases. This is particularly valuable for organizations with data distributed across multiple systems like e-commerce platforms, warehouse management systems, CRMs, and ERPs.

How It Works

Automatic Discovery Process

  1. Team-Wide Analysis: The agent examines all data sources within a team
  2. Pattern Recognition: Identifies potential foreign keys based on:
    • Column naming patterns
    • Data type matching
    • Metadata similarities
    • Value overlap analysis
  3. Confidence Scoring: Each discovery is assigned a confidence score
  4. Data Validation: High-confidence matches undergo actual data validation

Discovery Criteria

The agent looks for:

  • Matching Column Names: Similar naming conventions across systems
  • Compatible Data Types: Columns with matching or compatible types
  • Value Overlaps: Actual data values that exist in both columns
  • Business Context: Common patterns like order IDs, customer codes, product SKUs

Viewing Discovered Relationships

Agent Dashboard

The Foreign Key Discovery Agent view displays:

  • Total Discoveries: Number of potential relationships found
  • Confidence Distribution: Breakdown by confidence levels
  • Data Source Coverage: Which systems have relationships

Discovery Table

The discoveries table displays:

ColumnDescription
SourceShows database name, schema.table.column, and data type
TargetShows database name, schema.table.column, and data type
TypeRelationship type badge (one to one, one to many, etc.)
ConfidenceScore percentage with color-coded badge
MethodDiscovery method with icon
Expand IconChevron to expand/collapse row details

Note: Clicking anywhere on a row expands it to show LLM analysis and validation results inline.

Filtering and Sorting

Sort Options

Sortable columns (click column header to toggle):

  • Source: Sort by source column path
  • Confidence: Sort by confidence score (ascending/descending)

Note: Sorting triggers a new data fetch with the selected sort order.

Filter Controls

  • Search: Input field to search columns
  • Confidence Range: Dual slider (0-100%, step: 5%)
  • Discovery Method: Dropdown with options:
    • All methods
    • LLM Name Match
    • Pattern Match
    • Value Overlap
    • Statistical Analysis
    • Manual
  • Relationship Type: Dropdown with options:
    • All types
    • One to One
    • One to Many
    • Many to One
    • Many to Many
  • Apply Filters: Button to apply filter selections
  • Export CSV: Button to export discoveries

Reviewing Discoveries

Expandable Details

Click on any row in the discoveries table to expand inline and view:

LLM Analysis

  • Reasoning: Why the AI believes these columns are related
  • Business Context: Interpretation of the relationship's purpose
  • Confidence Factors: What contributed to the confidence score

Validation Results

When data validation has been performed:

  • Value Overlap: Percentage of matching values
  • Matching Values: Count of values that match
  • Source Distinct Count: Unique values in source column
  • Target Distinct Count: Unique values in target column
  • Sample Matching Values: Up to 5 example matches

Example Discoveries

Common cross-source relationships identified:

  • Order References: Between e-commerce and warehouse systems
  • Customer IDs: Across CRM and billing platforms
  • Product Codes: Between inventory and sales systems
  • Supplier Codes: Linking procurement and accounting

Managing Relationships

Managing Discovered Relationships

Discoveries can be expanded inline in the table by clicking on any row. This shows:

  • LLM Analysis reasoning
  • Validation Results (if available)

To take action on a discovery, you need to open the detail view which provides:

  • Approve button - Creates the relationship in the database
  • Reject button - Dismisses the discovery with optional feedback
  • Feedback button - Toggle to add feedback/rejection reason

Manual Relationship Definition

Create relationships manually using the wizard:

Step 1: Select Source

  1. Click Define Relationship button
  2. Choose source data source
  3. Select source table
  4. Pick source column

Step 2: Select Target

  1. Choose target data source
  2. Select target table
  3. Pick target column

Step 3: Define Relationship

  1. Select relationship type:
    • One to Many (default)
    • One to One
    • Many to One
    • Many to Many
  2. Enter relationship name
  3. Add description (optional)

Step 4: Review & Create

  1. Review all selections
  2. Verify source and target details
  3. Confirm relationship configuration
  4. Click Create button

Database Constraints View

Created relationships appear in the Database Constraints section where you can:

  • View All Relationships: Both discovered and manually defined
  • Edit Relationships: Modify existing relationships
  • Delete Relationships: Remove incorrect or outdated links
  • Export Relationships: Download as CSV for documentation

Agent Settings

Configure the Foreign Key Discovery Agent behavior:

Configuration Options

SettingDescriptionDefault
Confidence ThresholdMinimum confidence score (slider)60%
Include Low ConfidenceShow discoveries below thresholdOff
Cross Database OnlyOnly show cross-database relationshipsOn
Compare with All Team SourcesAnalyze all team data sourcesOn
Max Column PairsMaximum pairs to analyze10,000
Max Tables per SourceTables limit per data source100
Max Columns per TableColumn limit per table50
Max Sample SizeRows to sample for validation10,000
Validate with DataPerform data validationOn
Enable Pattern MatchingUse pattern matching algorithmOn
Enable LLM AnalysisUse AI for analysisOn
Operation TimeoutMaximum processing time (seconds)300

Performance Tuning

  • Large Databases: Increase time limits and sampling size
  • Many Data Sources: Limit column pairs to focus on likely matches
  • High Accuracy Needs: Raise minimum confidence threshold

Best Practices

Regular Discovery Runs

  • Schedule agent runs after adding new data sources
  • Re-run after schema changes
  • Periodic validation of existing relationships

Review Process

  1. Start with high-confidence discoveries (80%+)
  2. Validate business logic of relationships
  3. Document accepted relationships
  4. Review low-confidence items for hidden connections

Data Governance

  • Document why relationships were accepted or rejected
  • Maintain relationship metadata
  • Regular audits of cross-source dependencies

Use Cases

Data Lineage

  • Track data flow across systems
  • Understand downstream impacts
  • Document system dependencies

Impact Analysis

  • Assess changes across connected systems
  • Plan migrations with full context
  • Identify affected downstream processes

Business Intelligence

  • Enable cross-system reporting
  • Build unified data models
  • Create comprehensive dashboards

Data Quality

  • Identify inconsistencies across systems
  • Validate referential integrity
  • Monitor data synchronization

Troubleshooting

Common Issues

IssueCauseSolution
No discoveries foundLimited column name matchesAdjust minimum confidence threshold
Too many false positivesGeneric column namesIncrease confidence threshold
Missing obvious relationshipsDifferent naming conventionsUse manual definition
Slow discovery processLarge databasesAdjust time limits and sampling

API Integration

Programmatically manage cross-source foreign keys:

  • Query discovered relationships
  • Accept or reject discoveries
  • Create manual relationships
  • Export relationship metadata