Azure Synapse / Fabric Lineage Visualizer

Trace data lineage across Synapse & Fabric pipelines, Spark notebooks, SQL pools, and Power BI datasets — rendered as an interactive DAG right inside VS Code.



Features

| Capability | Description |
| --- | --- |
| Pipeline Parser | Reads Synapse / Fabric pipeline JSON — Copy, Notebook, DataFlow, and SQL stored-procedure activities, with ForEach / If / Switch nesting |
| SQL Parser | Extracts lineage from CREATE VIEW, CTAS, INSERT INTO … SELECT, and standalone SELECT statements, including column-level edges |
| Spark / PySpark Parser | Tracks DataFrame chains (`spark.read` → transforms → `.write`); handles `.ipynb` notebooks and `.join()`, `.select()`, `.withColumn()` |
| Power BI Parser | Parses `.bim` / `.pbidataset` tabular models — tables, measures, DAX dependencies, M-query upstream sources, relationships |
| Interactive DAG | SVG-based directed graph with pan, zoom, search, edge highlighting, and click-to-navigate |
| Column-Level Lineage | Drill into any column and trace its upstream transformations across all layers |
| Export | Save the full lineage graph as structured JSON for reports or downstream tools |

Quick Start

  1. Install the extension (VSIX or Marketplace).
  2. Open a workspace containing Synapse / Fabric pipeline JSON, SQL scripts, PySpark files, or Power BI models.
  3. Right-click any supported file → "Trace Data Lineage from File".
  4. The interactive DAG opens beside your editor — click any node to jump to its source.

Supported File Types

| Extension | Content |
| --- | --- |
| `.json` | Synapse / Fabric pipeline definitions (must contain an `activities` array) |
| `.sql` | SQL pool scripts (DDL / DML) |
| `.py` | PySpark scripts |
| `.ipynb` | Jupyter / Synapse notebooks |
| `.bim` | Tabular model (Analysis Services / Fabric semantic model) |
| `.pbidataset` | Power BI dataset definition |

Commands

| Command | Palette Label | Description |
| --- | --- | --- |
| `azureLineage.traceFromFile` | Trace Data Lineage from File | Parse the active file and display its lineage DAG |
| `azureLineage.traceWorkspace` | Trace Data Lineage — Full Workspace | Scan all supported files in the workspace |
| `azureLineage.traceColumn` | Trace Column-Level Lineage | Enter a column name and trace its upstream path |
| `azureLineage.exportLineage` | Export Lineage as JSON | Save the lineage graph to a JSON file |

All commands are available from the Command Palette (Ctrl+Shift+P) and via right-click context menus on files.


DAG Interaction

  • Hover a node → tooltip with qualified name, metadata, column preview
  • Click a node → jump to the source file and line; connected edges highlight
  • Columns button → side panel showing all columns and column-level edges
  • Search → filter nodes by name or qualified path
  • Zoom → mouse wheel; Pan → click + drag on background
  • Export → copy full lineage JSON to clipboard

Node Colors

| Color | Kind |
| --- | --- |
| 🟢 Green | Source (external tables, files) |
| 🔵 Blue | Pipeline |
| 🟣 Purple | Activity |
| 🟠 Orange | Spark Transform |
| 🔴 Red | SQL View / Table |
| 🟡 Yellow | Power BI Dataset / Measure |
| ⚪ Grey | Sink / Unknown |

Settings

| Setting | Default | Description |
| --- | --- | --- |
| `azureLineage.maxDepth` | `10` | Maximum parsing depth for nested pipelines |
| `azureLineage.showColumnLineage` | `true` | Show column-level edges in the DAG |
| `azureLineage.dagLayout` | `left-to-right` | Layout direction (`left-to-right` or `top-to-bottom`) |
| `azureLineage.highlightColor` | `#0078D4` | Accent color for highlighted edges |
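
For example, the settings above can be set per-workspace in `.vscode/settings.json` (the values here are illustrative, not recommendations):

```json
{
  "azureLineage.maxDepth": 5,
  "azureLineage.showColumnLineage": true,
  "azureLineage.dagLayout": "top-to-bottom",
  "azureLineage.highlightColor": "#0078D4"
}
```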

Example Walkthrough

1. Pipeline JSON

```json
{
  "name": "IngestCustomers",
  "properties": {
    "activities": [
      {
        "name": "CopyFromBlob",
        "type": "Copy",
        "inputs": [{ "referenceName": "BlobCustomersCSV" }],
        "outputs": [{ "referenceName": "SqlPoolCustomers" }],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink": { "type": "SqlDWSink" },
          "translator": {
            "type": "TabularTranslator",
            "columnMappings": {
              "name": "CustomerName",
              "email": "EmailAddress"
            }
          }
        }
      }
    ]
  }
}
```

Result: BlobCustomersCSV → CopyFromBlob → SqlPoolCustomers with column edges name→CustomerName, email→EmailAddress.
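The extraction this step performs can be sketched in plain Python. This is a hypothetical illustration of the idea, not the extension's actual code; `extract_edges` is an invented helper:

```python
import json

# The Copy activity from the walkthrough above.
pipeline = json.loads("""
{
  "name": "IngestCustomers",
  "properties": {
    "activities": [
      {
        "name": "CopyFromBlob",
        "type": "Copy",
        "inputs": [{ "referenceName": "BlobCustomersCSV" }],
        "outputs": [{ "referenceName": "SqlPoolCustomers" }],
        "typeProperties": {
          "translator": {
            "type": "TabularTranslator",
            "columnMappings": { "name": "CustomerName", "email": "EmailAddress" }
          }
        }
      }
    ]
  }
}
""")

def extract_edges(pipeline):
    """Yield (source, activity, sink, column_mappings) per Copy activity."""
    edges = []
    for act in pipeline["properties"].get("activities", []):
        sources = [i["referenceName"] for i in act.get("inputs", [])]
        sinks = [o["referenceName"] for o in act.get("outputs", [])]
        cols = (act.get("typeProperties", {})
                   .get("translator", {})
                   .get("columnMappings", {}))
        for src in sources:
            for sink in sinks:
                edges.append((src, act["name"], sink, cols))
    return edges

edges = extract_edges(pipeline)
```

Each tuple corresponds to one source → activity → sink path in the DAG, with the column mappings attached as column-level edges.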

2. SQL Script

```sql
CREATE VIEW dbo.ActiveCustomers AS
SELECT c.CustomerName, o.Total
FROM dbo.Customers c
JOIN dbo.Orders o ON c.Id = o.CustomerId
WHERE o.Status = 'Active';
```

Result: dbo.Customers and dbo.Orders feed into dbo.ActiveCustomers, with column-level edges.
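Since SQL parsing is pattern-based (see Known Limitations), the table-level part of this result can be approximated with a couple of regular expressions. A toy sketch, not the extension's implementation:

```python
import re

sql = """
CREATE VIEW dbo.ActiveCustomers AS
SELECT c.CustomerName, o.Total
FROM dbo.Customers c
JOIN dbo.Orders o ON c.Id = o.CustomerId
WHERE o.Status = 'Active';
"""

# The created view is the lineage target...
view = re.search(r"CREATE\s+VIEW\s+([\w.]+)", sql, re.I).group(1)
# ...and every FROM / JOIN table is an upstream source.
upstream = re.findall(r"(?:FROM|JOIN)\s+([\w.]+)", sql, re.I)
```

This is exactly the kind of matching that can miss complex CTEs or dynamic SQL, which is why the limitations section hedges on those cases.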

3. PySpark Notebook

```python
from pyspark.sql.functions import to_date

df_raw = spark.read.parquet("/data/raw/events")
df_clean = df_raw.select("user_id", "event_type", "ts") \
                 .withColumn("event_date", to_date("ts"))
df_clean.write.saveAsTable("curated.events_clean")
```

Result: /data/raw/events → spark_transform → curated.events_clean, columns tracked through .select and .withColumn.
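The variable-assignment tracking behind this result can be sketched as follows. This is a hypothetical, deliberately naive illustration of the idea (map each DataFrame variable to its ultimate source, then resolve the variable that gets written):

```python
import re

notebook = '''
df_raw = spark.read.parquet("/data/raw/events")
df_clean = df_raw.select("user_id", "event_type", "ts") \\
    .withColumn("event_date", to_date("ts"))
df_clean.write.saveAsTable("curated.events_clean")
'''

sources = {}  # DataFrame variable -> upstream source path
for line in notebook.splitlines():
    # New DataFrame read directly from storage.
    m = re.match(r'\s*(\w+)\s*=\s*spark\.read\.\w+\("([^"]+)"\)', line)
    if m:
        sources[m.group(1)] = m.group(2)
        continue
    # New DataFrame derived from an existing one: inherit its source.
    m = re.match(r'\s*(\w+)\s*=\s*(\w+)\.', line)
    if m and m.group(2) in sources:
        sources[m.group(1)] = sources[m.group(2)]

# The .write call closes the chain: source path -> sink table.
m = re.search(r'(\w+)\.write\.saveAsTable\("([^"]+)"\)', notebook)
lineage = (sources[m.group(1)], m.group(2))
```

Because the map is keyed by variable name, a variable reassigned inside a loop overwrites its entry, which is the failure mode noted under Known Limitations.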

4. Power BI Dataset

A .bim file with tables, calculated columns referencing 'Sales'[Amount], and M queries pulling from Sql.Database("myserver", "mydb") — all rendered as upstream sources flowing into the dataset.


Exported JSON Schema

```json
{
  "nodes": [
    {
      "id": "node_1",
      "label": "Customers",
      "kind": "sqltable",
      "qualifiedName": "dbo.Customers",
      "columns": [{ "name": "Id" }, { "name": "CustomerName" }],
      "fileUri": "/workspace/sql/tables.sql",
      "fileLine": 12,
      "metadata": {}
    }
  ],
  "edges": [
    {
      "sourceId": "node_1",
      "targetId": "node_3",
      "columnEdges": [
        { "sourceColumn": "CustomerName", "targetColumn": "CustomerName" }
      ],
      "relationship": "SELECT"
    }
  ],
  "tracedAt": "2024-06-15T10:30:00.000Z",
  "rootFiles": ["tables.sql", "views.sql"]
}
```
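
A downstream tool can traverse this schema directly. For example, here is a small sketch (the helper `upstream_of` is illustrative, not part of the extension) that collects every node upstream of a given target by walking `edges` in reverse:

```python
def upstream_of(graph, target_id):
    """Return the set of node ids reachable upstream of target_id."""
    # Invert the edge list: target -> list of direct sources.
    parents = {}
    for e in graph["edges"]:
        parents.setdefault(e["targetId"], []).append(e["sourceId"])
    # Depth-first walk from the target back to the roots.
    seen, stack = set(), [target_id]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

# Minimal graph in the exported shape (nodes omitted for brevity).
graph = {
    "nodes": [],
    "edges": [
        {"sourceId": "node_1", "targetId": "node_3", "relationship": "SELECT"},
        {"sourceId": "node_2", "targetId": "node_3", "relationship": "SELECT"},
    ],
}
```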

Requirements

  • VS Code 1.85.0 or later
  • Files must be locally available (no remote filesystem support yet)

Known Limitations

  • SQL parsing uses pattern matching, not a full SQL grammar — complex CTEs or dynamic SQL may be partially traced.
  • Spark parsing follows DataFrame variable assignments (an internal variable map); variables reassigned inside loops may lose lineage.
  • Power BI M queries only detect a subset of data sources (SQL, SharePoint, CSV, Excel, Azure Blob).

License

MIT


Publisher: shasvaddi
Repository: GitHub
