# Azure Synapse / Fabric Lineage Visualizer

Trace data lineage across Synapse & Fabric pipelines, Spark notebooks, SQL pools, and Power BI datasets — rendered as an interactive DAG right inside VS Code.

## Features

| Capability | Description |
| --- | --- |
| Pipeline Parser | Reads Synapse / Fabric pipeline JSON — Copy, Notebook, DataFlow, SQL SP activities with ForEach / If / Switch nesting |
| SQL Parser | Extracts lineage from CREATE VIEW, CTAS, INSERT INTO … SELECT, standalone SELECT, column-level edges |
| Spark / PySpark Parser | Tracks DataFrame chains (spark.read → transforms → .write), handles .ipynb notebooks, .join(), .select(), .withColumn() |
| Power BI Parser | Parses .bim / .pbidataset tabular models — tables, measures, DAX dependencies, M query upstream, relationships |
| Interactive DAG | SVG-based directed graph with pan, zoom, search, edge highlighting, and click-to-navigate |
| Column-Level Lineage | Drill into any column and trace its upstream transformations across all layers |
| Export | Save the full lineage graph as structured JSON for reports or downstream tools |
## Quick Start

1. Install the extension (VSIX or Marketplace).
2. Open a workspace containing Synapse / Fabric pipeline JSON, SQL scripts, PySpark files, or Power BI models.
3. Right-click any supported file → "Trace Data Lineage from File".
4. The interactive DAG opens beside your editor — click any node to jump to its source.
## Supported File Types

| Extension | Content |
| --- | --- |
| .json | Synapse / Fabric pipeline definitions (must contain an activities array) |
| .sql | SQL pool scripts (DDL / DML) |
| .py | PySpark scripts |
| .ipynb | Jupyter / Synapse notebooks |
| .bim | Tabular model (Analysis Services / Fabric Semantic Model) |
| .pbidataset | Power BI dataset definition |
## Commands

| Command | Palette Label | Description |
| --- | --- | --- |
| azureLineage.traceFromFile | Trace Data Lineage from File | Parse the active file and display its lineage DAG |
| azureLineage.traceWorkspace | Trace Data Lineage — Full Workspace | Scan all supported files in the workspace |
| azureLineage.traceColumn | Trace Column-Level Lineage | Enter a column name and trace its upstream path |
| azureLineage.exportLineage | Export Lineage as JSON | Save the lineage graph to a JSON file |
All commands are available from the Command Palette (Ctrl+Shift+P) and via right-click context menus on files.
## DAG Interaction
- Hover a node → tooltip with qualified name, metadata, column preview
- Click a node → jump to the source file and line; connected edges highlight
- Columns button → side panel showing all columns and column-level edges
- Search → filter nodes by name or qualified path
- Zoom → mouse wheel; Pan → click + drag on background
- Export → copy full lineage JSON to clipboard
## Node Colors

| Color | Kind |
| --- | --- |
| 🟢 Green | Source (external tables, files) |
| 🔵 Blue | Pipeline |
| 🟣 Purple | Activity |
| 🟠 Orange | Spark Transform |
| 🔴 Red | SQL View / Table |
| 🟡 Yellow | Power BI Dataset / Measure |
| ⚪ Grey | Sink / Unknown |
## Settings

| Setting | Default | Description |
| --- | --- | --- |
| azureLineage.maxDepth | 10 | Maximum parsing depth for nested pipelines |
| azureLineage.showColumnLineage | true | Show column-level edges in the DAG |
| azureLineage.dagLayout | left-to-right | Layout direction (left-to-right or top-to-bottom) |
| azureLineage.highlightColor | #0078D4 | Accent color for highlighted edges |
## Example Walkthrough

### 1. Pipeline JSON

```json
{
  "name": "IngestCustomers",
  "properties": {
    "activities": [
      {
        "name": "CopyFromBlob",
        "type": "Copy",
        "inputs": [{ "referenceName": "BlobCustomersCSV" }],
        "outputs": [{ "referenceName": "SqlPoolCustomers" }],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink": { "type": "SqlDWSink" },
          "translator": {
            "type": "TabularTranslator",
            "columnMappings": {
              "name": "CustomerName",
              "email": "EmailAddress"
            }
          }
        }
      }
    ]
  }
}
```
Result: BlobCustomersCSV → CopyFromBlob → SqlPoolCustomers with column edges name→CustomerName, email→EmailAddress.
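How an extraction like this can work is easy to sketch in a few lines of Python. This is not the extension's actual code, just a minimal illustration of reducing Copy activities to lineage edges (the function name is invented):

```python
def copy_edges(pipeline: dict):
    """Yield (source, activity, sink, column_mappings) for each Copy activity."""
    for act in pipeline["properties"]["activities"]:
        if act["type"] != "Copy":
            continue
        src = act["inputs"][0]["referenceName"]
        dst = act["outputs"][0]["referenceName"]
        cols = (act["typeProperties"]
                   .get("translator", {})
                   .get("columnMappings", {}))
        yield src, act["name"], dst, cols

# The IngestCustomers pipeline from above, as a Python dict:
pipeline = {
    "name": "IngestCustomers",
    "properties": {"activities": [{
        "name": "CopyFromBlob",
        "type": "Copy",
        "inputs": [{"referenceName": "BlobCustomersCSV"}],
        "outputs": [{"referenceName": "SqlPoolCustomers"}],
        "typeProperties": {"translator": {
            "type": "TabularTranslator",
            "columnMappings": {"name": "CustomerName", "email": "EmailAddress"},
        }},
    }]},
}

for src, activity, dst, cols in copy_edges(pipeline):
    print(f"{src} -> {activity} -> {dst}", cols)
```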
### 2. SQL Script

```sql
CREATE VIEW dbo.ActiveCustomers AS
SELECT c.CustomerName, o.Total
FROM dbo.Customers c
JOIN dbo.Orders o ON c.Id = o.CustomerId
WHERE o.Status = 'Active';
```
Result: dbo.Customers and dbo.Orders feed into dbo.ActiveCustomers, with column-level edges.
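Since SQL lineage here is pattern-based rather than grammar-based, the core idea can be sketched with two regexes (illustrative only, not the extension's parser):

```python
import re

def view_sources(sql: str):
    """Return (view_name, [source_tables]) for a CREATE VIEW statement."""
    view = re.search(r"CREATE\s+VIEW\s+([\w.\[\]]+)", sql, re.I)
    # Every FROM/JOIN target is treated as an upstream table.
    tables = re.findall(r"\b(?:FROM|JOIN)\s+([\w.\[\]]+)", sql, re.I)
    return view.group(1), tables

sql = """CREATE VIEW dbo.ActiveCustomers AS
SELECT c.CustomerName, o.Total
FROM dbo.Customers c
JOIN dbo.Orders o ON c.Id = o.CustomerId
WHERE o.Status = 'Active';"""

print(view_sources(sql))
# -> ('dbo.ActiveCustomers', ['dbo.Customers', 'dbo.Orders'])
```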
### 3. PySpark Notebook

```python
df_raw = spark.read.parquet("/data/raw/events")
df_clean = df_raw.select("user_id", "event_type", "ts") \
    .withColumn("event_date", to_date("ts"))
df_clean.write.saveAsTable("curated.events_clean")
```
Result: /data/raw/events → spark_transform → curated.events_clean, columns tracked through .select and .withColumn.
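A toy sketch of finding the read/write endpoints such a parser has to detect (real DataFrame-chain tracking through variable assignments is considerably more involved):

```python
import re

READ = re.compile(r'spark\.read\.(\w+)\("([^"]+)"\)')
WRITE = re.compile(r'\.write\.saveAsTable\("([^"]+)"\)')

def endpoints(code: str):
    """Return (read sources, write sinks) found in a PySpark script."""
    sources = [path for _fmt, path in READ.findall(code)]
    sinks = WRITE.findall(code)
    return sources, sinks

script = '''
df_raw = spark.read.parquet("/data/raw/events")
df_clean = df_raw.select("user_id", "event_type", "ts") \\
    .withColumn("event_date", to_date("ts"))
df_clean.write.saveAsTable("curated.events_clean")
'''
print(endpoints(script))
# -> (['/data/raw/events'], ['curated.events_clean'])
```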
### 4. Power BI Dataset
A .bim file with tables, calculated columns referencing 'Sales'[Amount], and M queries pulling from Sql.Database("myserver", "mydb") — all rendered as upstream sources flowing into the dataset.
## Exported JSON Schema

```json
{
  "nodes": [
    {
      "id": "node_1",
      "label": "Customers",
      "kind": "sqltable",
      "qualifiedName": "dbo.Customers",
      "columns": [{ "name": "Id" }, { "name": "CustomerName" }],
      "fileUri": "/workspace/sql/tables.sql",
      "fileLine": 12,
      "metadata": {}
    }
  ],
  "edges": [
    {
      "sourceId": "node_1",
      "targetId": "node_3",
      "columnEdges": [
        { "sourceColumn": "CustomerName", "targetColumn": "CustomerName" }
      ],
      "relationship": "SELECT"
    }
  ],
  "tracedAt": "2024-06-15T10:30:00.000Z",
  "rootFiles": ["tables.sql", "views.sql"]
}
```
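A downstream tool can consume this export directly. For example, a short sketch (assuming only the schema above) that collects every transitive upstream node of a given node:

```python
def upstream(graph: dict, node_id: str) -> set:
    """Collect all transitive upstream node ids for node_id."""
    parents = {}
    for e in graph["edges"]:
        parents.setdefault(e["targetId"], []).append(e["sourceId"])
    seen, stack = set(), [node_id]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

# Hypothetical three-edge export: node_0 -> node_1 -> node_3 <- node_2
graph = {
    "nodes": [],
    "edges": [
        {"sourceId": "node_1", "targetId": "node_3"},
        {"sourceId": "node_2", "targetId": "node_3"},
        {"sourceId": "node_0", "targetId": "node_1"},
    ],
}
print(sorted(upstream(graph, "node_3")))
# -> ['node_0', 'node_1', 'node_2']
```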
## Requirements
- VS Code 1.85.0 or later
- Files must be locally available (no remote filesystem support yet)
## Known Limitations
- SQL parsing uses pattern matching, not a full SQL grammar — complex CTEs or dynamic SQL may be partially traced.
- Spark parsing follows DataFrame variable assignments; variables reassigned in loops may lose lineage.
- Power BI M queries only detect a subset of data sources (SQL, SharePoint, CSV, Excel, Azure Blob).
## License
MIT
Publisher: shasvaddi
Repository: GitHub