# Azure Synapse / Fabric Lineage Visualizer

Trace data lineage across Synapse & Fabric pipelines, Spark notebooks, SQL pools, and Power BI datasets — rendered as an interactive DAG right inside VS Code.

## Features

| Capability | Description |
| --- | --- |
| Pipeline Parser | Reads Synapse / Fabric pipeline JSON — Copy, Notebook, DataFlow, SQL SP activities with ForEach / If / Switch nesting |
| SQL Parser | Extracts lineage from CREATE VIEW, CTAS, INSERT INTO … SELECT, standalone SELECT, column-level edges |
| Spark / PySpark Parser | Tracks DataFrame chains (spark.read → transforms → .write), handles .ipynb notebooks, .join(), .select(), .withColumn() |
| Power BI Parser | Parses .bim / .pbidataset tabular models — tables, measures, DAX dependencies, M query upstream, relationships |
| Interactive DAG | SVG-based directed graph with pan, zoom, search, edge highlighting, and click-to-navigate |
| Column-Level Lineage | Drill into any column and trace its upstream transformations across all layers |
| Export | Save the full lineage graph as structured JSON for reports or downstream tools |
## Quick Start

1. Install the extension (VSIX or Marketplace).
2. Open a workspace containing Synapse / Fabric pipeline JSON, SQL scripts, PySpark files, or Power BI models.
3. Right-click any supported file → "Trace Data Lineage from File".
4. The interactive DAG opens beside your editor — click any node to jump to its source.
## Supported File Types

| Extension | Content |
| --- | --- |
| .json | Synapse / Fabric pipeline definitions (must contain an activities array) |
| .sql | SQL pool scripts (DDL / DML) |
| .py | PySpark scripts |
| .ipynb | Jupyter / Synapse notebooks |
| .bim | Tabular model (Analysis Services / Fabric Semantic Model) |
| .pbidataset | Power BI dataset definition |
## Commands

| Command | Palette Label | Description |
| --- | --- | --- |
| azureLineage.traceFromFile | Trace Data Lineage from File | Parse the active file and display its lineage DAG |
| azureLineage.traceWorkspace | Trace Data Lineage — Full Workspace | Scan all supported files in the workspace |
| azureLineage.traceColumn | Trace Column-Level Lineage | Enter a column name and trace its upstream path |
| azureLineage.exportLineage | Export Lineage as JSON | Save the lineage graph to a JSON file |
All commands are available from the Command Palette (Ctrl+Shift+P) and via right-click context menus on files.
## DAG Interaction
- Hover a node → tooltip with qualified name, metadata, column preview
- Click a node → jump to the source file and line; connected edges highlight
- Columns button → side panel showing all columns and column-level edges
- Search → filter nodes by name or qualified path
- Zoom → mouse wheel; Pan → click + drag on background
- Export → copy full lineage JSON to clipboard
## Node Colors

| Color | Kind |
| --- | --- |
| 🟢 Green | Source (external tables, files) |
| 🔵 Blue | Pipeline |
| 🟣 Purple | Activity |
| 🟠 Orange | Spark Transform |
| 🔴 Red | SQL View / Table |
| 🟡 Yellow | Power BI Dataset / Measure |
| ⚪ Grey | Sink / Unknown |
## Settings

| Setting | Default | Description |
| --- | --- | --- |
| azureLineage.maxDepth | 10 | Maximum parsing depth for nested pipelines |
| azureLineage.showColumnLineage | true | Show column-level edges in the DAG |
| azureLineage.dagLayout | left-to-right | Layout direction (left-to-right or top-to-bottom) |
| azureLineage.highlightColor | #0078D4 | Accent color for highlighted edges |
## Example Walkthrough

### 1. Pipeline JSON

```json
{
  "name": "IngestCustomers",
  "properties": {
    "activities": [
      {
        "name": "CopyFromBlob",
        "type": "Copy",
        "inputs": [{ "referenceName": "BlobCustomersCSV" }],
        "outputs": [{ "referenceName": "SqlPoolCustomers" }],
        "typeProperties": {
          "source": { "type": "DelimitedTextSource" },
          "sink": { "type": "SqlDWSink" },
          "translator": {
            "type": "TabularTranslator",
            "columnMappings": {
              "name": "CustomerName",
              "email": "EmailAddress"
            }
          }
        }
      }
    ]
  }
}
```
Result: BlobCustomersCSV → CopyFromBlob → SqlPoolCustomers with column edges name→CustomerName, email→EmailAddress.
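How an extraction like this can work is easy to sketch in a few lines of Python. This is not the extension's actual code, just a minimal illustration of reducing Copy activities to lineage edges (the function name is invented):

```python
def copy_edges(pipeline: dict):
    """Yield (source, activity, sink, column_mappings) for each Copy activity."""
    for act in pipeline["properties"]["activities"]:
        if act["type"] != "Copy":
            continue
        src = act["inputs"][0]["referenceName"]
        dst = act["outputs"][0]["referenceName"]
        cols = (act["typeProperties"]
                   .get("translator", {})
                   .get("columnMappings", {}))
        yield src, act["name"], dst, cols

# The IngestCustomers pipeline from above, as a Python dict:
pipeline = {
    "name": "IngestCustomers",
    "properties": {"activities": [{
        "name": "CopyFromBlob",
        "type": "Copy",
        "inputs": [{"referenceName": "BlobCustomersCSV"}],
        "outputs": [{"referenceName": "SqlPoolCustomers"}],
        "typeProperties": {"translator": {
            "type": "TabularTranslator",
            "columnMappings": {"name": "CustomerName", "email": "EmailAddress"},
        }},
    }]},
}

for src, activity, dst, cols in copy_edges(pipeline):
    print(f"{src} -> {activity} -> {dst}", cols)
```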
### 2. SQL Script

```sql
CREATE VIEW dbo.ActiveCustomers AS
SELECT c.CustomerName, o.Total
FROM dbo.Customers c
JOIN dbo.Orders o ON c.Id = o.CustomerId
WHERE o.Status = 'Active';
```
Result: dbo.Customers and dbo.Orders feed into dbo.ActiveCustomers, with column-level edges.
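Since SQL lineage here is pattern-based rather than grammar-based, the core idea can be sketched with two regexes (illustrative only, not the extension's parser):

```python
import re

def view_sources(sql: str):
    """Return (view_name, [source_tables]) for a CREATE VIEW statement."""
    view = re.search(r"CREATE\s+VIEW\s+([\w.\[\]]+)", sql, re.I)
    # Every FROM/JOIN target is treated as an upstream table.
    tables = re.findall(r"\b(?:FROM|JOIN)\s+([\w.\[\]]+)", sql, re.I)
    return view.group(1), tables

sql = """CREATE VIEW dbo.ActiveCustomers AS
SELECT c.CustomerName, o.Total
FROM dbo.Customers c
JOIN dbo.Orders o ON c.Id = o.CustomerId
WHERE o.Status = 'Active';"""

print(view_sources(sql))
# -> ('dbo.ActiveCustomers', ['dbo.Customers', 'dbo.Orders'])
```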
### 3. PySpark Notebook

```python
df_raw = spark.read.parquet("/data/raw/events")
df_clean = df_raw.select("user_id", "event_type", "ts") \
    .withColumn("event_date", to_date("ts"))
df_clean.write.saveAsTable("curated.events_clean")
```
Result: /data/raw/events → spark_transform → curated.events_clean, columns tracked through .select and .withColumn.
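A toy sketch of finding the read/write endpoints such a parser has to detect (real DataFrame-chain tracking through variable assignments is considerably more involved):

```python
import re

READ = re.compile(r'spark\.read\.(\w+)\("([^"]+)"\)')
WRITE = re.compile(r'\.write\.saveAsTable\("([^"]+)"\)')

def endpoints(code: str):
    """Return (read sources, write sinks) found in a PySpark script."""
    sources = [path for _fmt, path in READ.findall(code)]
    sinks = WRITE.findall(code)
    return sources, sinks

script = '''
df_raw = spark.read.parquet("/data/raw/events")
df_clean = df_raw.select("user_id", "event_type", "ts") \\
    .withColumn("event_date", to_date("ts"))
df_clean.write.saveAsTable("curated.events_clean")
'''
print(endpoints(script))
# -> (['/data/raw/events'], ['curated.events_clean'])
```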
### 4. Power BI Dataset
A .bim file with tables, calculated columns referencing 'Sales'[Amount], and M queries pulling from Sql.Database("myserver", "mydb") — all rendered as upstream sources flowing into the dataset.
## Exported JSON Schema

```json
{
  "nodes": [
    {
      "id": "node_1",
      "label": "Customers",
      "kind": "sqltable",
      "qualifiedName": "dbo.Customers",
      "columns": [{ "name": "Id" }, { "name": "CustomerName" }],
      "fileUri": "/workspace/sql/tables.sql",
      "fileLine": 12,
      "metadata": {}
    }
  ],
  "edges": [
    {
      "sourceId": "node_1",
      "targetId": "node_3",
      "columnEdges": [
        { "sourceColumn": "CustomerName", "targetColumn": "CustomerName" }
      ],
      "relationship": "SELECT"
    }
  ],
  "tracedAt": "2024-06-15T10:30:00.000Z",
  "rootFiles": ["tables.sql", "views.sql"]
}
```
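A downstream tool can consume this export directly. For example, a short sketch (assuming only the schema above) that collects every transitive upstream node of a given node:

```python
def upstream(graph: dict, node_id: str) -> set:
    """Collect all transitive upstream node ids for node_id."""
    parents = {}
    for e in graph["edges"]:
        parents.setdefault(e["targetId"], []).append(e["sourceId"])
    seen, stack = set(), [node_id]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

# Hypothetical three-edge export: node_0 -> node_1 -> node_3 <- node_2
graph = {
    "nodes": [],
    "edges": [
        {"sourceId": "node_1", "targetId": "node_3"},
        {"sourceId": "node_2", "targetId": "node_3"},
        {"sourceId": "node_0", "targetId": "node_1"},
    ],
}
print(sorted(upstream(graph, "node_3")))
# -> ['node_0', 'node_1', 'node_2']
```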
## Requirements
- VS Code 1.85.0 or later
- Files must be locally available (no remote filesystem support yet)
## Known Limitations
- SQL parsing uses pattern matching, not a full SQL grammar — complex CTEs or dynamic SQL may be partially traced.
- Spark parsing follows DataFrame variable assignments; variables reassigned in loops may lose lineage.
- Power BI M queries only detect a subset of data sources (SQL, SharePoint, CSV, Excel, Azure Blob).
## License
MIT
Publisher: shasvaddi
Repository: GitHub