Spark Pipeline Visualizer (SDP)
Visualize Apache Spark Declarative Pipelines as interactive DAGs directly in VS Code.
✔ Instantly understand complex Spark pipeline dependencies
✔ Visualize pipeline DAGs from Python and SQL files
✔ Interactive dependency graph for Spark Declarative Pipelines
✔ Click DAG nodes to open source code directly in the editor
✔ Preview SQL and Python snippets with syntax highlighting
✔ Dark mode 🥷
✔ Horizontal and vertical flow

⚡ Getting Started (30 seconds)
- Open a workspace containing a Spark pipeline (pipeline.yml, .py, or .sql files)
- Click the Spark Pipeline Visualizer icon in the Activity Bar
- Select a pipeline from the automatically detected files
- Explore the pipeline DAG and entity details in the sidebar and webview
🧩 Supported Entity Definitions
The extension detects Spark entities declared in Python files with the @dlt., @dp., or @sdp. decorator prefixes.
Each entity is identified by the name parameter of its decorator, and its dependencies are resolved from the tables referenced in its SQL queries.
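As a rough illustration of the approach described above, here is a Python sketch built on the standard-library `ast` and `re` modules. It is not the extension's implementation: `extract_entities`, `PREFIXES`, and the FROM/JOIN regular expression are hypothetical names and heuristics chosen for this example.

```python
import ast
import re

# Assumption: the three decorator prefixes listed in this README.
PREFIXES = {"dlt", "dp", "sdp"}
# Naive heuristic: an identifier (optionally dotted) directly after FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def extract_entities(source: str) -> dict[str, set[str]]:
    """Map each decorated entity name to the tables referenced in its SQL strings."""
    entities: dict[str, set[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        for dec in node.decorator_list:
            # Only decorator calls of the form @<prefix>.<kind>(...) qualify.
            if not (
                isinstance(dec, ast.Call)
                and isinstance(dec.func, ast.Attribute)
                and isinstance(dec.func.value, ast.Name)
                and dec.func.value.id in PREFIXES
            ):
                continue
            # The entity name comes from a literal name="..." keyword argument.
            name = next(
                (kw.value.value for kw in dec.keywords
                 if kw.arg == "name" and isinstance(kw.value, ast.Constant)),
                None,
            )
            if name is None:
                continue
            # Scan every string literal in the function for FROM/JOIN table references.
            deps: set[str] = set()
            for inner in ast.walk(node):
                if isinstance(inner, ast.Constant) and isinstance(inner.value, str):
                    deps.update(TABLE_REF.findall(inner.value))
            entities[name] = deps
    return entities


example = '''
from pyspark import pipelines as dp

@dp.materialized_view(name="sales_summary")
def create_sales_summary():
    return spark.sql("SELECT region, SUM(amount) AS total FROM raw_sales GROUP BY region")
'''
print(extract_entities(example))  # {'sales_summary': {'raw_sales'}}
```

The source is only parsed, never executed, so `spark` does not need to exist. The regex is deliberately simple: it ignores CTEs and subqueries and would also match the word "from" in an ordinary string, so a real implementation would want a proper SQL parser.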
Supported Decorators
| Decorator | Entity Type |
|---|---|
| @dp.table(name="...") | Table |
| @dp.view(name="...") | View |
| @dp.materialized_view(name="...") | Materialized View |
| @dp.temporary_view(name="...") | Temporary View |
| @dp.streaming_table(name="...") | Streaming Table |
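The table above can be read as a lookup from decorator attribute to entity type. The sketch below is a hypothetical line-based classifier (`classify`, `ENTITY_TYPES`, and `DECORATOR` are illustrative names). It assumes the same attribute names are accepted after each of the @dlt., @dp., and @sdp. prefixes, although the table only lists them for @dp., and that name is passed as a string literal.

```python
import re
from typing import Optional

# Entity types from the Supported Decorators table, keyed by decorator attribute.
ENTITY_TYPES = {
    "table": "Table",
    "view": "View",
    "materialized_view": "Materialized View",
    "temporary_view": "Temporary View",
    "streaming_table": "Streaming Table",
}
# Assumption: any of the three prefixes, followed by name="..." as the first argument.
DECORATOR = re.compile(r'@(dlt|dp|sdp)\.(\w+)\(\s*name\s*=\s*["\']([^"\']+)["\']')


def classify(line: str) -> Optional[tuple[str, str]]:
    """Return (entity_name, entity_type) for a supported decorator line, else None."""
    match = DECORATOR.search(line)
    if match is None or match.group(2) not in ENTITY_TYPES:
        return None
    return match.group(3), ENTITY_TYPES[match.group(2)]


print(classify('@sdp.materialized_view(name="sales_summary")'))
# ('sales_summary', 'Materialized View')
```

A regex is enough for single-line decorators; a decorator whose arguments span several lines would need an AST-based approach instead.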
Example
from pyspark import pipelines as dp  # or: from pyspark import pipelines as sdp
from pyspark.sql import DataFrame, SparkSession

# Pipeline query functions take no arguments; use the active session instead.
spark = SparkSession.active()

@dp.materialized_view(name="sales_summary")
def create_sales_summary() -> DataFrame:
    # Depends on: raw_sales
    return spark.sql("""
        SELECT region, SUM(amount) AS total
        FROM raw_sales
        GROUP BY region
    """)

@dp.table(name="customers_enriched")
def enrich_customers() -> DataFrame:
    # Depends on: raw_customers, order_counts
    return spark.sql("""
        SELECT c.*, o.order_count
        FROM raw_customers c
        LEFT JOIN order_counts o ON c.id = o.customer_id
    """)