A Visual Studio Code extension that provides PySpark code snippets optimized for Databricks environments. The snippets work in both Python (`.py`) and Jupyter Notebook (`.ipynb`) files in Visual Studio Code, and cover common data engineering and analytics workflows in Databricks using PySpark.
## Installation

1. Install the extension from the VS Code Marketplace
2. Restart VS Code
3. Start using the snippets in your Python or Jupyter Notebook files
## Basic Usage

Type the prefix (e.g., `dbs: read-delta`) and press `Tab` to insert the snippet. Use `Tab` to navigate through the placeholders.
## Snippet Categories

### Session Management & Basic Operations

| Prefix | Description |
| --- | --- |
| `dbs: imp` | Import PySpark essentials |
| `dbs: spark` | Get the existing Spark session in Databricks |
| `dbs: display` | Display a DataFrame in a Databricks notebook |
| `dbs: schema` | Print the schema of a DataFrame |
| `dbs: show` | Show the first N rows of a DataFrame |
| `dbs: shape` | Get the shape of a DataFrame |
| `dbs: explain` | Show the execution plan for a DataFrame operation |
| `dbs: set-config` | Set and get Spark configuration parameters |
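To give a feel for what these expand to, here is a rough sketch of the session-management basics. The exact placeholders come from the snippets themselves, and the `people` table name below is a hypothetical example.

```python
from pyspark.sql import SparkSession

# dbs: spark — in Databricks the session already exists; getOrCreate() returns it
spark = SparkSession.builder.getOrCreate()

# "people" is a hypothetical table name used for illustration
df = spark.read.table("people")

df.printSchema()                      # dbs: schema
df.show(20)                           # dbs: show
print((df.count(), len(df.columns)))  # dbs: shape
df.explain(True)                      # dbs: explain

# dbs: display — display(df) is a notebook built-in available in Databricks
```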
### I/O Operations

| Prefix | Description |
| --- | --- |
| `dbs: read-csv` | Read CSV file into a Spark DataFrame |
| `dbs: read-csv-schema` | Read CSV file with explicit schema definition |
| `dbs: read-parquet` | Read Parquet file into a Spark DataFrame |
| `dbs: read-delta` | Read Delta table into a Spark DataFrame |
| `dbs: read-jdbc` | Read data from JDBC source |
| `dbs: write-csv` | Write DataFrame to CSV file |
| `dbs: write-parquet` | Write DataFrame to Parquet file |
| `dbs: write-delta` | Write DataFrame to Delta table |
| `dbs: write-jdbc` | Write DataFrame to JDBC destination |
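The I/O snippets follow the standard `spark.read`/`df.write` pattern. A minimal sketch, assuming hypothetical paths under `/mnt/`:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# dbs: read-csv — header row and schema inference enabled
df = (spark.read
      .format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("/mnt/raw/events.csv"))

# dbs: read-delta
events = spark.read.format("delta").load("/mnt/silver/events")

# dbs: write-delta
(df.write
   .format("delta")
   .mode("overwrite")
   .save("/mnt/bronze/events"))
```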
### DataFrame Creation & Manipulation

| Prefix | Description |
| --- | --- |
| `dbs: create-df` | Create a DataFrame from in-memory data |
| `dbs: create-df-schema` | Create a DataFrame with explicit schema definition |
| `dbs: select` | Select specific columns from DataFrame |
| `dbs: select-expr` | Select columns with SQL expressions |
| `dbs: filter` | Filter DataFrame based on condition |
| `dbs: filter-multiple` | Filter DataFrame based on multiple conditions |
| `dbs: filter-sql` | Filter DataFrame using SQL expression |
| `dbs: sort` | Sort DataFrame by columns |
| `dbs: join` | Join two DataFrames |
| `dbs: union` | Union two DataFrames preserving column names |
| `dbs: add-column` | Add a new column to DataFrame |
| `dbs: add-column-expr` | Add a new column using SQL expression |
| `dbs: rename-column` | Rename a column in DataFrame |
| `dbs: drop-column` | Drop columns from DataFrame |
| `dbs: cast` | Cast a column to a different data type |
| `dbs: cache` | Cache DataFrame in memory |
| `dbs: persist` | Persist DataFrame with specified storage level |
| `dbs: unpersist` | Remove DataFrame from cache |
| `dbs: repartition` | Repartition DataFrame to specified number of partitions |
| `dbs: coalesce` | Reduce number of partitions without full shuffle |
| `dbs: broadcast` | Use broadcast join for performance with small DataFrames |
| `dbs: sample` | Take a random sample from DataFrame |
| `dbs: partition-info` | Get information about DataFrame partitions |
| `dbs: compare-schemas` | Compare schemas of two DataFrames |
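Most of these map directly onto the DataFrame API. A small end-to-end sketch (the sample data and column names are made up for illustration):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# dbs: create-df — build a DataFrame from in-memory data
df = spark.createDataFrame(
    [("Alice", 34, "NY"), ("Bob", 45, "CA")],
    ["name", "age", "state"],
)

# dbs: select / dbs: filter / dbs: add-column / dbs: sort chained together
result = (df.select("name", "age", "state")
            .filter(F.col("age") > 40)
            .withColumn("age_next_year", F.col("age") + 1)
            .orderBy(F.col("age").desc()))

# dbs: join — join against a second in-memory DataFrame
states = spark.createDataFrame(
    [("NY", "New York"), ("CA", "California")],
    ["state", "state_name"],
)
result.join(states, on="state", how="left").show()
```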
### Data Transformation

| Prefix | Description |
| --- | --- |
| `dbs: fillna` | Replace null values with specified values |
| `dbs: dropna` | Drop rows with null values in specified columns |
| `dbs: groupby` | Group by columns and calculate aggregates |
| `dbs: pivot` | Create a pivot table |
| `dbs: window` | Apply window function for running calculations |
| `dbs: udf` | Define and apply a User Defined Function (UDF) |
| `dbs: pandas-udf` | Define and apply a Pandas User Defined Function for vectorized operations |
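As a rough sketch of how the null-handling, aggregation, and window snippets fit together (sample data is hypothetical):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

sales = spark.createDataFrame(
    [("2024-01-01", "A", 10.0),
     ("2024-01-02", "A", None),
     ("2024-01-01", "B", 7.0)],
    ["day", "store", "amount"],
)

# dbs: fillna — replace nulls with a default value
clean = sales.fillna({"amount": 0.0})

# dbs: groupby — total per store
clean.groupBy("store").agg(F.sum("amount").alias("total")).show()

# dbs: window — running total per store, ordered by day
w = Window.partitionBy("store").orderBy("day")
clean.withColumn("running_total", F.sum("amount").over(w)).show()
```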
### SQL Operations

| Prefix | Description |
| --- | --- |
| `dbs: temp-view` | Create a temporary view for SQL queries |
| `dbs: sql` | Run SQL query on registered views |
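The two SQL snippets pair naturally: register a view, then query it. A minimal sketch with a hypothetical `people` view:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("Alice", 34), ("Bob", 45)], ["name", "age"])

# dbs: temp-view — register the DataFrame so SQL can see it
df.createOrReplaceTempView("people")

# dbs: sql — query the registered view
spark.sql("SELECT name FROM people WHERE age > 40").show()
```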
### Data Types & Functions

| Prefix | Description |
| --- | --- |
| `dbs: string-funcs` | Common string manipulation functions in PySpark |
| `dbs: datetime-funcs` | Common date and time functions in PySpark |
| `dbs: array-funcs` | Functions for working with array type columns in PySpark |
| `dbs: map-funcs` | Functions for working with map type columns in PySpark |
| `dbs: json-funcs` | Functions for working with JSON data in PySpark |
| `dbs: analytics-funcs` | Analytics and window functions in PySpark |
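These snippets draw on `pyspark.sql.functions`. A brief taste of the string, date, and array helpers they cover (sample data is hypothetical):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("alice", "2024-01-15", ["a", "b"])],
    ["name", "signup", "tags"],
)

df.select(
    F.upper("name").alias("upper_name"),                                  # string function
    F.to_date("signup").alias("signup_date"),                             # date function
    F.datediff(F.current_date(), F.to_date("signup")).alias("days_ago"),  # date arithmetic
    F.size("tags").alias("n_tags"),                                       # array function
).show()
```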
### Delta Lake Operations

| Prefix | Description |
| --- | --- |
| `dbs: delta-create` | Create a Delta table from DataFrame or SQL |
| `dbs: delta-timetravel` | Time travel query to previous versions of a Delta table |
| `dbs: delta-history` | View the history of operations on a Delta table |
| `dbs: delta-vacuum` | Remove files that are no longer in the latest version of the Delta table |
| `dbs: delta-merge` | Perform MERGE operation (upsert) on a Delta table |
| `dbs: schema-evolution` | Enable schema evolution for Delta tables |
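A rough sketch of the merge, time-travel, and history operations, assuming the `delta` Python package bundled with the Databricks Runtime and hypothetical table paths:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Both Delta paths below are hypothetical placeholders
target = DeltaTable.forPath(spark, "/mnt/silver/customers")
updates = spark.read.format("delta").load("/mnt/bronze/customers")

# dbs: delta-merge — upsert updates into the target on a key column
(target.alias("t")
       .merge(updates.alias("s"), "t.id = s.id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())

# dbs: delta-timetravel — read an earlier version of the table
v0 = (spark.read.format("delta")
            .option("versionAsOf", 0)
            .load("/mnt/silver/customers"))

# dbs: delta-history — inspect the table's operation log
target.history().show(truncate=False)
```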
### Databricks-Specific Features

| Prefix | Description |
| --- | --- |
| `dbs: autoloader` | Use Databricks Auto Loader to ingest files from cloud storage |
| `dbs: unity-catalog` | Work with Unity Catalog in Databricks |
| `dbs: external-table` | Create and query external tables in Databricks |
| `dbs: copy-into` | Use COPY INTO command for idempotent data ingestion |
| `dbs: mlflow-track` | Track ML experiments with MLflow |
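The `cloudFiles` source used by Auto Loader is specific to Databricks. A minimal sketch of the ingest pattern, with hypothetical storage and checkpoint paths:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# All paths below are hypothetical placeholders
stream = (spark.readStream
          .format("cloudFiles")                 # Auto Loader source
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", "/mnt/checkpoints/events_schema")
          .load("/mnt/landing/events"))

(stream.writeStream
       .format("delta")
       .option("checkpointLocation", "/mnt/checkpoints/events")
       .trigger(availableNow=True)              # process available files, then stop
       .start("/mnt/bronze/events"))
```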
### Streaming Operations

| Prefix | Description |
| --- | --- |
| `dbs: stream-query` | Create and process streaming data with Structured Streaming |
| `dbs: stream-foreach-batch` | Process streaming data in batches with foreachBatch |
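A small Structured Streaming sketch using the built-in `rate` test source, with a hypothetical checkpoint path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# "rate" is a built-in test source that generates rows at a fixed rate
stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# dbs: stream-foreach-batch — run ordinary batch logic on each micro-batch
def process_batch(batch_df, batch_id):
    print(f"batch {batch_id}: {batch_df.count()} rows")

query = (stream.writeStream
               .foreachBatch(process_batch)
               .option("checkpointLocation", "/tmp/checkpoints/rate")  # hypothetical path
               .start())
# query.awaitTermination()  # block until the stream stops
```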
### Machine Learning

| Prefix | Description |
| --- | --- |
| `dbs: ml-pipeline` | Create ML pipeline with feature preprocessing and model training |
| `dbs: mlflow-track` | Track ML experiments with MLflow |
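A sketch of the pipeline-plus-tracking pattern these snippets target, using a tiny in-memory dataset for illustration (MLflow is preinstalled in the Databricks Runtime):

```python
import mlflow
import mlflow.spark
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Tiny hypothetical training set
train = spark.createDataFrame(
    [(1.0, 2.0, 0.0), (2.0, 1.0, 1.0), (3.0, 4.0, 1.0), (0.5, 0.2, 0.0)],
    ["f1", "f2", "label"],
)

# dbs: ml-pipeline — assemble features, then train a model
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
pipeline = Pipeline(stages=[assembler, lr])

# dbs: mlflow-track — log parameters and the fitted model
with mlflow.start_run():
    model = pipeline.fit(train)
    mlflow.log_param("maxIter", lr.getMaxIter())
    mlflow.spark.log_model(model, "model")
```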
## Prerequisites

Make sure you have the following set up:

- Access to a Databricks workspace
- A PySpark environment (provided automatically in Databricks)
- Delta Lake (included in the Databricks Runtime)
Example Usage
# Get the Spark session
dbs: spark
# Read data from Delta table
dbs: read-delta
# Perform group by and aggregation
dbs: groupby
# Write results to Delta format
dbs: write-delta
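Once expanded and filled in, that sequence might look roughly like the following (paths and column names are hypothetical placeholders):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# dbs: spark — get the existing session
spark = SparkSession.builder.getOrCreate()

# dbs: read-delta
orders = spark.read.format("delta").load("/mnt/silver/orders")

# dbs: groupby
daily = orders.groupBy("order_date").agg(F.sum("amount").alias("revenue"))

# dbs: write-delta
(daily.write
      .format("delta")
      .mode("overwrite")
      .save("/mnt/gold/daily_revenue"))
```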
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.