A Visual Studio Code extension that provides PySpark code snippets optimized for Databricks environments. The snippets work in both Python (`.py`) and Jupyter Notebook (`.ipynb`) files in Visual Studio Code, and cover common data engineering and analytics workflows in Databricks using PySpark.
## Installation

1. Install the extension from the VS Code Marketplace
2. Restart VS Code
3. Start using the snippets in your Python or Jupyter Notebook files
## Basic Usage

Type the prefix (e.g., `dbs: read-delta`) and press `Tab` to insert the snippet. Use `Tab` to navigate through the placeholders.
## Snippet Categories

### Session Management & Basic Operations

| Prefix | Description |
| --- | --- |
| `dbs: imp` | Import PySpark essentials |
| `dbs: spark` | Get the existing Spark session in Databricks |
| `dbs: display` | Display a DataFrame in a Databricks notebook |
| `dbs: schema` | Print the schema of a DataFrame |
| `dbs: show` | Show the first N rows of a DataFrame |
| `dbs: shape` | Get the shape of a DataFrame |
| `dbs: explain` | Show the execution plan for a DataFrame operation |
| `dbs: set-config` | Set and get Spark configuration parameters |
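To give a feel for what these expand to, here is a rough sketch of the session-management basics. The exact placeholders come from the snippets themselves, and the `people` table name below is a hypothetical example.

```python
from pyspark.sql import SparkSession

# dbs: spark — in Databricks the session already exists; getOrCreate() returns it
spark = SparkSession.builder.getOrCreate()

# "people" is a hypothetical table name used for illustration
df = spark.read.table("people")

df.printSchema()                      # dbs: schema
df.show(20)                           # dbs: show
print((df.count(), len(df.columns)))  # dbs: shape
df.explain(True)                      # dbs: explain

# dbs: display — display(df) is a notebook built-in available in Databricks
```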
### I/O Operations

| Prefix | Description |
| --- | --- |
| `dbs: read-csv` | Read CSV file into a Spark DataFrame |
| `dbs: read-csv-schema` | Read CSV file with explicit schema definition |
| `dbs: read-parquet` | Read Parquet file into a Spark DataFrame |
| `dbs: read-delta` | Read Delta table into a Spark DataFrame |
| `dbs: read-jdbc` | Read data from JDBC source |
| `dbs: write-csv` | Write DataFrame to CSV file |
| `dbs: write-parquet` | Write DataFrame to Parquet file |
| `dbs: write-delta` | Write DataFrame to Delta table |
| `dbs: write-jdbc` | Write DataFrame to JDBC destination |
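The I/O snippets follow the standard `spark.read`/`df.write` pattern. A minimal sketch, assuming hypothetical paths under `/mnt/`:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# dbs: read-csv — header row and schema inference enabled
df = (spark.read
      .format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("/mnt/raw/events.csv"))

# dbs: read-delta
events = spark.read.format("delta").load("/mnt/silver/events")

# dbs: write-delta
(df.write
   .format("delta")
   .mode("overwrite")
   .save("/mnt/bronze/events"))
```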
### DataFrame Creation & Manipulation

| Prefix | Description |
| --- | --- |
| `dbs: create-df` | Create a DataFrame from in-memory data |
| `dbs: create-df-schema` | Create a DataFrame with explicit schema definition |
| `dbs: select` | Select specific columns from DataFrame |
| `dbs: select-expr` | Select columns with SQL expressions |
| `dbs: filter` | Filter DataFrame based on condition |
| `dbs: filter-multiple` | Filter DataFrame based on multiple conditions |
| `dbs: filter-sql` | Filter DataFrame using SQL expression |
| `dbs: sort` | Sort DataFrame by columns |
| `dbs: join` | Join two DataFrames |
| `dbs: union` | Union two DataFrames preserving column names |
| `dbs: add-column` | Add a new column to DataFrame |
| `dbs: add-column-expr` | Add a new column using SQL expression |
| `dbs: rename-column` | Rename a column in DataFrame |
| `dbs: drop-column` | Drop columns from DataFrame |
| `dbs: cast` | Cast a column to a different data type |
| `dbs: cache` | Cache DataFrame in memory |
| `dbs: persist` | Persist DataFrame with specified storage level |
| `dbs: unpersist` | Remove DataFrame from cache |
| `dbs: repartition` | Repartition DataFrame to specified number of partitions |
| `dbs: coalesce` | Reduce number of partitions without full shuffle |
| `dbs: broadcast` | Use broadcast join for performance with small DataFrames |
| `dbs: sample` | Take a random sample from DataFrame |
| `dbs: partition-info` | Get information about DataFrame partitions |
| `dbs: compare-schemas` | Compare schemas of two DataFrames |
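Most of these map directly onto the DataFrame API. A small end-to-end sketch (the sample data and column names are made up for illustration):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# dbs: create-df — build a DataFrame from in-memory data
df = spark.createDataFrame(
    [("Alice", 34, "NY"), ("Bob", 45, "CA")],
    ["name", "age", "state"],
)

# dbs: select / dbs: filter / dbs: add-column / dbs: sort chained together
result = (df.select("name", "age", "state")
            .filter(F.col("age") > 40)
            .withColumn("age_next_year", F.col("age") + 1)
            .orderBy(F.col("age").desc()))

# dbs: join — join against a second in-memory DataFrame
states = spark.createDataFrame(
    [("NY", "New York"), ("CA", "California")],
    ["state", "state_name"],
)
result.join(states, on="state", how="left").show()
```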
### Data Transformation

| Prefix | Description |
| --- | --- |
| `dbs: fillna` | Replace null values with specified values |
| `dbs: dropna` | Drop rows with null values in specified columns |
| `dbs: groupby` | Group by columns and calculate aggregates |
| `dbs: pivot` | Create a pivot table |
| `dbs: window` | Apply window function for running calculations |
| `dbs: udf` | Define and apply a User Defined Function (UDF) |
| `dbs: pandas-udf` | Define and apply a Pandas User Defined Function for vectorized operations |
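As a rough sketch of how the null-handling, aggregation, and window snippets fit together (sample data is hypothetical):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

sales = spark.createDataFrame(
    [("2024-01-01", "A", 10.0),
     ("2024-01-02", "A", None),
     ("2024-01-01", "B", 7.0)],
    ["day", "store", "amount"],
)

# dbs: fillna — replace nulls with a default value
clean = sales.fillna({"amount": 0.0})

# dbs: groupby — total per store
clean.groupBy("store").agg(F.sum("amount").alias("total")).show()

# dbs: window — running total per store, ordered by day
w = Window.partitionBy("store").orderBy("day")
clean.withColumn("running_total", F.sum("amount").over(w)).show()
```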
### SQL Operations

| Prefix | Description |
| --- | --- |
| `dbs: temp-view` | Create a temporary view for SQL queries |
| `dbs: sql` | Run SQL query on registered views |
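The two SQL snippets pair naturally: register a view, then query it. A minimal sketch with a hypothetical `people` view:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("Alice", 34), ("Bob", 45)], ["name", "age"])

# dbs: temp-view — register the DataFrame so SQL can see it
df.createOrReplaceTempView("people")

# dbs: sql — query the registered view
spark.sql("SELECT name FROM people WHERE age > 40").show()
```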
### Data Types & Functions

| Prefix | Description |
| --- | --- |
| `dbs: string-funcs` | Common string manipulation functions in PySpark |
| `dbs: datetime-funcs` | Common date and time functions in PySpark |
| `dbs: array-funcs` | Functions for working with array type columns in PySpark |
| `dbs: map-funcs` | Functions for working with map type columns in PySpark |
| `dbs: json-funcs` | Functions for working with JSON data in PySpark |
| `dbs: analytics-funcs` | Analytics and window functions in PySpark |
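These snippets draw on `pyspark.sql.functions`. A brief taste of the string, date, and array helpers they cover (sample data is hypothetical):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("alice", "2024-01-15", ["a", "b"])],
    ["name", "signup", "tags"],
)

df.select(
    F.upper("name").alias("upper_name"),                                  # string function
    F.to_date("signup").alias("signup_date"),                             # date function
    F.datediff(F.current_date(), F.to_date("signup")).alias("days_ago"),  # date arithmetic
    F.size("tags").alias("n_tags"),                                       # array function
).show()
```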
### Delta Lake Operations

| Prefix | Description |
| --- | --- |
| `dbs: delta-create` | Create a Delta table from DataFrame or SQL |
| `dbs: delta-timetravel` | Time travel query to previous versions of a Delta table |
| `dbs: delta-history` | View the history of operations on a Delta table |
| `dbs: delta-vacuum` | Remove files that are no longer in the latest version of the Delta table |
| `dbs: delta-merge` | Perform MERGE operation (upsert) on a Delta table |
| `dbs: schema-evolution` | Enable schema evolution for Delta tables |
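A rough sketch of the merge, time-travel, and history operations, assuming the `delta` Python package bundled with the Databricks Runtime and hypothetical table paths:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Both Delta paths below are hypothetical placeholders
target = DeltaTable.forPath(spark, "/mnt/silver/customers")
updates = spark.read.format("delta").load("/mnt/bronze/customers")

# dbs: delta-merge — upsert updates into the target on a key column
(target.alias("t")
       .merge(updates.alias("s"), "t.id = s.id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())

# dbs: delta-timetravel — read an earlier version of the table
v0 = (spark.read.format("delta")
            .option("versionAsOf", 0)
            .load("/mnt/silver/customers"))

# dbs: delta-history — inspect the table's operation log
target.history().show(truncate=False)
```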
### Databricks-Specific Features

| Prefix | Description |
| --- | --- |
| `dbs: autoloader` | Use Databricks Auto Loader to ingest files from cloud storage |
| `dbs: unity-catalog` | Work with Unity Catalog in Databricks |
| `dbs: external-table` | Create and query external tables in Databricks |
| `dbs: copy-into` | Use COPY INTO command for idempotent data ingestion |
| `dbs: mlflow-track` | Track ML experiments with MLflow |
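The `cloudFiles` source used by Auto Loader is specific to Databricks. A minimal sketch of the ingest pattern, with hypothetical storage and checkpoint paths:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# All paths below are hypothetical placeholders
stream = (spark.readStream
          .format("cloudFiles")                 # Auto Loader source
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", "/mnt/checkpoints/events_schema")
          .load("/mnt/landing/events"))

(stream.writeStream
       .format("delta")
       .option("checkpointLocation", "/mnt/checkpoints/events")
       .trigger(availableNow=True)              # process available files, then stop
       .start("/mnt/bronze/events"))
```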
### Streaming Operations

| Prefix | Description |
| --- | --- |
| `dbs: stream-query` | Create and process streaming data with Structured Streaming |
| `dbs: stream-foreach-batch` | Process streaming data in batches with foreachBatch |
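A small Structured Streaming sketch using the built-in `rate` test source, with a hypothetical checkpoint path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# "rate" is a built-in test source that generates rows at a fixed rate
stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# dbs: stream-foreach-batch — run ordinary batch logic on each micro-batch
def process_batch(batch_df, batch_id):
    print(f"batch {batch_id}: {batch_df.count()} rows")

query = (stream.writeStream
               .foreachBatch(process_batch)
               .option("checkpointLocation", "/tmp/checkpoints/rate")  # hypothetical path
               .start())
# query.awaitTermination()  # block until the stream stops
```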
### Machine Learning

| Prefix | Description |
| --- | --- |
| `dbs: ml-pipeline` | Create ML pipeline with feature preprocessing and model training |
| `dbs: mlflow-track` | Track ML experiments with MLflow |
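A sketch of the pipeline-plus-tracking pattern these snippets target, using a tiny in-memory dataset for illustration (MLflow is preinstalled in the Databricks Runtime):

```python
import mlflow
import mlflow.spark
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Tiny hypothetical training set
train = spark.createDataFrame(
    [(1.0, 2.0, 0.0), (2.0, 1.0, 1.0), (3.0, 4.0, 1.0), (0.5, 0.2, 0.0)],
    ["f1", "f2", "label"],
)

# dbs: ml-pipeline — assemble features, then train a model
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
pipeline = Pipeline(stages=[assembler, lr])

# dbs: mlflow-track — log parameters and the fitted model
with mlflow.start_run():
    model = pipeline.fit(train)
    mlflow.log_param("maxIter", lr.getMaxIter())
    mlflow.spark.log_model(model, "model")
```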
## Prerequisites

Make sure you have the following set up:

- Access to a Databricks workspace
- A PySpark environment (provided automatically in Databricks)
- Delta Lake (included in the Databricks Runtime)
Example Usage
# Get the Spark session
dbs: spark
# Read data from Delta table
dbs: read-delta
# Perform group by and aggregation
dbs: groupby
# Write results to Delta format
dbs: write-delta
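Once expanded and filled in, that sequence might look roughly like the following (paths and column names are hypothetical placeholders):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# dbs: spark — get the existing session
spark = SparkSession.builder.getOrCreate()

# dbs: read-delta
orders = spark.read.format("delta").load("/mnt/silver/orders")

# dbs: groupby
daily = orders.groupBy("order_date").agg(F.sum("amount").alias("revenue"))

# dbs: write-delta
(daily.write
      .format("delta")
      .mode("overwrite")
      .save("/mnt/gold/daily_revenue"))
```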
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.