Medallion Pipeline Skill

A GitHub Copilot skill extension that makes Copilot an expert data engineer for building end-to-end Python data pipelines following the Medallion architecture (Source → Bronze → Silver → Gold).

What it does

Once installed, Copilot automatically applies expert data engineering knowledge when you ask it to:

Build ingestion pipelines (Source → Bronze)
Add data quality and transformations (Bronze → Silver)
Create curated, business-ready datasets (Silver → Gold)
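
To make the three layer transitions concrete, here is a minimal sketch in plain Python. It is not the code the skill generates: the function names, the `order_id`/`region`/`amount` fields, and the `orders_api` source name are all hypothetical, and a real pipeline would use the stack you ask for (pandas, PySpark, and so on).

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_bronze(source_rows, source_name):
    """Source -> Bronze: keep records as-is and add ingestion metadata."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [
        {**row, "_source": source_name, "_ingested_at": ingested_at}
        for row in source_rows
    ]


def to_silver(bronze_rows):
    """Bronze -> Silver: drop invalid rows, normalize types, deduplicate."""
    seen, silver = set(), []
    for row in bronze_rows:
        if row.get("order_id") is None or row.get("amount") is None:
            continue  # reject rows missing required fields
        if row["order_id"] in seen:
            continue  # keep only the first occurrence of each order
        seen.add(row["order_id"])
        silver.append({
            "order_id": row["order_id"],
            "region": str(row["region"]).strip().upper(),
            "amount": float(row["amount"]),
        })
    return silver


def to_gold(silver_rows):
    """Silver -> Gold: aggregate revenue per region."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


# Hypothetical sample data: one duplicate and one invalid record.
source = [
    {"order_id": 1, "region": " emea ", "amount": "10.5"},
    {"order_id": 1, "region": " emea ", "amount": "10.5"},
    {"order_id": 2, "region": "apac", "amount": "4"},
    {"order_id": None, "region": "apac", "amount": "9"},
]
gold = to_gold(to_silver(to_bronze(source, "orders_api")))
print(gold)
```

Bronze stays close to the source so it can be replayed; cleaning and business logic are deferred to Silver and Gold.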

Usage

Just describe your pipeline in plain English in GitHub Copilot Chat:

"Build me a Bronze ingestion pipeline from a REST API source using pandas"

"Add a deduplication step for my Silver layer customer table with SCD Type 2"

"Create a Gold layer aggregation for monthly sales by region"

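As an illustration of what the second prompt is asking for, the following is a rough sketch of deduplication combined with SCD Type 2 history tracking, written in plain Python so it runs without any data platform. The `apply_scd2` function, its parameters, and the `valid_from`/`valid_to`/`is_current` column names are assumptions made for this example, not output of the skill.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, today):
    """Return a new dimension list with SCD Type 2 history applied.

    dimension: rows holding key, tracked attributes, valid_from, valid_to, is_current
    incoming:  latest snapshot rows; duplicates on `key` are collapsed (last wins)
    """
    result = [dict(row) for row in dimension]  # copy so the input is not mutated
    current = {row[key]: row for row in result if row["is_current"]}
    latest = {row[key]: row for row in incoming}  # deduplicate on the key

    for new in latest.values():
        old = current.get(new[key])
        if old is not None and all(old[c] == new[c] for c in tracked):
            continue  # unchanged: re-running the same snapshot is a no-op
        if old is not None:
            old["valid_to"] = today  # close the previous version
            old["is_current"] = False
        result.append({
            key: new[key],
            **{c: new[c] for c in tracked},
            "valid_from": today,
            "valid_to": None,
            "is_current": True,
        })
    return result


# Hypothetical customer dimension and snapshot.
dim = [{"customer_id": 1, "city": "Oslo", "valid_from": date(2024, 1, 1),
        "valid_to": None, "is_current": True}]
snapshot = [
    {"customer_id": 1, "city": "Bergen"},
    {"customer_id": 1, "city": "Bergen"},  # duplicate record
    {"customer_id": 2, "city": "Lima"},
]
dim_v2 = apply_scd2(dim, snapshot, "customer_id", ["city"], date(2024, 6, 1))
dim_v3 = apply_scd2(dim_v2, snapshot, "customer_id", ["city"], date(2024, 6, 2))
```

On Delta Lake the same logic is usually expressed as a `MERGE` statement; the sketch only shows the row-level behavior.
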
Copilot will use the skill to apply expert patterns including:

Schema enforcement and data quality rules
Idempotent batch loading with watermarks
PII masking and GDPR compliance patterns
Delta Lake MERGE for upserts/SCDs
Z-ORDER optimization and broadcast joins
Quality gates at every layer boundary
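
Three of the patterns above (watermark-based idempotent loading, PII masking, and a quality gate) can be sketched together in a few lines of standard-library Python. Everything named here is an assumption for illustration: the `load_batch`, `quality_gate`, and `mask_email` helpers, the `id`/`email`/`updated_at` fields, the 50% rejection threshold, and the hard-coded salt. Timestamps are assumed to be ISO-8601 strings in a single time zone, so string comparison matches chronological order.

```python
import hashlib


def mask_email(email, salt):
    """Replace an email with a salted SHA-256 digest (pseudonymization)."""
    return hashlib.sha256((salt + email.lower()).encode("utf-8")).hexdigest()


def quality_gate(rows, required, max_reject_ratio):
    """Keep rows with all required fields; fail the batch if too many are rejected."""
    passed = [r for r in rows if all(r.get(c) is not None for c in required)]
    rejected = len(rows) - len(passed)
    if rows and rejected / len(rows) > max_reject_ratio:
        raise ValueError(f"quality gate failed: {rejected}/{len(rows)} rows rejected")
    return passed


def load_batch(source_rows, target, state, salt="demo-salt"):
    """Load only rows newer than the stored watermark, then advance it."""
    watermark = state.get("watermark", "")
    fresh = [r for r in source_rows if r["updated_at"] > watermark]
    fresh = quality_gate(fresh, required=["id", "email"], max_reject_ratio=0.5)
    for row in fresh:
        # Upsert by key, so replaying a row overwrites instead of duplicating.
        target[row["id"]] = {**row, "email": mask_email(row["email"], salt)}
    if fresh:
        state["watermark"] = max(r["updated_at"] for r in fresh)
    return len(fresh)


# Hypothetical source rows; the second call sees nothing newer than the watermark.
rows = [
    {"id": 1, "email": "a@example.com", "updated_at": "2024-01-01T00:00:00Z"},
    {"id": 2, "email": "b@example.com", "updated_at": "2024-01-02T00:00:00Z"},
]
target, state = {}, {}
first = load_batch(rows, target, state)
second = load_batch(rows, target, state)
```

Salted hashing is pseudonymization rather than anonymization; whether it meets a particular compliance requirement depends on how the salt is managed and is outside the scope of this sketch.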

Technology

This skill is technology-agnostic. Mention your preferred stack (pandas, PySpark, dbt, Snowflake, BigQuery, etc.) and Copilot will use it. If you don't specify, it defaults to PySpark with Delta Lake.

Included files

| File | Purpose |
| --- | --- |
| `SKILL.md` | Core skill definition and architecture principles |
| `references/source-to-bronze.md` | Bronze layer ingestion patterns |
| `references/bronze-to-silver.md` | Silver layer quality and transformation patterns |
| `references/silver-to-gold.md` | Gold layer aggregation and dimensional modeling |
| `assets/pipeline_template.py` | Production-ready pipeline scaffold |

Before publishing to the Marketplace

Set these fields in `package.json`:

"publisher": your registered Marketplace publisher ID
"repository": the URL of your actual repository
"version": a semantic version (for example 1.0.0, then 1.1.0 for the next backward-compatible feature release)

GCID Data & AI
