Medallion Pipeline Skill
A GitHub Copilot skill extension that makes Copilot an expert data engineer for building end-to-end Python data pipelines following the Medallion architecture (Source → Bronze → Silver → Gold).
What it does
Once installed, Copilot automatically applies expert data engineering knowledge when you ask it to:
- Build ingestion pipelines (Source → Bronze)
- Add data quality and transformations (Bronze → Silver)
- Create curated, business-ready datasets (Silver → Gold)
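
As a rough illustration of the three stages above, here is a minimal sketch of one record set moving through each layer. It uses only the Python standard library so it runs anywhere. The sample rows, the `orders_api` source name, and the function names are hypothetical and are not taken from the skill's own template. A real pipeline would typically do the same steps with pandas or PySpark.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw records, standing in for an API response or file extract.
SOURCE_ROWS = [
    {"order_id": "1", "region": "EU", "amount": "10.50"},
    {"order_id": "2", "region": "US", "amount": "20.00"},
    {"order_id": "2", "region": "US", "amount": "20.00"},         # duplicate
    {"order_id": "3", "region": "EU", "amount": "not-a-number"},  # bad value
]


def to_bronze(rows, source_name):
    """Bronze: keep the raw payload unchanged and add ingestion metadata."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [{**row, "_source": source_name, "_ingested_at": ingested_at} for row in rows]


def to_silver(bronze_rows):
    """Silver: cast types, drop rows that fail validation, deduplicate on order_id."""
    seen, silver = set(), []
    for row in bronze_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a fuller pipeline would route this row to a quarantine table
        if row["order_id"] in seen:
            continue  # already kept one copy of this order
        seen.add(row["order_id"])
        silver.append(
            {"order_id": int(row["order_id"]), "region": row["region"], "amount": amount}
        )
    return silver


def to_gold(silver_rows):
    """Gold: aggregate to a business-ready shape, here total sales per region."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


bronze = to_bronze(SOURCE_ROWS, source_name="orders_api")
silver = to_silver(bronze)
gold = to_gold(silver)
```

Bronze keeps all four raw rows, Silver keeps only the valid and unique ones, and Gold reduces them to one total per region.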
Usage
Just describe your pipeline in plain English in GitHub Copilot Chat:
- "Build me a Bronze ingestion pipeline from a REST API source using pandas"
- "Add a deduplication step for my Silver layer customer table with SCD Type 2"
- "Create a Gold layer aggregation for monthly sales by region"
Copilot will use the skill to apply expert patterns including:
- Schema enforcement and data quality rules
- Idempotent batch loading with watermarks
- PII masking and GDPR compliance patterns
- Delta Lake MERGE for upserts/SCDs
- Z-ORDER optimization and broadcast joins
- Quality gates at every layer boundary
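
To make a few of these patterns concrete, the sketch below combines watermark-based idempotent loading, a simple quality gate, and PII masking in plain Python. It is an illustration under stated assumptions, not the code the skill generates: the `load_increment` and `mask_email` names, the `demo-salt` value, and the sample rows are all hypothetical, and an in-memory dict keyed by `id` stands in for a table that would normally be upserted with something like a Delta Lake MERGE. Timestamps are assumed to be ISO 8601 strings in UTC, so they can be compared as plain strings.

```python
import hashlib


def mask_email(email, salt):
    """Replace an email with a salted SHA-256 digest so it can still be joined on."""
    return hashlib.sha256((salt + email.lower()).encode("utf-8")).hexdigest()


def load_increment(target, source_rows, watermark, salt="demo-salt"):
    """Upsert rows newer than `watermark` into `target` and return the new watermark.

    `target` is a dict keyed by row id (a stand-in for a keyed table).
    """
    # Watermark filter: only rows updated after the last successful load.
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]

    # Quality gate: refuse the whole batch if any row lacks a key or an email.
    bad = [r for r in new_rows if r.get("id") is None or not r.get("email")]
    if bad:
        raise ValueError(f"{len(bad)} row(s) failed the quality gate")

    # Upsert by key, storing only the masked email.
    for r in new_rows:
        target[r["id"]] = {
            "id": r["id"],
            "email_hash": mask_email(r["email"], salt),
            "updated_at": r["updated_at"],
        }

    # Advance the watermark only as far as the data actually loaded.
    return max((r["updated_at"] for r in new_rows), default=watermark)


# Hypothetical source extract.
rows = [
    {"id": 1, "email": "Ann@example.com", "updated_at": "2024-01-01T00:00:00Z"},
    {"id": 2, "email": "bob@example.com", "updated_at": "2024-01-02T00:00:00Z"},
]

target = {}
wm1 = load_increment(target, rows, watermark="")
snapshot = dict(target)
wm2 = load_increment(target, rows, watermark=wm1)  # re-run with the same input
```

Re-running the load with the returned watermark picks up no rows, so `target` and the watermark are unchanged. That is the idempotency property the list refers to. Note that salted hashing is pseudonymization rather than anonymization, so it is only one part of a compliance approach.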
Technology
This skill is technology-agnostic. Mention your preferred stack (pandas, PySpark, dbt, Snowflake, BigQuery, etc.) and Copilot will use it. If you don't specify, it defaults to PySpark with Delta Lake.
Included files
| File | Purpose |
| --- | --- |
| SKILL.md | Core skill definition and architecture principles |
| references/source-to-bronze.md | Bronze layer ingestion patterns |
| references/bronze-to-silver.md | Silver layer quality and transformation patterns |
| references/silver-to-gold.md | Gold layer aggregation and dimensional modeling |
| assets/pipeline_template.py | Production-ready pipeline scaffold |
Before publishing to the Marketplace
Update package.json with:
- "publisher": your registered Marketplace publisher ID
- "repository": your actual repository URL
- "version": a semantic version (1.0.0, 1.1.0, etc.)