Data Validator for VS Code
An AI-powered linter, live verifier, and auto-remediator for data engineering projects. Built on pluggable rule packs, the extension currently validates Airflow DAGs and SQL migrations, with support for dbt models, CI/CD pipelines, and Kubernetes manifests planned. It catches structural errors before deployment, verifies external references, and suggests one-click AI fixes right in your editor.
Key Features
The core engine supports hot-swappable Rule Packs, which let it validate multiple ecosystems inside VS Code:
- Apache Airflow DAGs: Detects circular dependencies between tasks chained with the >> operator, unresolved imports, and deprecated parameters (e.g., schedule_interval).
- dbt Models (Coming Soon): Validate model references, macros, and schema configurations.
- SQL Migrations: Uses sqlglot to find duplicate columns and syntax errors in inline queries or standalone scripts.
- Infrastructure (Coming Soon): CI/CD (GitHub Actions) and Kubernetes manifest linting.
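As an illustration of the circular-dependency check, here is a minimal sketch using only the Python standard library. It is not the extension's actual rule pack: the function names `extract_edges` and `find_cycle` are hypothetical, and the sketch only understands chains of plain task names (no lists, no `set_upstream`/`set_downstream` calls).

```python
import ast
from collections import defaultdict


def extract_edges(source: str) -> list[tuple[str, str]]:
    """Collect (upstream, downstream) pairs from `a >> b >> c` statements."""
    edges: list[tuple[str, str]] = []

    def chain_tail(node: ast.AST):
        # Returns the name of the last task in a chain, or None if unsupported.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.RShift):
            left = chain_tail(node.left)
            right = chain_tail(node.right)
            if left is not None and right is not None:
                edges.append((left, right))
            return right
        return None

    # Only expression statements are inspected, so nested chains are not double counted.
    for stmt in ast.walk(ast.parse(source)):
        if isinstance(stmt, ast.Expr):
            chain_tail(stmt.value)
    return edges


def find_cycle(edges: list[tuple[str, str]]):
    """Depth-first search; returns one cycle as a list of task names, or None."""
    graph = defaultdict(list)
    for upstream, downstream in edges:
        graph[upstream].append(downstream)

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = defaultdict(int)
    path: list[str] = []

    def visit(node: str):
        state[node] = IN_PROGRESS
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == IN_PROGRESS:
                # Back edge: the cycle is the path from nxt to here, closed on nxt.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNVISITED:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in list(graph):
        if state[node] == UNVISITED:
            found = visit(node)
            if found:
                return found
    return None


dag_source = "extract >> transform >> load\nload >> extract\n"
print(find_cycle(extract_edges(dag_source)))
```

Parsing with `ast` rather than importing the DAG file means the check never executes user code, which fits a linter that runs on every keystroke.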
2. Live Target Verification (Data Warehouses)
Extracts source tables and columns from the DAG's SQL queries and verifies them against your live databases.
- Supported Warehouses: Doris, BigQuery, Snowflake.
- Fuzzy Matching: If you misspell a column name, the extension suggests the closest match from the live warehouse.
- Airflow API: Connects to your Airflow REST API to check if referenced connection IDs actually exist in the target environment.
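The fuzzy-matching idea can be sketched with Python's standard `difflib` module. This is an assumption about the general technique, not the extension's actual matcher; `suggest_column`, the 0.6 cutoff, and the column names are illustrative only, and in practice the column list would come from the live warehouse's schema metadata.

```python
from difflib import get_close_matches
from typing import Optional


def suggest_column(name: str, live_columns: list[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the closest live column name, or None if the name is valid or nothing is close."""
    if name in live_columns:
        return None  # exact match, nothing to suggest
    matches = get_close_matches(name, live_columns, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical column list standing in for a live schema lookup.
columns = ["customer_id", "order_id", "created_at"]
print(suggest_column("custmer_id", columns))
```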
AI fixes are powered by Google Gemini (default) or Anthropic Claude.
- Gated Fixes: When you encounter an error, click "Fix with AI" in the validator dashboard. The AI will stream code fix variants (low, medium, high impact).
- Deterministic Safety: The extension runs the proposed AI changes through the static-check engine in the background. If the AI introduces new errors, the fix is gated to prevent you from applying broken code.
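The gating step can be pictured as a set difference between the errors found before and after the proposed change. The sketch below is a simplified stand-in, not the extension's engine: `static_check` only reports Python syntax errors, and both function names are hypothetical.

```python
import ast


def static_check(source: str) -> set[str]:
    """Stand-in for the static-check engine: reports only Python syntax errors."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return {f"syntax-error: {exc.msg}"}
    return set()


def gate_fix(original: str, proposed: str) -> tuple[bool, set[str]]:
    """Allow a fix only if it introduces no errors that were absent from the original."""
    introduced = static_check(proposed) - static_check(original)
    return (not introduced, introduced)


allowed, new_errors = gate_fix("x = 1\n", "x = (1\n")
print(allowed, new_errors)
```

Comparing against the original's error set, rather than requiring zero errors, lets a fix through when it repairs one problem but leaves unrelated, pre-existing ones untouched.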
Installation
- Install the extension from the VS Code Marketplace.
- Ensure you have Python 3.10+ installed on your system.
- Install the required Python backend engine packages in your active environment:
pip install sqlglot google-generativeai pymysql google-cloud-bigquery snowflake-connector-python
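To confirm that the active environment is ready, a small check like the following can help. It is a sketch, not part of the extension; `missing_modules` is a hypothetical helper, and the import names are the ones these pip packages are known to provide.

```python
import importlib.util
import sys

# Import names that correspond to the pip packages listed above.
BACKEND_MODULES = [
    "sqlglot",
    "google.generativeai",
    "pymysql",
    "google.cloud.bigquery",
    "snowflake.connector",
]


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be located in the current environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. `google`) is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if sys.version_info < (3, 10):
        print("Python 3.10+ is required")
    for name in missing_modules(BACKEND_MODULES):
        print(f"missing: {name}")
```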
Configuration
Setting your AI Provider (Gemini / Claude)
The extension securely stores API keys in your OS keychain.
- Open the Command Palette (Ctrl+Shift+P or Cmd+Shift+P).
- Run AI Validator: Set API Key.
- Select your provider (gemini or claude) and paste your API key.
Configuring Live Data Connections
To enable live checks against your warehouse or Airflow instance, configure your connections in the VS Code settings.
- Go to Settings (Ctrl+,).
- Search for AI Validator Connections.
- Add a connection profile (e.g., Doris, Snowflake, or Airflow).
- Run AI Validator: Set Connection Password from the Command Palette to securely save the password/token for that connection profile.
How to Use
- Open a DAG: Simply open any Python file containing an Airflow DAG.
- View Lints: You will immediately see inline squiggles for syntax errors, circular dependencies, or deprecated calls. Hover over them for details.
- Open Dashboard: Run AI Validator: Open Validator Panel from the Command Palette to see a comprehensive overview of all passed and failed checks for your current DAG.
- Apply Fixes: Click Fix with AI in the panel to stream structural code fixes directly into the side-by-side diff viewer. You can accept the changes with a single click.
Security & Privacy
- Local First: All static analysis, DAG parsing, and engine rules run entirely locally on your machine via a spawned Python process.
- Secure Credentials: All passwords and API keys are stored securely in the VS Code SecretStorage API (backed by Windows Credential Manager, macOS Keychain, or Linux Secret Service).
- AI Redaction: Before sending code to the AI provider, the engine automatically redacts sensitive connection tokens and passwords.
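A redaction pass of this kind could look roughly like the sketch below. The pattern and the `redact` function are assumptions for illustration; the extension's real redaction rules are not documented here and may cover more cases (for example connection URIs or environment variables).

```python
import re

# Hypothetical pattern: a secret-looking key followed by `=` or `:` and a quoted value.
SECRET_PATTERN = re.compile(
    r"""(?P<key>\b(?:password|passwd|token|secret|api_key)\b\s*[=:]\s*)(?P<quote>["'])(?P<value>.*?)(?P=quote)""",
    re.IGNORECASE,
)


def redact(source: str) -> str:
    """Replace quoted secret values with a placeholder, keeping the surrounding code intact."""
    return SECRET_PATTERN.sub(
        lambda m: f"{m['key']}{m['quote']}<REDACTED>{m['quote']}",
        source,
    )


snippet = 'conn = connect(user="etl", password="hunter2")'
print(redact(snippet))
```

Keeping the key name and quotes in place means the AI provider still sees syntactically valid code, so its suggested fix can be applied back to the unredacted file.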