🚀 DataOps Copilot
AI-powered DataOps Control Center for VS Code.


DataOps Copilot brings Snowflake, Databricks, and Airflow into one operational cockpit inside VS Code, then layers Gemini intelligence on top for optimization, observability, and decision support.
Marketplace Quick Start
Install in under 2 minutes:
- Install DataOps Copilot from VS Code Marketplace.
- Open Command Palette and run DataOps: Configure Gemini API Key.
- Run DataOps: Add Connection and add Snowflake, Databricks, or Airflow.
- Run DataOps: Switch Active Connection.
- Open a SQL file and run DataOps: Run Active SQL Query.
Why Install DataOps Copilot
- One extension for Snowflake, Databricks, and Airflow workflows.
- AI SQL generation, optimization, and cost prediction.
- Airflow DAG monitoring, trigger, and failure-log analysis.
- Databricks resource insights with AI recommendations.
- Secure credential handling using VS Code secret storage.
🖼️ Hero Section
Data teams do not need another isolated query runner. They need one control surface for execution, monitoring, orchestration, and AI-guided improvements.
DataOps Copilot is built for that exact workflow.
```
┌──────────────────────────────────────────────────────────────┐
│                      DataOps Copilot                         │
│         Snowflake • Databricks • Airflow • Gemini            │
│                                                              │
│  Execute SQL • Monitor Resources • Trigger DAGs • Optimize   │
└──────────────────────────────────────────────────────────────┘
```
💡 Tip
Add one connection per platform and switch context from the status bar for a smooth multi-platform workflow.
🧠 What is DataOps Copilot?
DataOps Copilot is a production-grade VS Code extension designed for engineers who operate data platforms, not just write SQL.
It solves three common pain points:
- Tool fragmentation across Snowflake, Databricks, and Airflow.
- Lack of proactive insight before query/resource mistakes happen.
- Slow context switching between execution and observability.
Unlike standard platform-specific extensions, DataOps Copilot combines:
- Multi-platform control in one sidebar.
- AI-guided optimization and advisories.
- Operational actions (for example: trigger Airflow DAG runs) directly from context.
✨ Features
AI SQL Intelligence
- AI Query Optimizer with actionable rewrites and replace-in-editor flow.
- AI Query Generator to convert natural language into SQL.
- Query Cost Predictor with rule-based analysis and optional AI augmentation.
- Structured output for issues, suggestions, and recommendations.
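To make the "structured output" idea concrete, here is a minimal sketch of what such a result could look like. The type and field names are illustrative assumptions for this README, not the extension's actual API.

```typescript
// Illustrative shape for a structured AI analysis result.
// Field names are assumptions for this sketch, not the extension's real types.
interface QueryAnalysis {
  issues: string[];          // problems detected in the SQL
  suggestions: string[];     // concrete rewrites or tuning hints
  recommendations: string[]; // broader guidance (clustering, limits, etc.)
}

const example: QueryAnalysis = {
  issues: ["SELECT * scans every column"],
  suggestions: ["Project only the columns you need"],
  recommendations: ["Add a LIMIT while exploring large tables"],
};
```

A structured shape like this is what lets the extension render issues, suggestions, and recommendations as separate sections in its webviews.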
Snowflake Integration
- Secure connection management.
- Metadata explorer: databases, schemas, tables.
- SQL execution from active editor.
- Table preview with rich result webview.
- Query history integration.
Databricks Control Center
- Compute view with SQL Warehouses, Clusters, and Apps.
- Monitor clusters (state, workers, autoscale).
- Monitor jobs and run outcomes.
- Monitor SQL warehouses.
- List Databricks Apps from workspace APIs.
- Browse query history.
- Explore Unity Catalog metadata (catalogs, schemas, tables).
- Execute SQL statements with warehouse resolution.
- AI-backed resource advisor in details panel.
Airflow Integration
- DAG monitoring with lazy loading.
- DAG run history.
- Task instance tracking (state, tries, duration).
- Trigger DAG execution directly from tree/context.
- DAG details panel with AI pipeline advisor.
- AI log analysis for DAG runs (error-focused when failures exist, summary-focused otherwise).
- Manual refresh workflow for user-controlled updates.
Unified Workspace
- One Connections view for Snowflake, Databricks, and Airflow.
- Global refresh and platform-specific refresh actions.
- User-controlled refresh flow (no forced Airflow auto-polling).
- Active connection indicator in status bar.
- Smooth command-palette driven workflows.
🔍 Highlight
DataOps Copilot is an intelligence-first extension: it does not just execute commands; it helps you choose better actions before you run them.
⚔️ Comparison: Why It Wins
| Feature | Snowflake VS Code Extension | Databricks VS Code Extension | DataOps Copilot |
| --- | --- | --- | --- |
| Snowflake SQL execution | ✅ | ❌ | ✅ |
| Databricks SQL execution | ❌ | ✅ | ✅ |
| Airflow DAG monitoring + trigger | ❌ | ❌ | ✅ |
| Cross-platform connections in one view | ❌ | ❌ | ✅ |
| AI query optimization | Limited/No | Limited/No | ✅ |
| AI cost prediction | ❌ | ❌ | ✅ |
| AI resource advisor | ❌ | Limited | ✅ |
| Unified operational observability | ❌ | Partial | ✅ |
| Control-center style workflow | ❌ | ❌ | ✅ |
🧠 Why This Project is Different (USP)
1. Intelligence over pure execution
Most tools stop at "run command". DataOps Copilot adds analysis before and after execution.
2. AI-driven insights built into operations
Optimization, risk prediction, and advisory outcomes are part of the normal workflow, not an afterthought.
3. Observability + optimization in one loop
You can inspect platform health, run workload actions, and apply AI recommendations without leaving VS Code.
🏗️ Architecture Diagram
```mermaid
graph TD
    U[Developer in VS Code] --> V[DataOps Copilot Extension]
    V --> C[Connection Manager]
    V --> P[Providers and Commands]
    P --> S1[Snowflake Service]
    P --> S2[Databricks Services]
    P --> S3[Airflow Service]
    P --> AI[Gemini AI Services]
    S1 --> Snowflake[(Snowflake)]
    S2 --> Databricks[(Databricks API and SQL)]
    S3 --> Airflow[(Airflow REST API)]
    AI --> Gemini[(Gemini)]
```
🔄 Data Flow Diagram
```mermaid
flowchart LR
    A[User Action] --> B[Query or Resource Request]
    B --> C[AI Analysis: Optimize or Predict]
    C --> D[Execution Engine]
    D --> E[Platform Response]
    E --> F[Webview Insights and History]
    D --> D1[Snowflake]
    D --> D2[Databricks]
    D --> D3[Airflow Trigger or Monitoring]
```
📂 Project Structure
```
src/
  commands/
    addConnectionCommand.ts
    runQueryCommand.ts
    triggerDAGCommand.ts
    showAirflowDagDetailsCommand.ts
    showDatabricksDetailsCommand.ts
  providers/
    connectionsTreeDataProvider.ts
    databricksTreeProvider.ts
    airflowTreeProvider.ts
    historyTreeProvider.ts
  services/
    snowflakeService.ts
    databricksApiClient.ts
    databricksAppsService.ts
    databricksSqlService.ts
    databricksClusterService.ts
    databricksJobsService.ts
    databricksWarehouseService.ts
    databricksMetadataService.ts
    databricksQueryHistoryService.ts
    airflowService.ts
    aiProvider.ts
    aiOptimizerService.ts
    aiQueryGeneratorService.ts
    aiCostEstimatorService.ts
    geminiAdvisorService.ts
    geminiAirflowAdvisor.ts
  utils/
    webviewTableRenderer.ts
    airflowDagDetailsWebview.ts
    databricksDetailsWebview.ts
  models/
  extension.ts
resources/
  dataops.svg
```
⚙️ Installation and Setup
1. Clone
```shell
git clone https://github.com/Nikh9123/DataOps-Copilot.git
cd DataOps-Copilot
```
2. Install dependencies
```shell
npm install
```
3. Build
```shell
npm run compile
```
4. Launch extension host
- Open the project in VS Code.
- Press F5.
5. Create VSIX package (for direct use)
Build a distributable VS Code extension package:
```shell
npm run package:vsix
```
This creates a file named dataops-copilot.vsix in the project root.
6. Install extension from the VSIX file
Install directly from CLI:
```shell
npm run install:vsix
```
Or install from VS Code UI:
- Open Extensions panel.
- Click the ... menu.
- Select Install from VSIX....
- Choose dataops-copilot.vsix.
7. Install from VS Code Marketplace
- Open VS Code.
- Go to Extensions (Ctrl+Shift+X).
- Search for DataOps Copilot.
- Click Install.
- If prompted, click Trust Publisher for NikhilSatyam.
- Reload VS Code after installation.
✅ After Install (User Checklist)
Requirements
- VS Code 1.90.0 or later.
- Network access to your target platforms:
  - Snowflake account URL.
  - Databricks workspace URL and API access.
  - Airflow base URL with API enabled (/api/v1).
- At least one AI key:
  - Gemini API key, or
  - OpenAI API key.
First-Run Setup
- Open Command Palette (Ctrl+Shift+P).
- Run DataOps: Configure Gemini API Key if you want Gemini.
- Run DataOps: Add Connection and create Snowflake, Databricks, and/or Airflow connections.
- Run DataOps: Switch Active Connection to choose the current working target.
- Open a .sql file and run DataOps: Run Active SQL Query.
Where Credentials Are Stored
- Passwords/tokens/API keys are stored in VS Code secure secret storage.
- Connection metadata (non-secret fields such as host, name, type) is stored in extension global state.
- No secrets are written to query history entries.
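The storage split described above can be sketched as a pure function that routes secret fields one way and everything else the other. This is an illustration of the principle only; the field names and the `splitConnection` helper are assumptions, and the real extension wires the results into VS Code's `SecretStorage` and `globalState` APIs.

```typescript
// Hypothetical illustration of the secret/metadata split described above.
// In the real extension, secrets would go to context.secrets.store(...) and
// metadata to context.globalState.update(...). Names here are assumed.
interface ConnectionInput {
  name: string;
  type: "snowflake" | "databricks" | "airflow";
  host: string;
  token?: string;
  password?: string;
}

const SECRET_FIELDS: readonly string[] = ["token", "password"];

function splitConnection(input: ConnectionInput) {
  const secrets: Record<string, string> = {};
  const metadata: Record<string, string> = {};
  for (const [key, value] of Object.entries(input)) {
    if (value === undefined) continue;
    if (SECRET_FIELDS.includes(key)) {
      secrets[key] = value;  // destined for secure secret storage
    } else {
      metadata[key] = value; // safe to persist in global state
    }
  }
  return { secrets, metadata };
}
```

Keeping the split in one place makes it easy to audit that no secret field can leak into query history or global state.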
What You Can Do Next
- Run SQL for Snowflake and Databricks.
- Preview table data.
- Generate and optimize SQL with AI.
- Predict query cost and review warnings.
- Monitor and trigger Airflow DAGs.
- Analyze Airflow DAG failures with AI log analysis.
Command Reference
DataOps: Configure Gemini API Key
DataOps: Add Connection
DataOps: Remove Connection
DataOps: Switch Active Connection
DataOps: Run Active SQL Query
DataOps: Preview Table
DataOps: Generate SQL
DataOps: Optimize Query
DataOps: Predict Query Cost
DataOps: Show Databricks Details
DataOps: Show Airflow DAG Details
DataOps: Trigger DAG
DataOps: Analyze DAG Failure with AI
DataOps: Refresh All Connections Data
DataOps: Refresh Databricks Services
DataOps: Refresh Airflow
DataOps: Clear Query History
Known Setup Tips
- For Databricks SQL execution, providing warehouseId during connection setup is recommended.
- For Airflow token auth, choose Bearer Token while adding a connection.
- If AI commands show provider setup errors, configure the Gemini key from the Command Palette or set the AI environment variables.
Troubleshooting
- Extension installed but no view appears:
  - Open Activity Bar and click DataOps Copilot.
  - Run DataOps: Refresh All Connections Data.
- AI command fails with provider/key error:
  - Run DataOps: Configure Gemini API Key and reload window.
  - Or set DATAOPS_AI_PROVIDER and relevant API key in .env.
- Query execution fails:
  - Confirm the active connection is correct.
  - Re-enter credentials using the remove/add connection flow.
  - Validate account/workspace host and permissions.
- Airflow fetch errors:
  - Confirm Airflow URL, credentials, and API accessibility.
  - Ensure required DAG permissions exist for the current user.
Uninstall and Cleanup
- Remove the extension from Extensions view.
- Optional cleanup:
  - Remove connection entries with DataOps: Remove Connection before uninstall.
  - If needed, clear VS Code secret storage entries for the extension.
🔑 Configuration
Create a local .env file in project root.
You can also configure Gemini from VS Code without editing .env:
- Open Command Palette.
- Run DataOps: Configure Gemini API Key.
- Paste your key when prompted.
- Reload window when prompted.
AI Provider
```env
DATAOPS_AI_PROVIDER=gemini
DATAOPS_GEMINI_API_KEY=YOUR_GEMINI_API_KEY
DATAOPS_GEMINI_MODEL=gemini-3-flash-preview
```
Or:
```env
DATAOPS_AI_PROVIDER=openai
DATAOPS_OPENAI_API_KEY=YOUR_OPENAI_API_KEY
DATAOPS_OPENAI_MODEL=gpt-4o-mini
```
Optional Cost Hints
```env
DATAOPS_LARGE_TABLES=FACT_ORDERS,EVENTS,RAW_CLICKSTREAM
```
Connection inputs collected by DataOps: Add Connection:
- Snowflake: account, username, password.
- Databricks: workspace host, username, PAT, optional warehouse ID.
- Airflow: host/url, auth mode (basic or bearer), credentials.
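As a sketch of how the AI Provider variables above might be consumed, the snippet below resolves a usable provider from an environment map. The `resolveProvider` function is illustrative; only the variable names come from this README.

```typescript
// Sketch: resolve an AI provider from the documented env vars.
// Returns null when no provider has both a name and a matching API key.
type AIProvider = "gemini" | "openai";

function resolveProvider(
  env: Record<string, string | undefined>
): AIProvider | null {
  const provider = (env["DATAOPS_AI_PROVIDER"] ?? "").toLowerCase();
  if (provider === "gemini" && env["DATAOPS_GEMINI_API_KEY"]) return "gemini";
  if (provider === "openai" && env["DATAOPS_OPENAI_API_KEY"]) return "openai";
  return null; // no usable provider configured
}
```

Returning `null` rather than throwing lets the extension surface a friendly "configure your AI key" prompt instead of a hard failure.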
🚀 Usage Guide
Run SQL
- Set active connection to Snowflake or Databricks.
- Open a .sql file.
- Execute DataOps: Run Active SQL Query or press Ctrl+Enter.
Optimize SQL
- Select or open SQL text.
- Run DataOps: Optimize Query.
- Review suggestions and replace query when needed.
Predict Cost
- Open SQL query.
- Run DataOps: Predict Query Cost.
- Inspect cost level, risks, and recommendations.
Monitor Databricks Resources
- Expand Databricks connection in Connections view.
- Open Compute for SQL Warehouses, Clusters, and Apps.
- Open Jobs, Query History, and Catalogs.
- Click resource nodes for detailed insight panels.
Analyze Airflow DAG Logs with AI
- Expand Airflow connection and open a DAG or DAG run.
- Run DataOps: Analyze DAG Failure with AI.
- If errors exist, review root cause, task-level errors, suggested fixes, and next steps.
- If no errors, review concise run-log summary and validation steps.
Refresh Data Manually
- Use DataOps: Refresh All Connections Data for a full refresh.
- Use DataOps: Refresh Databricks Services for a Databricks-only refresh.
- Use DataOps: Refresh Airflow for an Airflow-only refresh.
Trigger Airflow DAG
- Expand Airflow connection.
- Open DAG node context menu.
- Run DataOps: Trigger DAG and confirm.
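Under the hood, triggering a DAG maps onto Airflow's stable REST API: a POST to `/api/v1/dags/{dag_id}/dagRuns`. The helper below only builds the request; sending it and attaching auth headers (basic or bearer, as configured for the connection) are left out of this sketch, and `buildTriggerRequest` is an illustrative name, not the extension's actual function.

```typescript
// Sketch: build the Airflow REST request for triggering a DAG run.
// POST {base}/api/v1/dags/{dagId}/dagRuns with an optional conf payload.
function buildTriggerRequest(baseUrl: string, dagId: string, conf: object = {}) {
  // Strip trailing slashes so we never produce a double "//" in the path.
  const url =
    `${baseUrl.replace(/\/+$/, "")}/api/v1/dags/${encodeURIComponent(dagId)}/dagRuns`;
  return {
    method: "POST",
    url,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ conf }),
  };
}
```

Encoding the DAG id keeps the request valid even for ids containing characters that are unsafe in URLs.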
🧠 AI Features Explained
Query Generator
- Converts natural-language prompts into executable SQL.
- Helps analysts move from intent to query faster.
- Works with the active data platform context.
Query Optimizer
- Detects inefficient SQL patterns.
- Suggests safer and faster alternatives.
- Supports direct replace in editor.
Cost Predictor
- Combines heuristics and optional AI scoring.
- Flags high-risk patterns such as broad scans and unbounded queries.
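A toy version of the rule-based side of this predictor is sketched below. The real extension's heuristics are richer; these three checks (broad scan, missing filter, missing limit) are illustrative only, and `predictCost` is an assumed name.

```typescript
// Sketch: rule-based cost heuristics, simplified to three checks.
// The real predictor combines more rules and optional AI scoring.
function predictCost(sql: string): { level: "low" | "medium" | "high"; issues: string[] } {
  const issues: string[] = [];
  const text = sql.toUpperCase();
  if (text.includes("SELECT *")) issues.push("Broad scan: SELECT * reads all columns");
  if (!text.includes("WHERE")) issues.push("Unbounded query: no WHERE filter");
  if (!text.includes("LIMIT")) issues.push("No LIMIT: full result set returned");
  const level = issues.length >= 2 ? "high" : issues.length === 1 ? "medium" : "low";
  return { level, issues };
}
```

Because the rules run locally before any AI call, a warning dialog can appear instantly even when no AI key is configured.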
Resource Advisor
- Databricks advisor analyzes cluster/job/warehouse/app signals.
- Summarizes app health (running/stopped/failed) and possible failure reasons.
- Returns concise issues and optimization recommendations.
Pipeline Advisor
- Airflow advisor analyzes DAG schedule, runs, and task behavior.
- Surfaces bottlenecks, reliability concerns, and practical next steps.
DAG Log Analyzer
- Analyzes Airflow DAG task logs with AI.
- If failures exist, returns root cause, task-level errors, fixes, and next steps.
- If no failures exist, returns a concise execution summary and validation guidance.
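The branch between error-focused and summary-focused analysis can be reduced to a single decision on task states, sketched below. The function name and the exact state strings checked are assumptions based on Airflow's common task states, not the extension's internals.

```typescript
// Sketch: choose the AI analysis mode from task instance states.
// "failed" / "upstream_failed" are standard Airflow task states.
function chooseAnalysisMode(taskStates: string[]): "error-analysis" | "run-summary" {
  const hasFailure = taskStates.some(
    (s) => s === "failed" || s === "upstream_failed"
  );
  return hasFailure ? "error-analysis" : "run-summary";
}
```

The mode then drives the prompt: error analysis asks for root cause and fixes, while run summary asks for a concise recap and validation steps.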
📸 Screenshots
Connections Explorer
Unified sidebar with Snowflake, Databricks, and Airflow connections, including Databricks compute grouping and Airflow DAG hierarchy.

Query Optimizer
AI SQL optimization report showing detected issues, recommended fixes, and an optimized replacement query.

Query Cost Predictor
Cost risk analysis panel with scan/cost level, issues, and execution suggestions before running SQL.

Query Cost Warning Dialog
Pre-execution warning prompt that lets you review risks and choose whether to continue query execution.

Query Results
Rich query result webview with metrics, AI warnings, and CSV export.

Snowflake Table Preview
Table preview experience for Snowflake objects with result grid and performance hints.

Databricks SQL and Catalog Experience
Databricks browsing and SQL/table preview workflow from the unified explorer.

Databricks Warehouse Details with AI Insights
Warehouse state/capacity details with AI-generated issues, suggestions, and recommendation.

Databricks Job AI Helper
Job run details panel with AI-assisted troubleshooting and optimization guidance.

Databricks App AI Summary
App details panel with AI insights for app status (running/stopped/failed), possible failure reason, and practical improvement suggestions.

Airflow DAG Details
DAG overview page with run history and task status visibility for operational debugging.

Airflow DAG Tree View
Expanded DAG/run/task hierarchy from the connections explorer for quick operational navigation.

📌 Note
Screenshots are stored in the assets folder and referenced directly in this README.
🛠️ Tech Stack
- TypeScript
- VS Code Extension API
- Snowflake SDK (snowflake-sdk)
- Databricks REST and SQL Statement APIs
- Apache Airflow REST API (/api/v1)
- Gemini / OpenAI provider abstraction
- Axios and dotenv
🔮 Future Enhancements
- Data lineage graph and dependency explorer.
- Unified cost dashboard across platforms.
- Auto-fix SQL mode with confidence scoring.
- Policy-aware governance checks before execution.
- Expanded observability timelines for runs/jobs/tasks.
🤝 Contributing
Contributions are welcome.
- Fork the repository.
- Create a feature branch.
- Commit with clear messages.
- Open a pull request with context and screenshots if UI changes are included.
Suggested local checks:
```shell
npm run compile
npm run lint
```
📜 License
MIT License.
⭐ Support This Project
If DataOps Copilot helps your team ship better data workflows:
- Star the repository.
- Share it with your data engineering team.
- Open issues for feature requests and platform integrations.
Built to make DataOps faster, smarter, and more reliable from inside your editor.