AI Code Assistance — General Coding + Data + Data Science + AI Engineering

A Vibe Coding assistant for VS Code that goes far beyond simple chat. AI Code Assistance turns your editor into a specialist across four full areas:

- General coding: explain, refactor, generate, and fix code in any language
- Data analytics: inventory datasets, frame business questions, drill into root causes, generate decks
- Data Science: clean data, run EDA, build/train/tune ML models, explain them with SHAP/LIME
- AI Engineering: production RAG, prompt engineering, LoRA fine-tuning, LLM evaluation, agents

It is powered by a local backend (a small FastAPI router you run on your machine) that talks directly to the model provider of your choice: bring your own keys for OpenRouter, OpenAI, Groq, or Gemini, or run fully local with Ollama / LM Studio. No Colab, no tunnel, no third-party server in the middle.

The assistant responds in whatever language you write in — English, Portuguese, Italian, Spanish, French, or any other. Just type naturally.

Architecture

   ┌──────────────────────┐          ┌──────────────────────┐        ┌──────────────────┐
   │ VS Code (you)        │  HTTP/   │ Local backend        │  HTTPS │ Provider          │
   │  ┌────────────────┐  │  SSE     │ (FastAPI router on   │ ─────▶ │ OpenRouter/OpenAI │
   │  │ AI Code chat   │◄─┼──────────┼─ 127.0.0.1:8001)     │        │ Groq/Gemini  …or… │
   │  └────────────────┘  │          │  role-based routing  │ ─────▶ │ Ollama / LM Studio│
   │  - 14 slash skills   │          │  per-task model cfg  │        │ (local, no key)   │
   │  - Tool calling      │          └──────────────────────┘        └──────────────────┘
   │  - Voice + vision    │
   └──────────────────────┘

Each task type (planner / coder / vision / general / agentic) routes to whatever model you assigned it. Tools execute only on your machine — the backend just emits intents.
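
The routing idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the backend's actual code: the `ROLE_MODELS` table, the `resolve_model` helper, the fallback to the `general` role, and the example model names are all hypothetical.

```python
from typing import NamedTuple


class ModelChoice(NamedTuple):
    provider: str
    model: str


# The five roles named in this README.
ROLES = ("planner", "coder", "vision", "general", "agentic")

# Hypothetical assignments; real ones come from your own configuration.
ROLE_MODELS = {
    "general": ModelChoice("ollama", "llama3"),
    "coder": ModelChoice("openrouter", "qwen/qwen-2.5-coder-32b-instruct"),
}


def resolve_model(role: str) -> ModelChoice:
    """Return the model assigned to a role.

    Falling back to 'general' for unassigned roles is an assumption
    made for this sketch.
    """
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return ROLE_MODELS.get(role, ROLE_MODELS["general"])
```

With this table, `resolve_model("coder")` picks the OpenRouter coder model, while `resolve_model("vision")` falls back to the `general` assignment because no vision model was set.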

What you can do — the four areas

🧑‍💻 General coding (works everywhere)

- Open Chat: a dedicated sidebar chat with an AI Code Assistance icon in the Activity Bar (same UX as Copilot Chat / Claude Code). Click and use, no setup ritual.
- Inline commands:
  - AI Code Assistance: Explain Selection (Ctrl+Alt+E)
  - AI Code Assistance: Refactor Selection (Ctrl+Alt+R)
  - AI Code Assistance: Generate at Cursor
- Tool calling: the model can read/write files, run bash, grep, glob, fetch the web, etc. Every tool call shows a confirmation modal with Accept / Always / Deny. For dangerous bash patterns (rm -rf /, dd, fork bombs), the "Always" option is blocked as a safety net.
- MCP support: connect any MCP server via .mcp.json in your workspace root; both stdio and HTTP transports are supported.
- Voice input and image attachments for vision-capable models.
- Persistent chat history across sessions.
- Multilingual: responds in the language you write in, with no configuration needed.
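
To make the "Always" safety net concrete, here is a small Python sketch of how such a check could work. It is not the extension's real deny-list: the three regular expressions and the `approval_options` helper are assumptions for illustration, and a production list would need to be broader.

```python
import re

# Illustrative patterns only (assumed, not the extension's actual list).
DANGEROUS_PATTERNS = [
    re.compile(r"\brm\s+-\w*r\w*\s+/(\s|$)"),  # recursive rm on the root directory
    re.compile(r"\bdd\s+(if|of)="),            # raw disk copy with dd
    re.compile(r":\(\)\s*\{.*\};\s*:"),        # classic shell fork bomb
]


def approval_options(command: str) -> list[str]:
    """Return the buttons to offer; 'Always' is withheld for risky commands."""
    risky = any(p.search(command) for p in DANGEROUS_PATTERNS)
    return ["Accept", "Deny"] if risky else ["Accept", "Always", "Deny"]
```

In this sketch `approval_options("ls -la")` offers all three buttons, while `approval_options("rm -rf /")` drops "Always", so the command has to be confirmed each time.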

📊 Data analytics — `/explore`, `/question`, `/chart`, `/drill`, `/report`

| Slash command | What it does |
| --- | --- |
| `/explore` | Inventory a dataset, profile every column, surface quality blockers (nulls, duplicates, outliers), recommend feasible analyses |
| `/question` | Turn a vague business problem into 5-10 testable analytical questions ranked by Impact × Feasibility, with hypotheses for the top 3 |
| `/chart` | Generate Storytelling-with-Data styled charts: action titles, max 2 colors + gray, direct labels, no clutter |
| `/drill` | Root cause investigation with "peel the onion" methodology: confirm → decompose → isolate → repeat, until you find the specific actionable cause |
| `/report` | Consolidate the full analysis into a business-ready `final_report.md` + `executable_analysis.ipynb` |

🔬 Data Science (ML) — `/engineer`, `/eda`, `/model`, `/explain`

| Slash command | What it does |
| --- | --- |
| `/engineer` | Data cleaning, missing values, outlier detection (IQR / Z-score), duplicates, feature engineering driven by business value |
| `/eda` | Exploratory analysis: descriptive stats, distributions, correlations (Pearson/Spearman), pattern discovery, hypothesis formulation |
| `/model` | Model selection (linear / trees / boosting / NN), train/val/test splits, hyperparameter tuning, evaluation metrics matched to the business goal |
| `/explain` | Multi-method feature importance: built-in + permutation + SHAP + LIME + Partial Dependence + interactions + stability checks |
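
As a concrete example of the IQR outlier rule mentioned for `/engineer`, the sketch below flags values outside the usual Tukey fences. It only illustrates the technique and is not output from the skill itself; the `iqr_outliers` helper and the sample numbers are made up for the example.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical daily order counts with one suspicious spike.
orders = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(orders))  # prints [95]
```

Here Q1 is 11 and Q3 is 13, so the fences are 8 and 16 and only the value 95 falls outside them.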

🤖 AI Engineering — `/rag`, `/prompt`, `/finetune`, `/eval`, `/agent`

| Slash command | What it does |
| --- | --- |
| `/rag` | Production RAG pipelines: chunking strategies, embedding selection, hybrid retrieval (semantic + BM25), reranking, prompt template, evaluation with Ragas |
| `/prompt` | Production prompt engineering: Chain-of-Thought, few-shot, structured outputs (JSON/XML), function calling schemas, anti-jailbreak, robustness testing |
| `/finetune` | LoRA / QLoRA fine-tuning end-to-end: dataset prep, Unsloth/transformers setup, hyperparams, eval against base, deployment as merged model or adapter, GGUF quantization |
| `/eval` | LLM evaluation systems: reference-based + rubric-based + reference-free, LLM-as-judge done right, Ragas / DeepEval / PromptFoo / LangSmith, CI integration |
| `/agent` | Agent architecture: tool catalog design, ReAct / plan-and-execute / reflexion, memory tiers, error handling, framework picking (LangGraph / LlamaIndex / AutoGen / CrewAI / Smolagents) |

Type /skills or /help in the chat to list all 14 skills with their descriptions.

Choose which model runs each task

AI Code Assistance routes every request to one of five roles (planner, coder, vision, general, agentic), and you decide which model each role uses. There are three interchangeable ways to do this. All three share one config file, and changes are hot-reloaded with no restart:

- ⚙️ Configure Models panel (AI Code Assistance: Configure Models): a panel with a model dropdown per task, a token field per provider (saved to your local .env, never shared), and base-URL fields for Ollama / LM Studio.
- Chat commands: /models to see assignments, /model coder openrouter qwen/qwen-2.5-coder-32b-instruct to pin one, /model now <provider> <model> to force a model for the current chat.
- Edit ai-code-assistance.models.json by hand. It holds only model names (no secrets), so you can commit and share it; teammates add their own keys.
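
The shape of the two `/model` chat commands can be illustrated with a small parser. This is a sketch of the command grammar as described above, not the extension's implementation; the `parse_model_command` helper and the keys of the returned dict are assumptions.

```python
ROLES = {"planner", "coder", "vision", "general", "agentic"}


def parse_model_command(text: str) -> dict:
    """Parse '/model <role|now> <provider> <model>' into a small dict."""
    parts = text.split()
    if len(parts) != 4 or parts[0] != "/model":
        raise ValueError("expected: /model <role|now> <provider> <model>")
    _, target, provider, model = parts
    if target == "now":
        # Applies to the current chat only.
        return {"scope": "chat", "provider": provider, "model": model}
    if target not in ROLES:
        raise ValueError(f"unknown role: {target!r}")
    # Pins a model to one of the five roles.
    return {"scope": "role", "role": target, "provider": provider, "model": model}
```

For example, `parse_model_command("/model coder openrouter qwen/qwen-2.5-coder-32b-instruct")` yields a role-scoped assignment for `coder`, and `/model now groq some-model` (a placeholder model name) yields a chat-scoped one.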

Quick start

Install — search AI Code Assistance in the Extensions panel (Ctrl+Shift+X).

Run the backend locally (from the project repo):

cp .env.example .env          # set AI_TOKEN + at least one provider key
uvicorn backend.main:app --host 127.0.0.1 --port 8001

Configure VS Code — open Settings (Ctrl+,) and search AI Code Assistance:
- aiCodeAssistance.backendUrl → http://127.0.0.1:8001
- aiCodeAssistance.token → the AI_TOKEN from your .env
Click the AI Code Assistance icon in the Activity Bar (left rail) — the chat opens in the sidebar.
Type /skills to see everything you can do, and click the ⚙️ to pick your models.

Settings

| Setting | Default | Purpose |
| --- | --- | --- |
| `aiCodeAssistance.backendUrl` | `""` | Backend URL, e.g. `http://127.0.0.1:8001` |
| `aiCodeAssistance.token` | `""` | Bearer token (`AI_TOKEN`), sent on every request (machine-scoped, not synced) |
| `aiCodeAssistance.toolsEnabled` | `true` | Allow the model to call tools (file I/O, bash, web, MCP) |
| `aiCodeAssistance.permissionMode` | `ask` | Tool-call approval mode: `ask` / `auto-edit` / `auto` / `plan` |
| `aiCodeAssistance.healthPollSeconds` | `10` | How often to poll `/health` for the status bar indicator |
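
If the status bar shows the backend as down, you can check it by hand. The Python sketch below sends the same kind of request the settings describe: a GET to `/health` with the bearer token. The helper names are hypothetical, and treating HTTP 200 as "healthy" is an assumption, since the response format is not documented here.

```python
import urllib.request


def build_health_request(backend_url: str, token: str) -> urllib.request.Request:
    """Build a GET request for <backend_url>/health carrying the bearer token."""
    url = backend_url.rstrip("/") + "/health"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def backend_is_up(backend_url: str, token: str, timeout: float = 2.0) -> bool:
    """Return True if the backend answers /health with HTTP 200 (assumed meaning)."""
    request = build_health_request(backend_url, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False
```

Call it with the values from your settings, for example `backend_is_up("http://127.0.0.1:8001", "<your AI_TOKEN>")`.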

MCP Servers

Create a .mcp.json file at your workspace root to connect external MCP servers:

{
  "mcpServers": {
    "my-server": {
      "command": "npx",
      "args": ["-y", "@my-org/mcp-server"],
      "env": { "API_KEY": "your-key-here" }
    },
    "http-server": {
      "url": "http://localhost:3000/mcp",
      "headers": { "Authorization": "Bearer TOKEN" }
    }
  }
}

The ⚙️ Configure Models panel shows active MCP servers and has an "Edit .mcp.json" button.
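
The two entries in the example differ by transport: one is launched as a local process, the other is reached over HTTP. A minimal Python sketch of how a client could tell them apart is shown below. The `classify_mcp_servers` helper is hypothetical, and deciding the transport from the presence of a `command` or `url` key is an assumption based on the example file.

```python
import json


def classify_mcp_servers(config_text: str) -> dict:
    """Map each server name to 'stdio' or 'http' based on its keys."""
    servers = json.loads(config_text).get("mcpServers", {})
    transports = {}
    for name, spec in servers.items():
        if "command" in spec:
            transports[name] = "stdio"  # spawned as a local process
        elif "url" in spec:
            transports[name] = "http"   # reached over HTTP
        else:
            raise ValueError(f"{name}: expected a 'command' or 'url' key")
    return transports
```

Run against the example file, this returns `{"my-server": "stdio", "http-server": "http"}`.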

Keybindings

| Action | Shortcut |
| --- | --- |
| Focus AI Code Assistance chat (sidebar) | `Ctrl+Shift+M` (`Cmd+Shift+M`) |
| Explain selection | `Ctrl+Alt+E` |
| Refactor selection | `Ctrl+Alt+R` |

Privacy & security

Tools require explicit per-session approval. "Always" decisions live in RAM and reset when VS Code closes.
The token is machine-scoped and does not sync across devices via Settings Sync.
The backend runs on your own machine and calls your own provider account with your own keys. No third-party server in the middle. Provider keys live in your local .env (gitignored) — never in the shareable model config.
File system access and shell execution happen only on your machine — the backend just forwards the model's intent.
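
The "lives in RAM" behavior of "Always" decisions can be pictured as a plain in-memory set that is never written to disk. The sketch below is only an illustration: the `SessionApprovals` class is hypothetical, and keying approvals by tool name is an assumption.

```python
class SessionApprovals:
    """Holds 'Always' decisions in memory only; nothing is persisted."""

    def __init__(self) -> None:
        self._always: set[str] = set()

    def remember(self, tool_name: str) -> None:
        """Record that the user chose 'Always' for this tool."""
        self._always.add(tool_name)

    def needs_prompt(self, tool_name: str) -> bool:
        """True unless the user already chose 'Always' in this session."""
        return tool_name not in self._always
```

A fresh `SessionApprovals()` object starts empty, which mirrors approvals resetting when VS Code closes.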

Why this stack?

Your models, your bill. Mix cloud (OpenRouter / OpenAI / Groq / Gemini) and local (Ollama / LM Studio) per task.
Right model per task. A cheap fast model for general chat, a strong coder for code, a vision model for screenshots — configured independently.
Shareable, secret-free config. The model config file carries model choices; teammates plug in their own keys.
Editor stays local. All file I/O and shell execution happen on your machine, behind a permission prompt.
Multilingual. The assistant responds in the language you write in, automatically.

Limitations

You run the backend yourself (a single uvicorn command) and supply at least one provider key, or a local engine (Ollama / LM Studio). The extension does not include hosted inference.
AI Engineering skills generate code; they don't run training/RAG/eval pipelines for you. They tell you what to write and why.

Issues & contributing

Bug reports, feature requests, and PRs are welcome at github.com/mendesalex89/AI_Assistant.

License

MIT — see LICENSE.

Alex Mendes