AI Code Assistance: General Coding + Data + Data Science + AI Engineering

A Vibe Coding assistant for VS Code that goes far beyond simple chat. AI Code Assistance turns your editor into a specialist across four areas: general coding, data analytics, data science (ML), and AI engineering.

It is powered by a local backend (a small FastAPI router you run on your machine) that talks directly to the model provider of your choice: bring your own keys for OpenRouter, OpenAI, Groq or Gemini, or run fully local with Ollama / LM Studio. No Colab, no tunnel, no third-party server in the middle.

The assistant responds in whatever language you write in: English, Portuguese, Italian, Spanish, French, or any other. Just type naturally.

Architecture
Each task type (planner / coder / vision / general / agentic) routes to whatever model you assigned it. Tools execute only on your machine; the backend just emits intents.

What you can do: the four areas

🧑‍💻 General coding (works everywhere)
📊 Data analytics: /explore, /question, /chart, /drill, /report

| Slash | What it does |
|---|---|
| `/explore` | Inventory a dataset, profile every column, surface quality blockers (nulls, duplicates, outliers), recommend feasible analyses |
| `/question` | Turn a vague business problem into 5-10 testable analytical questions ranked by Impact × Feasibility, with hypotheses for the top 3 |
| `/chart` | Generate Storytelling-with-Data styled charts: action titles, max 2 colors + gray, direct labels, no clutter |
| `/drill` | Root cause investigation with "peel the onion" methodology: confirm → decompose → isolate → repeat, until you find the specific actionable cause |
| `/report` | Consolidate the full analysis into a business-ready final_report.md + executable_analysis.ipynb |

🔬 Data Science (ML): /engineer, /eda, /model, /explain

| Slash | What it does |
|---|---|
| `/engineer` | Data cleaning, missing values, outlier detection (IQR / Z-score), duplicates, feature engineering driven by business value |
| `/eda` | Exploratory analysis: descriptive stats, distributions, correlations (Pearson/Spearman), pattern discovery, hypothesis formulation |
| `/model` | Model selection (linear / trees / boosting / NN), train/val/test splits, hyperparameter tuning, evaluation metrics matched to the business goal |
| `/explain` | Multi-method feature importance: built-in + permutation + SHAP + LIME + Partial Dependence + interactions + stability checks |

🤖 AI Engineering: /rag, /prompt, /finetune, /eval, /agent

| Slash | What it does |
|---|---|
| `/rag` | Production RAG pipelines: chunking strategies, embedding selection, hybrid retrieval (semantic + BM25), reranking, prompt template, evaluation with Ragas |
| `/prompt` | Production prompt engineering: Chain-of-Thought, few-shot, structured outputs (JSON/XML), function calling schemas, anti-jailbreak, robustness testing |
| `/finetune` | LoRA / QLoRA fine-tuning end-to-end: dataset prep, Unsloth/transformers setup, hyperparams, eval against base, deployment as merged model or adapter, GGUF quantization |
| `/eval` | LLM evaluation systems: reference-based + rubric-based + reference-free, LLM-as-judge done right, Ragas / DeepEval / PromptFoo / LangSmith, CI integration |
| `/agent` | Agent architecture: tool catalog design, ReAct / plan-and-execute / reflexion, memory tiers, error handling, framework picking (LangGraph / LlamaIndex / AutoGen / CrewAI / Smolagents) |

Type /skills or /help in the chat to list all 14 with descriptions.
Choose which model runs each task
AI Code Assistance routes every request to one of five roles — planner, coder, vision, general, agentic — and you decide which model each role uses. Three interchangeable ways, all sharing one config file and hot-reloaded with no restart:
- ⚙️ Configure Models panel (`AI Code Assistance: Configure Models`): a panel with a model dropdown per task, a token field per provider (saved to your local `.env`, never shared), and base-URL fields for Ollama / LM Studio.
- Chat commands: `/models` to see assignments, `/model coder openrouter qwen/qwen-2.5-coder-32b-instruct` to pin one, `/model now <provider> <model>` to force a model for the current chat.
- Edit `ai-code-assistance.models.json` by hand. It holds only model names (no secrets), so you can commit and share it; teammates add their own keys.
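
The routing idea can be illustrated with a small Python sketch. It is not the extension's actual code: the `Router` and `ModelRef` names are hypothetical, the in-memory mapping does not reflect the real schema of `ai-code-assistance.models.json`, and treating `/model now` as an override for every role in the current chat is an assumption. `example-model` is a placeholder model name.

```python
from __future__ import annotations

from dataclasses import dataclass

# The five roles named in this README.
ROLES = ("planner", "coder", "vision", "general", "agentic")


@dataclass(frozen=True)
class ModelRef:
    """A provider/model pair (hypothetical helper type)."""
    provider: str
    model: str


@dataclass
class Router:
    """Hypothetical in-memory router: one model per role, plus a chat-wide override."""
    assignments: dict[str, ModelRef]
    session_override: ModelRef | None = None

    def handle_command(self, text: str) -> None:
        """Apply '/model <role|now> <provider> <model>'."""
        parts = text.split()
        if len(parts) != 4 or parts[0] != "/model":
            raise ValueError("expected: /model <role|now> <provider> <model>")
        _, target, provider, model = parts
        ref = ModelRef(provider, model)
        if target == "now":
            # Assumption: 'now' forces this model for every role in the current chat.
            self.session_override = ref
        elif target in ROLES:
            self.assignments[target] = ref
        else:
            raise ValueError(f"unknown role: {target}")

    def resolve(self, role: str) -> ModelRef:
        """Return the model that would serve a request for the given role."""
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        return self.session_override or self.assignments[role]
```

For example, after `router.handle_command("/model coder openrouter qwen/qwen-2.5-coder-32b-instruct")`, `router.resolve("coder")` returns that OpenRouter model while the other roles keep their own assignments.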
Quick start
1. Install: search for AI Code Assistance in the Extensions panel (`Ctrl+Shift+X`).
2. Run the backend locally (from the project repo):

       cp .env.example .env   # set AI_TOKEN + at least one provider key
       uvicorn backend.main:app --host 127.0.0.1 --port 8001

3. Configure VS Code: open Settings (`Ctrl+,`) and search for AI Code Assistance:
   - `aiCodeAssistance.backendUrl` → `http://127.0.0.1:8001`
   - `aiCodeAssistance.token` → the `AI_TOKEN` from your `.env`
4. Click the AI Code Assistance icon in the Activity Bar (left rail). The chat opens in the sidebar.
5. Type `/skills` to see everything you can do, and click the ⚙️ to pick your models.
Settings
| Setting | Default | Purpose |
|---|---|---|
| `aiCodeAssistance.backendUrl` | `""` | Backend URL, e.g. `http://127.0.0.1:8001` |
| `aiCodeAssistance.token` | `""` | Bearer token (`AI_TOKEN`), sent on every request (machine-scoped, not synced) |
| `aiCodeAssistance.toolsEnabled` | `true` | Allow the model to call tools (file I/O, bash, web, MCP) |
| `aiCodeAssistance.permissionMode` | `ask` | Tool-call approval mode: ask / auto-edit / auto / plan |
| `aiCodeAssistance.healthPollSeconds` | `10` | How often to poll `/health` for the status bar indicator |
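
To check the backend URL and token outside VS Code, a short Python sketch like the one below can help. It assumes the token travels in a standard `Authorization: Bearer ...` header and that a healthy backend answers `GET /health` with HTTP 200. The README only states that the token is a Bearer token sent on every request and that `/health` is polled, so the header format and status code are assumptions, and the function names are hypothetical.

```python
import urllib.request


def build_health_request(backend_url: str, token: str) -> urllib.request.Request:
    """Build a GET request for <backend_url>/health carrying the Bearer token."""
    url = backend_url.rstrip("/") + "/health"
    # Assumption: the backend expects a standard Authorization header.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def backend_is_up(backend_url: str, token: str, timeout: float = 3.0) -> bool:
    """Return True if /health answers with HTTP 200, False on any failure."""
    try:
        request = build_health_request(backend_url, token)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Assumption: HTTP 200 means the backend is healthy.
            return response.status == 200
    except (OSError, ValueError):
        # OSError covers connection errors, timeouts and HTTP error statuses;
        # ValueError covers an empty or malformed backend URL.
        return False
```

Calling `backend_is_up("http://127.0.0.1:8001", "<your AI_TOKEN>")` before opening the chat helps separate a backend that is not running from a misconfigured setting.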
MCP Servers
Create a .mcp.json file at your workspace root to connect external MCP servers:
{
"mcpServers": {
"my-server": {
"command": "npx",
"args": ["-y", "@my-org/mcp-server"],
"env": { "API_KEY": "your-key-here" }
},
"http-server": {
"url": "http://localhost:3000/mcp",
"headers": { "Authorization": "Bearer TOKEN" }
}
}
}
The ⚙️ Configure Models panel shows active MCP servers and has an "Edit .mcp.json" button.
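
The example above contains two kinds of entry: one launched as a local command and one reached over a URL. The Python sketch below reads a `.mcp.json` and labels each entry accordingly. It is only an illustration of the file's shape: the function names and the `"stdio"` / `"http"` / `"invalid"` labels are hypothetical, and it does not show how the extension itself loads the file.

```python
import json
from pathlib import Path


def classify_mcp_servers(config: dict) -> dict:
    """Map each server name to 'stdio' (has a command), 'http' (has a url) or 'invalid'."""
    kinds = {}
    for name, entry in config.get("mcpServers", {}).items():
        if "command" in entry:
            kinds[name] = "stdio"
        elif "url" in entry:
            kinds[name] = "http"
        else:
            kinds[name] = "invalid"
    return kinds


def load_mcp_servers(workspace_root: str) -> dict:
    """Read <workspace_root>/.mcp.json and classify its servers; empty if the file is absent."""
    config_path = Path(workspace_root) / ".mcp.json"
    if not config_path.exists():
        return {}
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return classify_mcp_servers(config)
```

Running `load_mcp_servers(".")` from the workspace root is a quick way to catch a malformed file or an entry that has neither `command` nor `url`.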
Keybindings
| Action | Shortcut |
|---|---|
| Focus AI Code Assistance chat (sidebar) | Ctrl+Shift+M (Cmd+Shift+M) |
| Explain selection | Ctrl+Alt+E |
| Refactor selection | Ctrl+Alt+R |
Privacy & security
- Tools require explicit per-session approval. "Always" decisions live in RAM and reset when VS Code closes.
- Token is machine-scoped — does not sync across devices via Settings Sync.
- The backend runs on your own machine and calls your own provider account with your own keys. No third-party server in the middle. Provider keys live in your local `.env` (gitignored), never in the shareable model config.
- File system access and shell execution happen only on your machine; the backend just forwards the model's intent.
Why this stack?
- Your models, your bill. Mix cloud (OpenRouter / OpenAI / Groq / Gemini) and local (Ollama / LM Studio) per task.
- Right model per task. A cheap fast model for general chat, a strong coder for code, a vision model for screenshots — configured independently.
- Shareable, secret-free config. The model config file carries model choices; teammates plug in their own keys.
- Editor stays local. All file I/O and shell execution happen on your machine, behind a permission prompt.
- Multilingual. The assistant responds in the language you write in, automatically.
Limitations
- You run the backend yourself (a single `uvicorn` command) and supply at least one provider key, or a local engine (Ollama / LM Studio). The extension does not include hosted inference.
- AI Engineering skills generate code; they don't run training/RAG/eval pipelines for you. They tell you what to write and why.
Issues & contributing
Bug reports, feature requests, and PRs are welcome at github.com/mendesalex89/AI_Assistant.
License
MIT — see LICENSE.