# Agent System
AI coding assistant for VS Code with multi-agent orchestration, persistent memory, and project awareness — use Copilot, Groq, or both (Groq powers Copilot). Maintains a live, code- and logic-aware state of the project.

## What makes Agent System different
- Local persistent memory — Session, project, and global memory in SQLite; no cloud sync required.
- Multi-agent orchestration — Coder, Reviewer, Tester, and Coordinator with intent-based routing.
- API-agnostic — Copilot, Groq, and future providers; switch or combine from the UI.
- Works without API keys — DuckDuckGo handles web search and Copilot is used when available; add Groq to power Copilot when you want.
## Privacy
- No telemetry.
- Runs locally.
- Your code never leaves your machine except through your configured model provider (e.g. Copilot or Groq).
## Features
Sidebar chat — Streaming chat with tool use (read/edit files, run shell, web search, run tests). Tool use is supported for both Copilot and Groq. Multiple conversations saved per workspace; switch, rename, or delete from the chat view.
Configurable streaming timeouts — Three timeout tiers for chat streaming: first chunk (default 120 s), next chunks (default 90 s), and a global hard limit (default 180 s). Configurable via agentSystem.chat.streamingFirstChunkTimeoutMs, streamingNextChunkTimeoutMs, and streamingGlobalTimeoutMs. Clear error messages when a tier is exceeded so you can adjust timeouts or retry.
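For example, on a slow network you might raise the first-chunk timeout in `settings.json`; the values below are illustrative, not recommendations. If you raise a per-chunk timeout, raise the global hard limit as well so it doesn't fire first:

```json
{
  "agentSystem.chat.streamingFirstChunkTimeoutMs": 240000,
  "agentSystem.chat.streamingNextChunkTimeoutMs": 90000,
  "agentSystem.chat.streamingGlobalTimeoutMs": 300000
}
```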
Mode selector — Three modes in the chat header: Auto (intent-based routing), Agent (executes code changes), and Plan (analyzes and proposes without modifying files). Plan mode blocks all write and terminal tools automatically.
Reasoning level — Low / Medium / High pill selector controls thinking depth per message. Passed to the underlying model (e.g. thinkingBudget on supported providers); ignored silently on models that don't support it.
Specialized agents — Coder, Reviewer, Tester, and Coordinator agents. The extension routes by intent in Auto mode; Agent mode uses the specialist pipeline; Plan mode always routes through the Coordinator with a read-only constraint.
AI Project Awareness — On workspace open, the extension performs static analysis (no LLM call) to detect framework (React, Next.js, NestJS, Express, …), language (TypeScript, Python, Go, …), build tool, test runner, and architecture pattern (clean architecture, MVC, monorepo). The result is injected into the system prompt as "Project Context" and shown as a status badge (Healthy / Needs Review / Unknown) in the sidebar.
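Dependency-based detection of this kind can be sketched as follows. This is a hypothetical illustration of the idea, not the extension's actual detector; the function name and the specific dependency checks are assumptions:

```typescript
// Hypothetical sketch: infer the framework from package.json dependencies,
// with no LLM call involved. Order matters: Next.js projects also depend on
// React, so the more specific framework is checked first.
type PackageJson = {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
};

function detectFramework(pkg: PackageJson): string {
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  if ("next" in deps) return "Next.js";
  if ("@nestjs/core" in deps) return "NestJS";
  if ("react" in deps) return "React";
  if ("express" in deps) return "Express";
  return "Unknown";
}

console.log(detectFramework({ dependencies: { next: "14.0.0", react: "18.2.0" } })); // → "Next.js"
```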
Project Memory panel — Manual controls in the sidebar: Index Project, Reindex, Clear Index, and Clear local data (clears index and local SQLite data; overflow in the panel is handled so all controls stay usable). Shows current index status (files indexed, embeddings enabled, last indexed). Displays a ⚠ Index outdated warning automatically when the workspace folders change. A Search badge in the same section shows the active web search provider (e.g. DuckDuckGo | Configure).
Persistent workspace index — Indexed file chunks (embeddings + text) are stored in SQLite and reloaded on next launch without re-indexing. Unchanged files are skipped via SHA-256 content hashing (no API calls, no CPU work for unmodified files).
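The skip-unchanged-files check can be sketched like this (a minimal illustration of the hashing idea; the function names are hypothetical, not the extension's API):

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: a file is re-indexed only when its SHA-256 content
// hash differs from the hash stored alongside its chunks in SQLite.
function contentHash(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

function needsReindex(text: string, storedHash: string | undefined): boolean {
  // No stored hash (new file) → always index; otherwise compare hashes.
  return contentHash(text) !== storedHash;
}

const stored = contentHash("export const x = 1;\n");
console.log(needsReindex("export const x = 1;\n", stored)); // → false (unchanged file is skipped)
```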
Persistent memory — Session, project, and global memory stored locally in SQLite, per workspace. Automatic TTL expiry and LRU eviction keep each scope within its capacity limit (session: 500, project: 2 000, global: 10 000 entries). No cloud sync; no API key required for memory.
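The combination of TTL expiry and LRU eviction can be sketched with an insertion-ordered `Map`; this is a simplified in-memory illustration of the policy, not the extension's SQLite implementation, and the class and method names are assumptions:

```typescript
// Hypothetical sketch: one memory scope with a capacity limit.
// A Map preserves insertion order, so the first key is the least recently used.
type Entry = { value: string; expiresAt: number };

class ScopedMemory {
  private entries = new Map<string, Entry>();
  constructor(private capacity: number) {}

  set(key: string, value: string, ttlMs: number, now = Date.now()): void {
    this.entries.delete(key); // re-insert so the key becomes most recently used
    this.entries.set(key, { value, expiresAt: now + ttlMs });
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value!;
      this.entries.delete(oldest); // LRU eviction keeps the scope within capacity
    }
  }

  get(key: string, now = Date.now()): string | undefined {
    const e = this.entries.get(key);
    if (!e) return undefined;
    if (e.expiresAt <= now) {
      this.entries.delete(key); // TTL expiry
      return undefined;
    }
    this.entries.delete(key);
    this.entries.set(key, e); // touch: mark as most recently used
    return e.value;
  }
}
```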
Project context — Loads AGENT_SYSTEM.md from workspace root(s) into the system prompt. Supports multi-root workspaces. Refresh with the ↻ button or Set Project Instructions from the Command Palette.
Workspace RAG (optional) — Uses OpenAI text-embedding-3-small (512 dimensions) when an API key is set. Configure from the sidebar (Project Memory → Set API Key) or the Command Palette (Agent System: Set OpenAI API Key). Relevant file chunks are retrieved semantically and injected alongside memory context. If no key is configured, the extension falls back to FTS3 full-text search. The index persists across restarts; only changed files are re-embedded on subsequent runs.
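Similarity-thresholded retrieval over stored chunk embeddings can be sketched as below. This is an illustrative implementation of the general technique, not the extension's code; the function names are hypothetical, and `minSimilarity` plays the role of `agentSystem.index.minSimilarityThreshold`:

```typescript
// Hypothetical sketch: score every chunk against the query embedding,
// drop chunks below the similarity threshold, and return the top-K texts.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

type Chunk = { text: string; embedding: number[] };

function retrieve(query: number[], chunks: Chunk[], minSimilarity: number, topK: number): string[] {
  return chunks
    .map(c => ({ text: c.text, score: cosineSimilarity(query, c.embedding) }))
    .filter(c => c.score >= minSimilarity)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map(c => c.text);
}
```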
Tool approval — Optional confirmation for sensitive actions (file edits, creation/deletion, shell, web) before execution.
Settings quick access — ⚙ gear button in the chat header opens the extension’s settings (e.g. timeouts, model, Groq, web search) without leaving the editor.
Model selector — Persistent model selection saved across sessions. Pick from available VS Code language models or use the auto-detected default.
Groq support (optional) — Use Groq as an alternative or complement to Copilot: models llama-3.3-70b-versatile, llama-3.1-8b-instant, or gemma2-9b-it with tool use (streaming and tool calls). Configure via Agent System: Set Groq API Key from the Command Palette or the Groq section in the sidebar. Tool use works with both Copilot and Groq.
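Groq exposes an OpenAI-compatible chat completions API, so a streaming, tool-enabled request can be built as sketched below. This is an assumption-laden illustration, not the extension's code: the `read_file` tool name and `buildGroqRequest` helper are hypothetical, and only the request shape follows the OpenAI-compatible convention:

```typescript
// Hypothetical sketch of an OpenAI-compatible chat request body for Groq,
// with streaming enabled and one tool definition attached.
function buildGroqRequest(model: string, userMessage: string) {
  return {
    model, // e.g. "llama-3.3-70b-versatile"
    stream: true,
    messages: [{ role: "user", content: userMessage }],
    tools: [
      {
        type: "function",
        function: {
          name: "read_file", // hypothetical tool name
          description: "Read a file from the workspace",
          parameters: {
            type: "object",
            properties: { path: { type: "string" } },
            required: ["path"],
          },
        },
      },
    ],
  };
}
// POST this body to https://api.groq.com/openai/v1/chat/completions
// with the header "Authorization: Bearer <GROQ_API_KEY>".
```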
Copilot Chat — Use Agent System from Copilot Chat with @agent (requires the Copilot Chat extension).
Web Search — DuckDuckGo works out of the box with no API key. Configure a premium provider (Brave, Serper.dev, Tavily) or a custom self-hosted endpoint from the sidebar badge (Search → Configure) or via Agent System: Configure Web Search Provider & API Key from the Command Palette. Hot-reload: provider changes apply immediately without restarting VS Code.
## Requirements
- VS Code 1.99 or later
- A language model provider (e.g. GitHub Copilot) that supports the VS Code Language Model API
## Configuration
| Setting | Description |
| --- | --- |
| `agentSystem.enabled` | Turn the extension on or off |
| `agentSystem.logLevel` | Log level: `debug`, `info`, `warn`, or `error` |
| `agentSystem.memory.*` | Memory limits per scope (session / project / global) |
| `agentSystem.model.selection` | Preferred model ID (persisted across sessions) |
| `agentSystem.reasoning.effort` | Default reasoning level: `low`, `medium`, or `high` |
| `agentSystem.agent.maxSteps` | Max agent loop steps per turn (3–30) |
| `agentSystem.agent.llmTimeoutMs` | LLM request timeout (ms) |
| `agentSystem.agent.synthesisTimeoutMs` | Final synthesis request timeout (ms) |
| `agentSystem.chat.streamingFirstChunkTimeoutMs` | Timeout for the first streaming chunk (ms). Default 120 000. Increase on slow networks. |
| `agentSystem.chat.streamingNextChunkTimeoutMs` | Timeout between subsequent streaming chunks (ms). Default 90 000. |
| `agentSystem.chat.streamingGlobalTimeoutMs` | Hard limit for the entire streaming response (ms). Default 180 000. |
| `agentSystem.index.maxChunks` | Max chunks in the workspace index |
| `agentSystem.index.maxChunksPerRoot` | Max chunks per workspace root |
| `agentSystem.index.concurrency` | Indexing concurrency |
| `agentSystem.index.minSimilarityThreshold` | Min similarity for RAG retrieval (0–1) |
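A minimal example of a workspace `.vscode/settings.json` combining some of these settings (values are illustrative, not recommendations):

```json
{
  "agentSystem.enabled": true,
  "agentSystem.logLevel": "info",
  "agentSystem.reasoning.effort": "medium",
  "agentSystem.agent.maxSteps": 10,
  "agentSystem.index.minSimilarityThreshold": 0.3
}
```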
The Command Palette command Agent System: Show Index Stats shows the current index status, file and chunk counts, estimated RAM, and a per-root chunk breakdown.
Optional features (Secrets):
| Action | How |
| --- | --- |
| Configure embeddings | Agent System: Set OpenAI API Key from the Command Palette, or Set API Key in the sidebar (Project Memory section) |
| Configure Groq API key | Agent System: Set Groq API Key from the Command Palette, or the Groq section in the sidebar |
| Configure web search | Agent System: Configure Web Search Provider & API Key from the Command Palette, or Configure next to the Search badge in the sidebar |
Workspace embeddings require an OpenAI API key (configure via sidebar → Project Memory → Set API Key or the Command Palette). Web search works without any key (DuckDuckGo fallback); premium providers (Brave, Serper, Tavily, custom) can be configured from the sidebar search badge.
## Development
From the extension folder (`apps/vscode-extension/`):

```shell
pnpm install
pnpm run compile
```

Press F5 in VS Code to open the Extension Development Host. Use Run Extension from the Run and Debug panel as an alternative.

- Watch build: `pnpm run watch`
- Validation: `pnpm run validate`
Built by Jorge Leal