NIM Code

AI coding assistant for VS Code — NVIDIA NIM, OpenAI, Anthropic Claude, Google Gemini, and Groq in one panel.

Chat, explain, refactor, and fix code with state-of-the-art models. Switch between Nemotron, GPT-5, Claude Opus, Gemini 2.5, Groq-hosted GPT-OSS, and more without leaving your editor — each provider uses its own API key, stored securely in your OS keychain.

No API key? Start instantly with Auto mode — a free agent (tools + file edits) with automatic multi-provider failover, 5 requests per day, no sign-up required.

In one line

An agentic coding assistant that can run fully offline against a local model, keeps your code private to the provider you choose, and works free with no API key if you just want to try it.

Key points

🔓 Free to start — no API key, no sign-up, no credit card. Auto mode runs the complete agent loop (all 32 tools, file edits, terminal) through a built-in gateway that fails over across providers automatically. 5 requests per day, resetting at midnight UTC. Install and be working in under a minute.

🔌 Offline coding agent — your machine, your model. Point nimcode.baseUrl at any OpenAI-compatible server and everything works identically: Ollama, LM Studio, llama.cpp, vLLM, or a self-hosted NIM deployment. Switch with one click (NIM Code: Switch LLM Endpoint) — presets are built in. Local endpoints need no API key at all: loopback addresses (localhost, 127.0.0.1, ::1, host.docker.internal, *.local) are detected automatically and sent a placeholder token, since local servers accept any bearer. A Local badge in the chat bar shows which backend a conversation is hitting, so it's never ambiguous.

🔒 Private coding agent — nothing goes anywhere you didn't pick. Your code, prompts, and responses go to one place only: the provider of the model you selected. With a local endpoint, that means your code never leaves your machine. The webview UI runs under a strict Content Security Policy and makes zero network requests of its own, and all agent file access is scoped to your workspace root.

🔑 Keys live in your OS keychain — never on disk. Each provider's key goes into VS Code SecretStorage (Keychain / Credential Manager / libsecret). Never written to settings.json, never committed, never left in a dotfile.

🧠 Bring any model — five providers, or your own. NVIDIA NIM (default), OpenAI, Anthropic Claude, Google Gemini, and Groq — each with its own key, and only the selected model's key is ever required. Add your own model ids from the UI with no extension update. Reasoning-effort control (Low / Medium / High) on models that support it.

⚡ A real agent, not a chat box. 32 built-in tools: read/write/edit files, ripgrep + natural-language semantic search, terminal with background shells, git status/diff/log, diagnostics, code review, and scoped sub-agent delegation. Up to 6 independent read-only calls run concurrently; anything that mutates runs strictly one at a time, in order.

🛡️ You approve what it changes. Four approval modes. Every write shows a real VS Code diff before it lands. delete_file and run_terminal always ask. Plan mode blocks every mutating tool outright — the agent investigates read-only, then hands you a plan to approve before a single byte changes.

🔁 Runs in the background, and queues up work. Agents keep working while you keep coding — a status-bar counter tracks live runs and you get a toast on completion. Stack up a task queue to run jobs sequentially, each honoring the current mode, model, and approval settings.

🧩 Extend it without forking it. MCP servers (stdio) merge their tools straight into the agent's roster. Lifecycle hooks (PreToolUse / PostToolUse / Stop) run your own shell commands around every tool call — and can block one. Skills are plain Markdown prompt packages you drop in .nimcode/skills/ and commit with the repo.

📌 It remembers — across turns, sessions, and teammates. Three distinct layers: a read-only CLAUDE.md you author, a git-committed .nimcode/memory.md your whole team shares, and machine-global personal memory. Long conversations auto-compact instead of falling over.

💰 Costs are visible, not a surprise. Per-message token usage and dollar estimate, a live context-budget bar, and prompt caching to cut the cost of repeated large-context calls.

📊 Telemetry is one anonymous ping per day of use — opt out with one setting. It contains only VS Code's anonymized machine id, the extension version, the VS Code version, and your OS. Never your code, prompts, responses, file paths, project names, model ids, or keys. Disable with nimcode.telemetry.enabled: false.

How it works

Pick a mode — Auto (free, no key), Chat (ask questions), or Agent (let it act on your workspace).
Pick a backend — a cloud provider with your own key, or a local model via the endpoint switcher. The same 32 tools work either way.
Ask for something — your editor selection and the active file are attached automatically. Once you've built the semantic index (one command), relevant code is retrieved for each request too, so the agent starts oriented instead of exploring from scratch.
The agent loops — it calls tools, reads results, and calls more tools (up to 30 steps), streaming its progress as a live checklist you can watch.
You approve the changes — diffs for writes, explicit confirmation for anything destructive, and a Stop button that actually stops it.

Everything runs in the extension host on your machine. There is no NIM Code server in the path — except the optional free Auto-mode gateway, which you can bypass entirely by using your own key or a local model.

Features

Auto mode — free agent tier, no key needed

Start immediately without an API key. Auto mode runs the full agent loop — the same tools, file edits, and terminal access as Agent mode — through a built-in OpenAI-compatible routing gateway. The gateway key's server-side model routing fails over across providers automatically (e.g. Groq → NVIDIA → Google), so the free tier stays available even when one backend is busy or down.

Detail	Value
Backend	Routing gateway with automatic multi-provider failover
Daily limit	5 free requests (resets at midnight UTC)
API key required	No
Capabilities	Full agent — the same 32 tools (+ MCP) as Agent mode
Approvals	Edits auto-approve; destructive tools (`delete_file`, `run_terminal`) still ask
Counter	Shown live in the chat bar: `X/5 free today`
When exhausted	The composer is disabled and further Auto sends are blocked until the quota resets at midnight UTC — switch to Chat/Agent with your own key to keep going

Auto always uses the free gateway tier — regardless of any API keys you've set — which is what keeps it "free, no key needed", and every Auto request counts against the 5/day quota. Once the quota is used up, Auto sends are blocked until midnight UTC. The model picker is ignored in Auto mode. To run your own model/key with no daily cap, use Chat or Agent mode and pick it from the model dropdown.

Note: Auto mode is a Beta feature.

Multi-provider, multi-model chat

Pick any model from the dropdown, grouped by provider: NVIDIA NIM (default), OpenAI (GPT-4 and GPT-5 families), Anthropic (Claude), Google (Gemini), and Groq (GPT-OSS, Qwen, Compound). Add your own model IDs through the Add custom model button — pick the provider, paste the id, done; no extension update required. Each provider has its own API key (NIM Code: Set API Key) and only the key for the selected model's provider is needed.

Agent mode

Let NIM Code act autonomously on your workspace. It reads, searches, writes, and refactors files; runs terminal commands; queries git; runs code reviews; tracks its own progress; and opens files directly in your editor.

32 built-in agent tools:

Category	Tool	What it does
Files	`read_file`	Read any file in your workspace
Files	`list_files`	List files and folders in a directory
Files	`write_file`	Create or overwrite a file with generated content
Files	`edit_file`	Make a targeted edit by replacing a snippet of a file's content (exact match first, whitespace-tolerant fallback)
Files	`multi_edit`	Apply several find/replace edits to one file atomically — validated together, all-or-nothing
Files	`delete_file`	Delete a file from the workspace
Files	`rename_file`	Rename or move a file within the workspace
Search	`search_codebase`	Ripgrep-powered search — returns file paths, line numbers, and matching content
Search	`find_files`	Recursively find files by glob pattern (e.g. `*/.test.ts`), respecting default excludes
Search	`search_codebase_semantic`	Natural-language search over a local embedding index of your workspace
Web	`web_fetch`	Fetch an http(s) URL and return its readable text — optionally answering a prompt against the page
Web	`web_search`	Search the web (keyless DuckDuckGo) and return title/URL/snippet results
Terminal	`run_terminal`	Execute any shell command (configurable timeout via `nimcode.terminalTimeoutMs`, default 10 min, 0 = unbounded; posts a "still running" heartbeat every 15s). Set `background: true` for async/long-running commands
Terminal	`list_background_shells`	List background shells started via `run_terminal` with their status and runtime
Terminal	`get_background_output`	Read new output from a background shell and its current status
Terminal	`kill_background_shell`	Terminate a running background shell by id
Git	`git_status`	Show modified, staged, and untracked files
Git	`git_diff`	Show uncommitted changes, optionally scoped to one file or staged only
Git	`git_log`	Show recent commit history
Review	`review_code`	Run a focused review (general or security) over a file or the current diff
Editor	`get_diagnostics`	Read VS Code errors and warnings from the Problems panel
Editor	`open_file_in_editor`	Open a file and jump to a specific line number
Planning	`todo_write`	Maintain a structured todo list, shown to you as a live checklist while the agent works
Planning	`present_plan`	Publish a structured plan (title + steps) for you to approve before the agent executes — see Planner Mode below
Memory	`remember`	Persist a fact, preference, or convention to cross-session/cross-workspace memory (or clear it with `forget_all`) — see Memory below
Memory	`project_memory`	Record team-shared facts about the current workspace (architecture, conventions, decisions, TODOs) into a committed `.nimcode/memory.md` — see Memory below
Nim	`nim_format`	Format a `.nim` file in place with `nimpretty`, the official Nim style beautifier, via the same diff-preview approval flow as `write_file`/`edit_file`
Nim	`nim_doc`	Generate HTML API docs for a `.nim` file via the `nim doc` compiler command, optionally opening the result in your browser
Nim	`nim_expand`	Reveal what a Nim routine actually compiles to — runs the compiler's `--expandArc`/`--expandOrc` to show the code after template inlining, destructor injection, and control-flow lowering (read-only)
Nim	`nim_run`	Compile and execute a `.nim` file with `nim c -r`, returning compiler + program output (optional `-d:release` and program args) — requires confirmation, honors the terminal timeout, and is cancellable
Nim	`nim_test`	Run Nim tests and report pass/fail: `testament` pattern/category/all from the workspace root, or a single-file `nim c -r` fallback (`mode: file`); leads with an `N passed, M failed` summary and lists failing tests
Delegation	`delegate_task`	Spawn a scoped sub-agent (research-only or code) that runs its own tool loop on a self-contained task and returns a synthesized result

Smarter tool use

Three behaviors that make Agent mode work the way Claude Code / Cursor-class agents do:

Parallel read batching — the agent can issue up to 6 independent read-only tool calls in one step (reading several files, multiple searches, diagnostics) and they execute concurrently. Anything that modifies files or runs commands still executes strictly one at a time, in order.
Self-correcting edits — after every successful file write/edit, the file's fresh errors and warnings are collected from language servers and fed straight back to the model, so it fixes its own mistakes on the next step without being asked. Toggle with nimcode.autoDiagnostics. Edits are also resilient: edit_file/multi_edit fall back to whitespace-tolerant matching when a snippet's indentation drifted from the real file (applied only when the match is unambiguous, and disclosed in the result).
Auto-retrieved context — when the semantic index is built (NIM Code: Index Codebase), each agent request automatically pulls the most relevant code snippets into context, so the agent starts oriented instead of exploring from scratch. Toggle with nimcode.autoContext.

Sub-agent delegation

For large or exploratory tasks, the agent can spin off a scoped sub-agent via the delegate_task tool. The sub-agent starts from a clean context (it doesn't inherit the whole conversation), runs its own tool loop on a self-contained brief, and returns just a synthesized summary — so investigation doesn't clutter the main thread. Pick research (read-only: reading, searching, git, web) to gather information, or code (adds file edits + terminal) for an isolated change. Sub-agents inherit your approval mode (so their edits are still gated) and can't spawn further sub-agents.

Right-click code actions

Select any code, then right-click (or press Ctrl+. for the Quick Fix lightbulb) to jump straight into Explain, Fix, Review, Refactor, Generate Tests, or Document — each opens the chat panel with the matching slash command pre-filled using your selection.

MCP (Model Context Protocol) support

Connect your own local MCP servers (stdio transport) via nimcode.mcpServers in Settings. Each server's tools are merged into Agent mode's tool roster, namespaced as mcp__<server>__<tool>, so the agent can call out to your own tools alongside the 32 built-ins. See Configuration.

Lifecycle hooks

Run your own shell commands around Agent-mode events via nimcode.hooks in Settings — a machine-scoped map keyed by event name:

PreToolUse — runs before a matching tool call; exit code 2 blocks the tool and sends the hook's stderr back to the model (e.g. deny edits to protected paths).
PostToolUse — runs after a tool completes; exit code 2 appends its stderr to the tool result as feedback (e.g. auto-format or lint an edited file and report problems back).
Stop — runs when the agent is about to hand back a final answer; exit code 2 keeps it going, feeding stderr back as the next instruction (e.g. "don't stop until the build passes"). Bounded so a hook can't loop forever.

Each event holds matcher groups: matcher is a JS regex tested against the tool name (empty matches every tool; ignored for Stop), and hooks lists the commands. Every command receives the event as JSON on stdin plus NIMCODE_HOOK_EVENT, NIMCODE_TOOL_NAME, and NIMCODE_PROJECT_DIR environment variables. Hooks apply to sub-agent tool calls too. Because entries execute arbitrary commands, the setting is machine-scoped (not settable from a workspace's .vscode/settings.json). See Configuration.

Tool-call approval & Plan mode

Choose how much confirmation Agent mode needs before acting — Manual, Edit automatically, Auto, or Plan (read-only investigation with no file/shell access) — via the approval-mode dropdown next to the model picker. File writes show a VS Code diff editor to review before applying.

Planner Mode (plan → approve → execute)

In Plan mode the agent investigates read-only, then publishes a structured plan (a titled, numbered list of steps) as a review card instead of diving straight into edits — the understand → analyze → plan → show → execute flow. Review the steps and click Approve & execute to let the agent carry them out (it switches to Edit-automatically and proceeds), or Dismiss to refine your request and have it re-plan. The plan is produced by the agent's present_plan tool, so it appears the moment the agent is ready — no waiting for a wall of prose.

Background agents

Dispatch an agent task and keep working — it runs asynchronously while you switch chat sessions, start other tasks, or code in the editor. Multiple runs can be in flight at once, each tied to its own session.

Start one: just send an Agent-mode message, then navigate away or start another. There's no separate "background" button — concurrency is the default.
Run it unattended: pick an auto-running approval mode (Auto or Edit automatically) so the run doesn't pause for confirmation while you're away. Destructive tools (delete_file, run_terminal) still ask — if a background run hits one, it pauses and is surfaced (see below) until you open its session and answer.
See progress at a glance: a status-bar item shows how many agents are running (spinning sync icon) or need attention (bell icon); each conversation row shows a per-session badge (running / needs approval / failed); and a notification pops when a run finishes or fails, with an Open button that jumps to that session.
Cancel the run in the current session with the Stop button; other runs keep going.
Scope: runs live in memory — they survive switching sessions and hiding the panel, but a full VS Code/window reload ends them (marked interrupted, with any partial output preserved). They are not resumed across a restart.

Task queue (run tasks sequentially)

Background agents run concurrently; the Task queue is the opposite tool — line up several tasks and run them one at a time, each starting only once the previous one finishes. Useful when tasks build on each other (refactor → update call sites → update tests) and running them at once would have them fighting over the same files.

Open it: the Task queue button in the chat header (it shows a badge with the number of queued tasks).
Build the queue: type a task into the Add a task… row and press Enter. Reorder pending tasks with the up/down buttons, or remove them with the trash button.
Run it: click Run N tasks. Each task is dispatched as a normal turn in the current mode and model — so pick Agent mode (with Auto or Edit automatically) if the tasks should edit files unattended. Each task's output lands in the transcript exactly like a typed message.
Watch it: every task shows its state with an icon — waiting, running, done, or failed.
Stop it: Stop queue halts after the running task (it isn't cancelled). The composer's Stop button cancels the running task and stops the queue.
On failure: a failed task is marked as failed and the queue continues with the next one — one bad task doesn't abandon the batch.
Scope: like background runs, the queue is in-memory — it doesn't survive a window reload.

Live agent task list

For multi-step tasks, the agent maintains a visible todo checklist (via the todo_write tool) that updates in place as it works through each step, so you can follow along without reading every tool call.

Reasoning effort control

For reasoning-capable models, pick Low, Medium, or High effort from the dropdown next to the model picker to trade off response speed against depth of reasoning.

Semantic codebase search

Run NIM Code: Index Codebase for Semantic Search from the Command Palette to build a local embedding index of your workspace (via nimcode.embeddingModel). Once built, the agent's search_codebase_semantic tool can find relevant code by meaning, not just keyword — useful for "where do we handle X" style questions.

Memory

Agent mode automatically loads three sources of memory into its system prompt on every turn:

CLAUDE.md project memory — a CLAUDE.md file at your workspace root (if one exists) is read and injected automatically, so project-specific conventions, build commands, and architecture notes are always in context without you having to repeat them. Read-only from the agent's side — you author it.
Project memory (.nimcode/memory.md) — a per-workspace, git-committed memory the whole team shares, structured by category (Architecture, Coding style, Decisions, TODOs, Conversations). Unlike CLAUDE.md, the agent can write it via the project_memory tool, and you can view/edit/delete entries in the Project Memory panel (the Project memory button in the chat header, or the NIM Code: Open Project Memory command). Populate it three ways:
- the agent calls project_memory when it learns something durable about the codebase (approval-gated like other edits);
- auto-capture — when a long session is compacted, durable decisions and TODOs are distilled into it automatically (opt-in via nimcode.projectMemory.autoCapture, off by default since it writes a committed file);
- explicitly — select code and run NIM Code: Add Selection to Project Memory (also on the editor right-click menu), or type /remember-project <fact> in the chat.
Toggle the whole feature with nimcode.projectMemory.enabled. Because the file is committed, project memory is shared with everyone who clones the repo.
Persistent user memory — a separate, cross-session/cross-workspace memory the agent maintains itself via the remember tool (action: "add" to persist a fact/preference, "forget_all" to clear everything). It's stored outside any workspace (personal, machine-global), so it carries over to future sessions and other projects on this machine — as opposed to project memory, which is per-repo and team-shared.

Image attachments (vision)

Paste a screenshot directly into the chat input to attach it to your message — handy for sharing UI bugs, error dialogs, or design mockups. Requires a model with the vision capability (built-in or added via customModels); up to 4 images per message.

Auto-attached editor selection

Whatever text you have highlighted in the active editor shows up as a chip above the chat input automatically — no need to click an attach button. The chip tracks your selection live (it updates as you select something else, and clears when you deselect); click the close button on the chip to detach it for the next message. It's sent as fenced, file/line-labeled context alongside whatever you type.

AI-generated commit messages

Click the sparkle icon in the Source Control view's title bar (or run NIM Code: Generate Commit Message) to fill the commit message box with a Conventional Commits message generated from your staged (or working-tree) diff.

GitHub PR review (post comments)

Run NIM Code: Review GitHub PR (also in the Source Control view's ⋯ menu) to review a pull request end-to-end via the GitHub CLI (gh):

It detects the PR for your current branch (or asks for a PR number), then you pick a General or Security focus.
It fetches the PR diff with gh pr diff, generates a review with your selected model, and opens it in an editor.
You review the generated comment first — nothing is posted until you confirm the modal. On Post as PR comment, it publishes the review to the PR via gh pr review --comment.

Requires the GitHub CLI installed and authenticated (gh auth login). Because it posts publicly to GitHub, the post step is always gated behind an explicit confirmation.

Slash commands

Type / in the chat input to trigger context-aware prompts using your active editor selection:

Command	Alias	Description
`/explain`		Explain what the selected code does
`/fix`		Find and fix bugs in the selection
`/review`		Code-review the selection for quality and correctness
`/refactor`	`/ref`	Restructure code for readability and maintainability
`/optimize`	`/opt`	Improve performance and reduce complexity
`/test`		Generate unit tests for the selection
`/document`	`/doc`	Write JSDoc / docstring comments for the selection
`/debug`		Diagnose an error or stack trace
`/commit`		Generate a conventional commit message for the current diff
`/summary`		Summarize what a file or directory does
`/security`		Review code for security vulnerabilities
`/migrate`		Migrate code to a new version or framework
`/pr`		Generate a pull request title and description

Skills (reusable prompt packages)

Beyond the fixed built-in slash commands, you can define your own skills — reusable, shareable prompt packages stored as Markdown files. Each skill is a .md file with optional frontmatter and a prompt body:

---
name: refactor-nim
description: Idiomatic Nim refactor
version: 1.0.0
---
You are a Nim expert. Refactor the following code to be idiomatic and memory-safe,
preserving behavior. Explain each change briefly.

{{selection}}

{{input}}

Where they live: <workspace>/.nimcode/skills/*.md (git-committed, shared with your team) and a machine-global user directory (personal, cross-workspace). A workspace skill overrides a user skill of the same name.
Invoke them: type / in the chat — skills appear in the autocomplete alongside built-in commands, tagged skill. Pick one (e.g. /refactor-nim) and the skill's prompt is applied to that turn, in whatever mode (Chat/Agent) you're in.
Placeholders: {{selection}} is replaced with your attached editor selection and {{input}} with whatever you type after the trigger. A skill with no placeholders simply has your input appended.
Live reload: adding, editing, or deleting a skill file refreshes the / picker immediately (no reload needed).
Install shared skills: run NIM Code: Install Skill from URL and paste a raw .md URL to drop a teammate's or community skill into .nimcode/skills/.

Streaming responses

Responses stream token-by-token so you see output immediately, with full cancel support mid-stream.

Copy code blocks

Every code block in the chat output has a Copy button always visible in the header bar. Click it to copy the code to your clipboard in one click, or use Insert (appears on hover) to paste it directly at the cursor.

Nim macro & template expansion

Nim's macros and templates are its superpower — and its hardest-to-inspect feature. The nim_expand agent tool answers "what does this actually compile to?" by running the compiler's --expandArc (default) or --expandOrc flag on a routine, then showing the code Nim generates after template inlining, destructor/=copy/=sink injection, and control-flow lowering. It's read-only (generated C goes to a throwaway cache — your workspace is never touched), so it works even in Plan mode. Ask in Agent mode, e.g. "Use nim_expand to show what parseConfig in src/config.nim expands to." The result renders as a syntax-highlighted Nim block. Note: the compiler only expands a routine it actually instantiates, so the target symbol must be reachable (used) in the file or its imports.

Mermaid diagram rendering

Fenced ```mermaid code blocks render as inline diagrams (flowcharts, sequence diagrams, class diagrams, etc.) instead of plain code — handy for architecture or sequence explanations. A Copy button on the diagram header copies the raw diagram source. Falls back to the raw text while a diagram is still streaming in or if it fails to parse.

Token budget indicator

A context-window usage bar appears below the message list, showing how much of the model's context has been consumed by the current conversation. Color shifts from green → amber → red as you approach the limit.

Per-message token usage & cost

Each assistant turn's footer shows the tokens that turn consumed, and — for models with published metered pricing (OpenAI, Anthropic, Google, Groq) — an estimated USD cost at list prices (e.g. 1,240 tok · $0.0083). Cached prompt tokens are billed at each provider's discounted cache-read rate, so the cost reflects prompt caching. Models without published pricing (the NIM free tier, self-hosted / local endpoints, unknown custom ids) show the token count only, with no dollar figure.

Prompt caching

Repeated large-context calls — the same system prompt, project memory, and tool roster resent on every step of the Agent loop — reuse a cached prefix instead of being reprocessed from scratch, cutting cost and latency. NIM Code marks the stable prefix (system prompt + the latest turn) with a cache_control breakpoint for Anthropic (Claude) models; OpenAI caches automatically with no marker needed. When a response is served partly from cache, the message footer shows (N cached) next to its token count. Toggle with nimcode.promptCaching (default on). Providers without prompt-cache support are unaffected.

Auto-compaction of long conversations

Once a session's history exceeds the recent-turns window (20 by default), older turns are no longer silently dropped — they're folded into a rolling summary (via a small extra model call) and kept in context alongside the most recent turns verbatim. The summary is cached and only re-generated as further turns push past it, so long-running sessions keep continuity without an unbounded context cost.

Secure API key storage

Every provider's API key (NVIDIA NIM, OpenAI, Anthropic, Google, Groq) is stored in the OS keychain via VS Code SecretStorage — never in settings files or workspace storage.

Persistent sessions

All conversations are saved locally and accessible from the session list (the Conversations button). Rename, delete, or switch between sessions at any time.

Feedback & suggestions

The Feedback button in the chat header opens a short form for reporting a bug or suggesting an improvement. Only the message is required — name and email are optional, so a report can be completely anonymous, and they're never prefilled or remembered between submissions. Nothing else is attached: no code, prompts, conversation history, file paths, workspace or model information. The report is sent from the extension host (the webview has no network access) to https://ai-gateway-beta.vercel.app/api/v1/feedback, and the dialog tells you whether it went through.

Quick Start

Option A — No API key (Auto mode)

Install NIM Code from the VS Code Marketplace.
Click the N icon in the Secondary Side Bar — the right-hand panel, toggled with Ctrl+Alt+B (or press Ctrl+Shift+N).
The mode toggle defaults to Auto — start typing and press Enter to send.

You get 5 free requests per day. The counter in the chat bar shows how many remain; once they're used up the composer is disabled until the quota resets at midnight UTC.

Option B — Full access (Chat & Agent modes)

1. Get an API key from the provider you want to use

Provider	Where to get a key	Key prefix
NVIDIA NIM (default)	build.nvidia.com → API Keys → Generate Personal Key	`nvapi-`
OpenAI	platform.openai.com → API keys	`sk-`
Anthropic	console.anthropic.com → API keys	`sk-ant-`
Google (Gemini)	aistudio.google.com → Get API key	`AIza`
Groq	console.groq.com → API Keys	`gsk_`

You only need a key for the provider(s) whose models you actually pick.

2. Install NIM Code

Search "NIM Code" in the Extensions panel (Ctrl+Shift+X) or install from the VS Code Marketplace.

3. Set your API key

Open the Command Palette (Ctrl+Shift+P / Cmd+Shift+P) and run:

NIM Code: Set API Key

Pick the provider, then paste the key when prompted. The key is validated against the provider before it's saved. Repeat for any other providers you want to use.

4. Start chatting

Click the N icon in the Secondary Side Bar (the right-hand panel), or press Ctrl+Shift+N / Cmd+Shift+N.

Modes

Mode	Icon	API key	Model	Tools	Best for
Auto	sparkles	Not required	Free routing gateway with auto-failover (always)	All 32 tools (+ MCP)	Free agentic tasks, 5/day
Chat	speech bubble	Selected model's provider	Any model (NIM, OpenAI, Anthropic, Google, Groq)	Chat only	Q&A, explanations, code review
Agent	lightning bolt	Selected model's provider	Any model (NIM, OpenAI, Anthropic, Google, Groq)	All 32 tools (+ MCP)	Multi-step tasks, file edits

Switch modes with the segmented toggle in the chat bar. The mode hint below the toggle summarises the current capabilities.

Agent Mode

Switch the mode toggle in the chat input to Agent. NIM Code will plan and execute multi-step tasks autonomously, showing each tool call as it runs.

Example prompts:

"Search for all TODO comments in .ts files and fix them one by one"
"Run pnpm run lint, find the errors with get_diagnostics, and fix them all"
"Show me git_diff, then write a conventional commit message for these changes"
"Read src/api/users.ts, add Zod input validation, and open the file when done"
"List all .test.ts files, find untested functions with search_codebase, and write the missing tests"

Tip: Use Chat mode for questions and explanations. Use Agent for tasks that require reading or changing files.

Tool-call approval modes

The Mode dropdown next to the model picker (visible in Agent mode) controls how much confirmation the agent needs before acting:

Mode	File edits (write/rename)	Deletes & terminal commands
Manual	Asks for approval	Asks for approval
Edit automatically	Applies immediately	Asks for approval
Auto	Applies immediately	Asks for approval
Plan	Blocked — read-only	Blocked — read-only

When a confirmation is required, NIM Code opens a VS Code diff editor (for file writes) or a prompt describing the action, with Accept/Reject buttons — reject and the agent reports it and adjusts its approach. Plan mode never touches the filesystem or shell; the agent investigates and proposes a plan as text instead.

Supported Models

NIM Code ships with curated models from five providers, grouped in the picker. Add any other model via the Add custom model button or Settings → NIM Code → customModels — no extension update needed.

NVIDIA NIM (default provider)

Model	ID	Context	Best for
Llama 3.1 8B (Auto mode)	Built-in proxy	128K	Free tier — no key needed
Nemotron Super 120B (default)	`nvidia/nemotron-3-super-120b-a12b`	1M	NVIDIA-tuned reasoning and code
DeepSeek V4 Flash	`deepseek-ai/deepseek-v4-flash`	131K	Fast, efficient coding assistant
Llama 3.2 11B Vision	`meta/llama-3.2-11b-vision-instruct`	131K	Understands pasted screenshots and images

OpenAI

Model	ID	Context	Best for
GPT-5	`gpt-5`	400K	Flagship reasoning
GPT-5 mini	`gpt-5-mini`	400K	Fast, cost-efficient reasoning
GPT-4.1	`gpt-4.1`	1M	Strong coding, huge context
GPT-4.1 mini	`gpt-4.1-mini`	1M	Balanced speed and capability
GPT-4o	`gpt-4o`	128K	Multimodal general-purpose
GPT-4o mini	`gpt-4o-mini`	128K	Low latency, low cost

Anthropic

Model	ID	Context	Best for
Claude Opus 4.8	`claude-opus-4-8`	1M	Most capable — long-horizon agentic work
Claude Sonnet 5	`claude-sonnet-5`	1M	Near-Opus coding quality at Sonnet cost

Google

Model	ID	Context	Best for
Gemini 2.5 Pro	`gemini-2.5-pro`	1M	Strong reasoning, huge context
Gemini 2.5 Flash	`gemini-2.5-flash`	1M	Fast, cost-efficient multimodal

Groq

Model	ID	Context	Best for
GPT-OSS 120B	`openai/gpt-oss-120b`	128K	Open-weight reasoning

Popular NIM models to add via customModels:

Model ID	Context	Best for
`meta/llama-3.3-70b-instruct`	128K	Latest Llama, strong coding
`qwen/qwen2.5-coder-32b-instruct`	32K	Code generation
`deepseek-ai/deepseek-r1`	64K	Step-by-step reasoning
`nvidia/llama-3.1-nemotron-70b-instruct`	128K	NVIDIA-tuned Nemotron 70B
`mistralai/mistral-large-2-instruct`	128K	Multilingual, strong tool use

Notes: The Auto free tier and semantic codebase indexing (embeddings) always use the NIM provider, regardless of the picked chat model. nimcode.temperature is ignored for models that reject it (Anthropic's current Claude models; OpenAI's GPT-5/o-series reasoning models).

Commands & Keyboard Shortcuts

Command	Shortcut (Win/Linux)	Shortcut (Mac)	Description
`NIM Code: Open Chat`	`Ctrl+Shift+N`	`Cmd+Shift+N`	Open or focus the NIM Code chat panel
`NIM Code: New Chat Session`	`Ctrl+Shift+L`	`Cmd+Shift+L`	Start a fresh conversation
`NIM Code: Set API Key`	—	—	Pick a provider (NVIDIA NIM, OpenAI, Anthropic, Google, Groq) and save its key securely
`NIM Code: Clear API Key`	—	—	Pick a provider and remove its stored key
`NIM Code: Generate Commit Message`	—	—	Fill the SCM commit box from the current diff
`NIM Code: Index Codebase for Semantic Search`	—	—	Build the local embedding index for `search_codebase_semantic`
`NIM Code: Install Skill from URL`	—	—	Fetch a skill `.md` from a URL into `.nimcode/skills/`
`NIM Code: Review GitHub PR`	—	—	Review a PR via `gh` and (after confirmation) post the review as a comment
Explain / Fix / Review / Refactor / Generate Tests / Document	— (right-click or `Ctrl+.`)	—	Run a slash command on the current editor selection
Send message	`Enter`	`Enter`	Send the typed message
New line in message	`Shift+Enter`	`Shift+Enter`	Insert a line break without sending
Browse history	`↑` / `↓`	`↑` / `↓`	Navigate previously sent messages

All commands are also accessible via the Command Palette (Ctrl+Shift+P / Cmd+Shift+P) — search "NIM Code".

Configuration

Open Settings (Ctrl+,) and search "NIM Code", or edit settings.json:

{
  // Base URL for the NIM provider's OpenAI-compatible API — change for
  // on-premise deployments or local servers (Ollama, LM Studio, vLLM, …)
  "nimcode.baseUrl": "https://integrate.api.nvidia.com/v1",

  // Per-provider base URL overrides. openaiBaseUrl / groqBaseUrl are handy for
  // OpenAI-compatible gateways; leave anthropicBaseUrl / googleBaseUrl empty to
  // use each provider SDK's default endpoint.
  "nimcode.openaiBaseUrl": "https://api.openai.com/v1",
  "nimcode.anthropicBaseUrl": "",
  "nimcode.googleBaseUrl": "",
  "nimcode.groqBaseUrl": "https://api.groq.com/openai/v1",

  // Mode the chat panel starts in when it is first opened ("auto" | "chat" | "agent").
  // Switching modes mid-session is not persisted — a fresh panel starts here again.
  "nimcode.defaultChatMode": "auto",

  // Fallback default model, overridden by defaultChatModel / defaultAgentModel
  "nimcode.defaultModel": "nvidia/nemotron-3-super-120b-a12b",

  // Model selected automatically when Chat mode is active
  "nimcode.defaultChatModel": "nvidia/nemotron-3-super-120b-a12b",

  // Model selected automatically when Agent mode is active — use a strong tool-use model
  "nimcode.defaultAgentModel": "nvidia/nemotron-3-super-120b-a12b",

  // Per-request timeout in milliseconds (default: 120 000 = 2 min)
  // Increase to 300 000 for reasoning models like DeepSeek R1
  "nimcode.requestTimeoutMs": 120000,

  // Automatic retry attempts on transient errors (0–5)
  "nimcode.maxRetries": 3,

  // Max time in milliseconds run_terminal lets a shell command run before
  // killing it (default: 600 000 = 10 min). Set to 0 for no timeout — the
  // command then only stops on exit or user cancellation. A "still running"
  // heartbeat is posted to the chat every 15s while a command is in flight.
  "nimcode.terminalTimeoutMs": 600000,

  // Sampling temperature — lower = more deterministic (0–2). Ignored for
  // models that reject it (Anthropic Claude; OpenAI GPT-5/o-series).
  "nimcode.temperature": 0.2,

  // In Agent mode, auto-include the active file (path + cursor snippet) as
  // context when no explicit selection is attached (default: true)
  "nimcode.agentIncludeActiveFile": true,

  // Embedding model used to build the local semantic codebase index
  // ("NIM Code: Index Codebase" command) and the search_codebase_semantic tool
  "nimcode.embeddingModel": "nvidia/nv-embedqa-e5-v5",

  // After the agent writes/edits a file, auto-collect its errors/warnings from
  // language servers and feed them back to the model (default: true)
  "nimcode.autoDiagnostics": true,

  // Auto-retrieve the most relevant indexed code snippets as context for each
  // Agent request; requires "NIM Code: Index Codebase" (default: true)
  "nimcode.autoContext": true,

  // Cache the stable prompt prefix (system prompt, project memory, tools) to cut
  // cost/latency on repeated calls. Adds a cache_control breakpoint for Anthropic;
  // OpenAI caches automatically regardless. Cached tokens show as "(N cached)". (default: true)
  "nimcode.promptCaching": true,

  // One anonymous heartbeat per active day (machine id, extension version,
  // VS Code version, OS — never code, prompts, paths, or keys) so we can count
  // active users. VS Code's own telemetry.telemetryLevel always wins. (default: true)
  "nimcode.telemetry.enabled": true,

  // Per-workspace, team-shared project memory (.nimcode/memory.md):
  // the project_memory tool, the memory panel, and prompt injection (default: true)
  "nimcode.projectMemory.enabled": true,

  // Auto-distill durable decisions/TODOs into .nimcode/memory.md when a long
  // session is compacted. Off by default — it writes a git-committed file.
  "nimcode.projectMemory.autoCapture": false,

  // Additional models shown in the picker. "provider" selects which API
  // serves the model: "nim" (default — the endpoint at nimcode.baseUrl),
  // "openai", "anthropic", "google", or "groq".
  "nimcode.customModels": [
    {
      "id": "meta/llama-3.3-70b-instruct",
      "label": "Llama 3.3 70B",
      "contextWindow": 128000,
      "capabilities": ["chat", "tools"]
    },
    {
      "id": "gpt-4o-2024-11-20",
      "label": "GPT-4o (pinned)",
      "provider": "openai",
      "contextWindow": 128000,
      "capabilities": ["chat", "tools", "vision"]
    }
  ],

  // Shell commands run around Agent-mode events (machine-scoped).
  // Exit code 2 = blocking; the hook's stderr is sent back to the model.
  "nimcode.hooks": {
    // Block edits to lock files; guard is a script that exits 2 to deny.
    "PreToolUse": [
      {
        "matcher": "write_file|edit_file|delete_file",
        "hooks": [{ "type": "command", "command": "node scripts/guard-edit.js" }]
      }
    ],
    // Auto-format every file the agent writes.
    "PostToolUse": [
      {
        "matcher": "write_file|edit_file",
        "hooks": [{ "type": "command", "command": "prettier --write \"$NIMCODE_PROJECT_DIR\"", "timeout": 30 }]
      }
    ],
    // Don't let the agent stop until the type-checker is clean.
    "Stop": [
      {
        "hooks": [{ "type": "command", "command": "pnpm run typecheck 1>&2 || exit 2" }]
      }
    ]
  }
}

Each hook command receives the event payload as JSON on stdin and these environment variables: NIMCODE_HOOK_EVENT (PreToolUse / PostToolUse / Stop), NIMCODE_TOOL_NAME (the tool being called), and NIMCODE_PROJECT_DIR (the workspace root, also the command's working directory). Exit 0 for success, exit 2 to block (stderr is forwarded to the model); any other non-zero exit is logged but does not affect the run.

On-premise / self-hosted NIM

"nimcode.baseUrl": "http://localhost:8000/v1"

Everything else — API key, model IDs, streaming, tool use — works identically against a local NIM deployment.

Moving the Panel

NIM Code opens in the Secondary Side Bar — the right-hand panel, like GitHub Copilot Chat — which you can show or hide with Ctrl+Alt+B / Cmd+Alt+B. To move it somewhere else:

Right-click the N icon in the Secondary Side Bar
Pick another location (e.g. Move to Primary Side Bar to dock it on the left), or just drag the icon where you want it

VS Code remembers this permanently.

Privacy & Security

Concern	How NIM Code handles it
API keys	Each provider's key (NVIDIA NIM, OpenAI, Anthropic, Google, Groq) is stored in the OS keychain (VS Code SecretStorage). Never written to disk or settings files.
Auto mode key	Baked into the extension bundle at build time (from the build environment, never committed to source). Routed only through the NIM Code gateway, which fails over across providers — your traffic is not logged or stored.
Chat data	Sent only to the selected model's provider: `integrate.api.nvidia.com` (or your custom `baseUrl`), `api.openai.com`, `api.anthropic.com`, `generativelanguage.googleapis.com`, or `api.groq.com`. Never sent anywhere else — in particular, never included in telemetry.
Telemetry	One anonymous heartbeat per active day, for counting active users only. See Telemetry below — opt out with `nimcode.telemetry.enabled`.
Feedback form	Sent only when you fill it in and press Send. Contains exactly what you typed — the message, plus a name and email only if you chose to enter them. Nothing is prefilled, stored, or attached automatically.
Webview	Runs under a strict Content Security Policy — no external network requests from the UI layer.
Agent file access	All file operations are scoped to your VS Code workspace root.

Telemetry

On each day you actually use NIM Code, the extension sends one anonymous event so we can tell how many people are using it and which versions are in the wild. It contains only:

VS Code's own anonymized machine id (vscode.env.machineId — not tied to you, your account, or your email)
the NIM Code version
the VS Code version
your operating system (win32 / darwin / linux)

It never includes your code, prompts, responses, file or workspace paths, project or repository names, model ids, or API keys. It is capped at one request per machine per day, is sent fire-and-forget, and can never delay or fail a chat.

To turn it off, either:

"nimcode.telemetry.enabled": false

or set VS Code's global telemetry.telemetryLevel to "off" — that always takes precedence, so if you have VS Code telemetry disabled, NIM Code sends nothing regardless of its own setting.

Troubleshooting

"Set your … API key" banner This banner appears in Chat or Agent mode when the selected model's provider has no key saved — e.g. picking a Claude model without an Anthropic key. Click Set Key (or run NIM Code: Set API Key and pick the provider), or switch to Auto mode to start chatting immediately with no key.

401 / authentication errors on one provider only Keys are per-provider. Run NIM Code: Set API Key, pick the failing provider, and re-enter its key — the error message names which provider rejected the request.

Auto mode daily limit reached The 5-request counter resets at midnight UTC; until then the composer is disabled. Switch to Chat mode with your own NVIDIA NIM API key for unlimited usage.

Slow or no response with large models (Nemotron Super 120B) Increase nimcode.requestTimeoutMs to 300000 in Settings — large models can take several minutes on complex prompts.

Rate limit errors NIM Code retries automatically with exponential backoff. Persistent errors indicate you have reached your NVIDIA NIM free-tier limit.

search_codebase fails NIM Code uses VS Code's bundled ripgrep (rg) binary — no separate installation is needed.

Agent writes unexpected content Use Chat mode with /review first to validate the model understands your codebase before switching to Agent.

Requirements

VS Code 1.90.0 or later
An API key for the provider you want to use — NVIDIA NIM, OpenAI, Anthropic, Google, or Groq (Auto mode requires no key)
Node.js 20+ (development only)

Contributing

Bug reports and feature requests: vjwarboy13@gmail.com

Local development

pnpm install        # install all dependencies
pnpm run build      # compile extension + webview
# Press F5 in VS Code to launch the Extension Development Host

For incremental development, run these in two separate terminals:

pnpm run watch:webview      # Vite --watch for the React UI
pnpm run watch:extension    # esbuild --watch for the extension host

Builds carry the PostHog project key by default, so an Extension Development Host session counts as an active user like any install. To keep your own machine out of the numbers, either set "nimcode.telemetry.enabled": false in your user settings, or build with the key blanked:

POSTHOG_KEY= pnpm run build      # PowerShell: $env:POSTHOG_KEY = ''; pnpm run build

Scripts

Script	Purpose
`pnpm run build`	Full production build (webview + extension host)
`pnpm run watch:webview`	Rebuild webview on file changes
`pnpm run watch:extension`	Rebuild extension host on file changes
`pnpm run typecheck`	Type-check both tsconfigs without emitting
`pnpm run lint`	Run ESLint across all source files
`pnpm run format`	Auto-format with Prettier
`pnpm run test`	Run unit tests (Vitest)
`pnpm run test:e2e`	Run end-to-end tests (Playwright)
`pnpm run rebuild:electron`	Rebuild the native `better-sqlite3` module for VS Code's Electron ABI — run before `F5`
`pnpm run rebuild:node`	Rebuild `better-sqlite3` for plain Node — run before `pnpm run test`
`pnpm run package`	Build and pack a `.vsix` installable file
`pnpm run publish`	Publish to the VS Code Marketplace

Native module note: better-sqlite3 is a compiled addon. Vitest runs under plain Node, while the Extension Development Host (F5) runs under VS Code's bundled Electron — a different Node ABI. If F5 throws command 'nimcode.*' not found for every command, the native binary is almost certainly built for the wrong ABI; run pnpm run rebuild:electron and relaunch. See CLAUDE.md for details.

NIM Code

vijay janakiraman

NIM Code

In one line

Key points

How it works

Features

Auto mode — free agent tier, no key needed

Multi-provider, multi-model chat

Agent mode

Smarter tool use

Sub-agent delegation

Right-click code actions

MCP (Model Context Protocol) support

Lifecycle hooks

Tool-call approval & Plan mode

Planner Mode (plan → approve → execute)

Background agents

Task queue (run tasks sequentially)

Live agent task list

Reasoning effort control

Semantic codebase search

Memory

Image attachments (vision)

Auto-attached editor selection

AI-generated commit messages

GitHub PR review (post comments)

Slash commands

Skills (reusable prompt packages)

Streaming responses

Copy code blocks

Nim macro & template expansion

Mermaid diagram rendering

Token budget indicator

Per-message token usage & cost

Prompt caching

Auto-compaction of long conversations

Secure API key storage

Persistent sessions

Feedback & suggestions

Quick Start

Option A — No API key (Auto mode)

Option B — Full access (Chat & Agent modes)

1. Get an API key from the provider you want to use

2. Install NIM Code

3. Set your API key

4. Start chatting

Modes

Agent Mode

Tool-call approval modes

Supported Models

NVIDIA NIM (default provider)

OpenAI

Anthropic

Google

Groq

Commands & Keyboard Shortcuts

Configuration

On-premise / self-hosted NIM

Moving the Panel

Privacy & Security

Telemetry

Troubleshooting

Requirements

Contributing

Local development

Scripts

License