Luna Code

AI Coding Agent for VS Code

Luna Code is an agentic coding assistant that runs entirely on OpenRouter. Point it at any model OpenRouter supports with just an API key and a model id — no other configuration required.

It's built for fast codebase navigation, efficient agentic sessions, and high prompt-cache hit rates, so long sessions stay cheap and fast.

Features

One-line setup. Paste an OpenRouter API key, pick a model, go.
Three modes.
- Standard — approve each file edit and shell command before it runs.
- Auto — runs edits and commands autonomously without prompting (commands in lunacode.alwaysDenyCommands are still hard-blocked).
- Plan — read-only research + planning. The agent investigates the code and proposes a concrete plan, but can't edit files or run mutating commands.
Session history. Every conversation is saved per-workspace. Click the history button in the header to browse, reload, or delete prior sessions.
Usage & cost analytics. A live meter shows session-total cost plus the last turn's tokens and cache-hit rate. The usage window (bar-chart button) reports spend and tokens over the last 30 / 60 / 90 days, with a daily cost chart, daily token chart, and a per-model cost/usage breakdown.
Refined "thinking" UX. While the model reasons, an animated Thinking… indicator sits just above the composer; when it finishes it collapses to a quiet "Thought for Ns" marker — no noisy, expandable reasoning blocks.
Agentic tool loop. Reads files, lists/globs/greps the workspace, runs builds & tests, checks language-server diagnostics, and edits code — looping until the task is done. Independent read-only lookups batched into one response run in parallel, and after every edit the file's language-server errors are auto-appended to the tool result so the model fixes breakage without an extra round-trip.
Explore sub-agents. An explore tool delegates open-ended research ("how does auth work here?") to disposable sub-agents with their own context. Sub-agents use lunacode.subagentModel (a cheap model is recommended; leaving it blank uses the selected model). Pass several questions to fan out independent topics in parallel. Only the digests return to the main conversation, keeping it small and cache-friendly. Each run also gets an expandable, UI-only report with source files, tool mix, iterations, duration, tokens, cache rate, and cost, without bloating model context.
Loop guard. Soft-blocks runaway rewrite loops (the same file edited, or the same command run, too many times in one turn) so the model can adapt; it hard-stops only after consecutive fully blocked rounds. Paged read_file ranges are never treated as duplicates.
Surgical reads. A file_outline tool (language-server symbols with line ranges) plus read_file offset/limit paging let the agent pull just the relevant lines (say, 40) instead of whole files.
Project memory. A LUNA.md at the workspace root is loaded into the system prompt every session; the agent is instructed to record durable conventions and gotchas there.
Turn checkpoints. Files changed by an agent turn are snapshotted; an ↩ revert chip in the meter restores the last turn's edits. Checkpoints are kept as a stack of the last 10 turns.
MCP servers. Connect stdio Model Context Protocol servers via lunacode.mcpServers (or the settings GUI); their tools appear to the agent as mcp__<server>__<tool> with approval gating in Standard mode.
Cache-warmth dot. The meter shows whether the provider prompt cache is still warm (~5 min TTL); a cold cache means the next message rewrites it at full input price. Optional pre-warm (lunacode.prewarmCache) writes the cache when a session opens so the first message starts warm.
Mid-turn steering. Messages sent while the agent is working are injected into the running task at the next step — no waiting for the turn to end.
Live task checklist. For multi-step work the agent maintains a visible plan (set_tasks) rendered above the composer with per-step progress.
Eager tool execution. Read-only tool calls start running while the model is still streaming the rest of its response, overlapping generation with I/O.
Resilient streaming. Automatic retry with backoff on 429/5xx before any tokens arrive, a hung-stream watchdog, and optional fallback models (lunacode.fallbackModels) via OpenRouter routing — a status line notes when a fallback served the response.
@-file mentions. Type @ in the composer for a fuzzy file picker; the chosen path is inserted so the agent reads exactly the file you meant.
Turn review & commit. The ± chip shows side-by-side diffs of the last turn's edits with one-click git commit (message generated by the cheap summarizer model); ↻/✎ chips retry or edit-and-resend your last message.
Context inspector. Click the session cost in the meter to see exactly what's in the context window: totals vs budget, system-prompt size, the largest items, and the estimated cost of the next (cached) call.
Atomic batched patches. apply_patch preflights every requested change before writing, supports exact replacements and line ranges, and edits one or many files in a single model round-trip.
Revision-safe edits. read_file returns a compact content revision; edit_file, write_file, and apply_patch can reject stale changes when a user, formatter, or another agent modified the file after it was read.
Conflict-safe implementers. Scoped implement.jobs enforce declared write paths and run concurrently only when every scope is disjoint. Unscoped or overlapping work is serialized automatically. Every implementer runs in a disposable Git worktree based on the caller's current tracked state; Luna snapshots every affected file and merges only the completed binary patch.
Engineering Control Center. The sliders button (or Luna Code: Open Control Center) combines budget projections, durable queue controls, interrupted-run recovery, verification gates, the live agent graph, repository hotspots, tool-schema measurements, project memory, worktree actions, and the security audit in one responsive surface.
Patch Studio and intent-level undo. Review the complete turn change set, open native editable diffs, revert individual files or selected hunks, or rewind the entire logical turn—including patches produced by implementers.
Verification policies. Advisory, Standard, and Strict gates evaluate diagnostics, tests/builds, and tool failures from actual turn receipts. Test-first bug-fix guidance plus /regression captures red→green evidence.
Crash-safe background work. Queued prompts, active-run markers, and audit state persist with the session. Interrupted work resumes with an explicit inspect-first continuation instead of blindly replaying side effects.
Model tournaments. The tournament tool runs two independent candidate analyses on distinct configured routes when possible, then asks the parent model to judge and synthesize them. Use /tournament <decision> directly.
Repository intelligence. Control Center maps languages, entrypoints, module sizes, test density, high-change files, and architectural risk from the current repository and durable receipts.
Security and trust audit. Approvals, commands, writes, sandbox operations, recovery, denials, and failures are durable session evidence. Common credential formats are redacted from command output; sensitive file reads require parent approval and are blocked inside delegated agents.
Persistent turn receipts. Every completed turn records changed files, commands, validation evidence, delegation, failures, tokens, cache rate, schema tax, cost, and duration. Receipts survive session reloads.
Cost profiles and forecasts. Economy, Balanced, and Quality presets tune reasoning, provider routing, context carry cost, tool loading, and sub-agent budgets. Delegation cards show a cost ceiling before work starts.
Benchmark harness. npm run benchmark reports fixed tool-schema tax. Export real receipt data with Luna Code: Export Benchmark Metrics, then run npm run benchmark -- path/to/metrics.json for cost, latency, cache, tool, failure, and throughput measurements.
Background processes. start_process / read_process / stop_process let the agent run a dev server, probe it, read the logs, and iterate.
Session budget guardrail. lunacode.sessionBudgetUsd pauses the agent (even in Auto mode) and asks before spending past your limit.
Editor-aware. Each message can carry your active file + selection (lunacode.includeActiveFile); right-click menu adds Fix Problems in This File, Refactor Selection…, and Explain Selection; and every diagnostic's lightbulb offers Fix with Luna Code. Multi-root workspaces pick their working folder via Select Working Folder.
Slash commands. /commit, /review, /tests, /regression, and /tournament built in, plus your own templates via lunacode.customCommands — with autocomplete in the composer.
Image paste. Paste screenshots into the composer (up to 3; requires a multimodal model via OpenRouter).
Worktree sandbox. lunacode.worktreeMode runs the agent in a separate git worktree; merge or discard its changes via the command palette.
Format after edit. Optionally run the workspace formatter on every file the agent touches (lunacode.formatAfterEdit).
Calm, readable streaming. Scrolling up pauses auto-follow; long code blocks are height-capped with click-to-expand; a live ~N tok counter shows progress during long silent generations; and an actions menu (⋯) gathers review/revert/retry/edit/export with plain-text labels.
Live tool output. Commands stream their stdout into the tool card as they run (last few lines, click for the full log), background processes show their startup output, and the explore sub-agent's lookups stream into its card so its research is visible.
Monorepo memory. Nested LUNA.md files in subdirectories load alongside the root one, each labeled with its path.
Cache-hit optimized. A stable system-prompt prefix plus rolling cache_control breakpoints maximize provider prompt caching (Anthropic / Gemini via OpenRouter; automatic for OpenAI). The composer shows a live cache hit % and token/cost meter.
Context management. Cache-aware compaction: history stays append-only (so prompt-cache hits keep landing) until a price-aware budget is hit — sized from the model's context window and its input price, so a fully cached turn stays under a target cost (lunacode.autoBudgetCarryCostUsd). A compaction event then supersedes stale duplicate file reads and replaces the oldest turns with a structured checkpoint summary written by a cheap summarizer model (lunacode.summarizerModel), driving the context down to a floor (lunacode.compactionTargetRatio) so events stay rare.
Settings GUI. A gear button in the panel header opens an in-chat settings sheet — models, context/cost budgets, generation, privacy routing, and command allow/deny lists — with instant apply and two-way sync with VS Code's settings editor.
Modern UI. A clean neutral-dark interface with purple accents, streaming responses, collapsible reasoning, tool cards, and inline diff approvals.
Open anywhere. Use Luna Code in the Activity Bar sidebar or pop it out into an editor tab (button in the panel title bar). To dock it on the right like Claude Code, drag the Luna Code icon into the Secondary Side Bar — VS Code remembers the placement. All surfaces share one live session.
Private by default. With the default lunacode.dataCollection setting (deny), every request sends OpenRouter provider.data_collection: "deny", so traffic is routed only to providers that do not store or train on your prompts. An optional, stricter Zero-Data-Retention (ZDR) mode (lunacode.zeroDataRetention) restricts routing to ZDR endpoints.
Secure. Your API key is stored in VS Code's encrypted SecretStorage, never in settings or files.
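The retry rule under Resilient streaming (retry 429/5xx only before any tokens arrive) can be sketched as follows. This is a minimal illustration, not Luna Code's source: HttpError, isRetryable, backoffDelayMs, and withRetry are hypothetical names, and `open` is assumed to resolve as soon as the response is accepted, before the first token streams.

```typescript
// Hypothetical error raised when the HTTP response fails before streaming starts.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

// Retry only rate limits (429) and server errors (5xx).
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Exponential backoff capped at capMs, scaled into [50%, 100%] by jitter.
function backoffDelayMs(
  attempt: number,
  random: () => number = Math.random,
  baseMs = 500,
  capMs = 8000,
): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return exp * (0.5 + random() / 2);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `open` until it succeeds, fails with a non-retryable error, or runs out
// of attempts. Because `open` settles before the first token, a retry never
// duplicates streamed output.
async function withRetry<T>(open: () => Promise<T>, maxAttempts = 4): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await open();
    } catch (err) {
      const retryable = err instanceof HttpError && isRetryable(err.status);
      if (!retryable || attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Retrying only before the first token matters because a stream that has already emitted text cannot be replayed without duplicating output; failures after that point are better left to the hung-stream watchdog or a fallback model.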
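The all-or-nothing behavior of apply_patch described under Atomic batched patches amounts to a two-phase apply: compute every new file body in memory first, and write only if all of them succeed. The sketch below assumes a hypothetical Change type with exact-replacement and line-range variants and uses an in-memory Map as a stand-in for the workspace; the real tool's schema is not documented here.

```typescript
// Hypothetical change shapes; the real apply_patch schema may differ.
type Change =
  | { path: string; kind: "replace"; oldText: string; newText: string }
  | { path: string; kind: "lines"; start: number; end: number; newText: string }; // 1-based, inclusive

// Stand-in for the workspace: path -> file contents.
type Files = Map<string, string>;

// Applies one change to a file body, throwing if it cannot apply cleanly.
function applyOne(content: string, change: Change): string {
  if (change.kind === "replace") {
    const first = content.indexOf(change.oldText);
    if (first === -1) throw new Error(`${change.path}: oldText not found`);
    if (content.indexOf(change.oldText, first + 1) !== -1) {
      throw new Error(`${change.path}: oldText is ambiguous`);
    }
    return content.slice(0, first) + change.newText + content.slice(first + change.oldText.length);
  }
  const lines = content.split("\n");
  if (change.start < 1 || change.end < change.start || change.end > lines.length) {
    throw new Error(`${change.path}: line range ${change.start}-${change.end} out of bounds`);
  }
  lines.splice(change.start - 1, change.end - change.start + 1, ...change.newText.split("\n"));
  return lines.join("\n");
}

// Phase 1 stages every change in memory; phase 2 writes only if all succeeded.
function applyPatch(files: Files, changes: Change[]): void {
  const staged = new Map<string, string>();
  for (const change of changes) {
    const current = staged.get(change.path) ?? files.get(change.path);
    if (current === undefined) throw new Error(`${change.path}: no such file`);
    staged.set(change.path, applyOne(current, change));
  }
  for (const [path, content] of staged) files.set(path, content);
}
```

Staging per path also lets several changes to the same file compose in order within a single call.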
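Revision-safe edits can be approximated by hashing the content the agent read and refusing to write if the current file hashes differently. In this sketch the revision format (the first 12 hex characters of a SHA-256 digest) and the function names are assumptions; only node:crypto's createHash is a real API.

```typescript
import { createHash } from "node:crypto";

// Assumed revision format: the first 12 hex characters of the content's SHA-256.
function revisionOf(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex").slice(0, 12);
}

// Writes newContent only if the file still matches the revision the agent read.
// A Map stands in for the file system; a missing file is treated as empty.
function checkedWrite(
  files: Map<string, string>,
  path: string,
  expectedRevision: string,
  newContent: string,
): string {
  const actual = revisionOf(files.get(path) ?? "");
  if (actual !== expectedRevision) {
    throw new Error(
      `${path} changed since it was read (expected ${expectedRevision}, found ${actual}); re-read before editing`,
    );
  }
  files.set(path, newContent);
  return revisionOf(newContent); // new revision for the agent's next edit
}
```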
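For Conflict-safe implementers, the core check is whether two jobs' declared write scopes can touch the same file. A minimal sketch, assuming scopes are lists of workspace-relative file or directory paths (the real implement.jobs format may use globs instead); all function names are hypothetical.

```typescript
// Normalize "./src/a/" and "src\\a" to "src/a" so comparisons line up.
function normalize(p: string): string {
  return p.replace(/\\/g, "/").replace(/^\.\//, "").replace(/\/+$/, "");
}

// True when the paths are equal or one is a directory containing the other.
function pathsOverlap(a: string, b: string): boolean {
  const x = normalize(a);
  const y = normalize(b);
  return x === y || x.startsWith(y + "/") || y.startsWith(x + "/");
}

function scopesDisjoint(a: string[], b: string[]): boolean {
  return a.every((p) => b.every((q) => !pathsOverlap(p, q)));
}

// Jobs may run concurrently only if every job is scoped and all scopes are
// pairwise disjoint; otherwise they are serialized.
function canRunConcurrently(scopes: (string[] | undefined)[]): boolean {
  if (scopes.some((s) => s === undefined || s.length === 0)) return false;
  const defined = scopes as string[][];
  for (let i = 0; i < defined.length; i++) {
    for (let j = i + 1; j < defined.length; j++) {
      if (!scopesDisjoint(defined[i], defined[j])) return false;
    }
  }
  return true;
}

// Enforces declared write paths: writes outside the job's scope are rejected.
function isWriteAllowed(scope: string[], path: string): boolean {
  const target = normalize(path);
  return scope.some((p) => {
    const s = normalize(p);
    return target === s || target.startsWith(s + "/");
  });
}
```

Comparing on path-segment boundaries (prefix plus "/") keeps src/api and src/apiary distinct, which a plain startsWith check would get wrong.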
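The credential redaction mentioned under Security and trust audit is typically done by pattern matching on command output. The patterns below are illustrative examples of well-known credential shapes (OpenRouter keys, GitHub classic tokens, AWS access key IDs, bearer tokens, PEM private keys), not Luna Code's actual list; redactSecrets is a hypothetical name.

```typescript
// Illustrative patterns only; each pairs a regex with its replacement string.
const SECRET_PATTERNS: [RegExp, string][] = [
  [/sk-or-v1-[A-Za-z0-9]{16,}/g, "[REDACTED]"], // OpenRouter API keys
  [/\bghp_[A-Za-z0-9]{36}\b/g, "[REDACTED]"], // GitHub classic personal access tokens
  [/\bAKIA[0-9A-Z]{16}\b/g, "[REDACTED]"], // AWS access key IDs
  [/(Bearer\s+)[A-Za-z0-9._~+\/-]+=*/g, "$1[REDACTED]"], // keep the "Bearer " prefix
  [
    /-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g,
    "[REDACTED PRIVATE KEY]",
  ],
];

// Applies every pattern in turn to the captured command output.
function redactSecrets(output: string): string {
  let text = output;
  for (const [pattern, replacement] of SECRET_PATTERNS) {
    text = text.replace(pattern, replacement);
  }
  return text;
}
```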
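The price-aware budget from Context management can be read as: compact when carrying the context on a fully cached turn would cost more than the target, or when it would no longer fit the window. The formula below is one plausible reading, not the documented one; ModelInfo, the 16000-token output reserve, and the function names are assumptions. Prices are in USD per million tokens.

```typescript
// Hypothetical model metadata.
interface ModelInfo {
  contextWindow: number; // tokens
  cachedInputPricePerM: number; // USD per 1M cache-read input tokens
}

// Budget = the smaller of (a) the window minus an output reserve and
// (b) the number of cached tokens the carry-cost target buys per turn.
function compactionBudget(model: ModelInfo, carryCostUsd: number, outputReserve = 16000): number {
  const windowLimit = Math.max(0, model.contextWindow - outputReserve);
  if (model.cachedInputPricePerM <= 0) return windowLimit;
  const priceLimit = Math.floor((carryCostUsd / model.cachedInputPricePerM) * 1000000);
  return Math.min(windowLimit, priceLimit);
}

// Below the budget, history stays append-only (returns null). Above it, a
// compaction shrinks context to budget * targetRatio so events stay rare.
function compactionTarget(contextTokens: number, budget: number, targetRatio: number): number | null {
  if (contextTokens <= budget) return null;
  return Math.floor(budget * targetRatio);
}
```

For example, with a 200,000-token window, cached input at $2 per million tokens, and a $0.25 carry target, the price limit is 0.25 / 2 × 1,000,000 = 125,000 tokens, below the 184,000-token window limit; once context exceeds that, a target ratio of 0.5 compacts it down to 62,500 tokens.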
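The privacy routing and fallback-model features both map onto fields of the OpenRouter Chat Completions request: provider.data_collection and the models fallback list. A sketch of assembling such a body, assuming a hypothetical RequestSettings shape that mirrors lunacode.model, lunacode.fallbackModels, and lunacode.dataCollection; it shows the wire format, not Luna Code's client code.

```typescript
type DataCollection = "deny" | "allow";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical settings subset; not Luna Code's actual config type.
interface RequestSettings {
  model: string;
  fallbackModels: string[];
  dataCollection: DataCollection;
}

// Builds an OpenRouter Chat Completions request body.
function buildRequestBody(settings: RequestSettings, messages: ChatMessage[]): Record<string, unknown> {
  const body: Record<string, unknown> = {
    model: settings.model,
    messages,
    stream: true,
    // "deny" restricts routing to providers that do not store or train on prompts.
    provider: { data_collection: settings.dataCollection },
  };
  if (settings.fallbackModels.length > 0) {
    // OpenRouter tries the listed models in order when the primary cannot serve the request.
    body.models = [settings.model, ...settings.fallbackModels];
  }
  return body;
}

// Assumes the response's `model` field echoes the id of the model that answered.
function servedByFallback(requestedModel: string, responseModel: string): boolean {
  return responseModel !== requestedModel;
}
```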

Getting started

Build the extension:
```
npm install
npm run compile
```
Press F5 in VS Code to launch the Extension Development Host.
Click the Luna Code icon in the Activity Bar.
Click Set OpenRouter API Key and paste your key (sk-or-v1-…).
Click the model chip in the header to pick a model (or browse all OpenRouter models).
Type a request and hit Enter.

Keyboard shortcuts

| Shortcut | Action |
| --- | --- |
| `Ctrl/Cmd + Shift + L` | Focus the Luna Code chat |
| `Ctrl/Cmd + Shift + K` | Add the current editor selection to chat |

Configuration

All settings live under the lunacode.* namespace (Settings → Extensions → Luna Code):

| Setting | Default | Description |
| --- | --- | --- |
| `lunacode.model` | `z-ai/glm-5.2` | OpenRouter model id (use the picker or Browse all for current ids). |
| `lunacode.baseUrl` | `https://openrouter.ai/api/v1` | API base URL (override for proxies). |
| `lunacode.defaultMode` | `standard` | `standard` \| `auto` \| `plan`. |
| `lunacode.maxTokens` | `0` | Max output tokens per turn. `0` = use the model's full output limit (avoids truncating large `write_file` calls). |
| `lunacode.temperature` | `0` | Sampling temperature. |
| `lunacode.enablePromptCaching` | `true` | Insert `cache_control` breakpoints. |
| `lunacode.dataCollection` | `deny` | `deny` routes only to providers that don't store or train on prompts; `allow` permits all providers. |
| `lunacode.zeroDataRetention` | `false` | Stricter: route only to Zero-Data-Retention endpoints. |
| `lunacode.maxContextTokens` | `180000` | Token budget before older context is compacted. |
| `lunacode.autoApproveCommands` | common read-only commands | Commands auto-approved even in Standard mode. |
| `lunacode.alwaysDenyCommands` | destructive commands | Commands always blocked, in every mode. |
| `lunacode.verificationPolicy` | `standard` | Receipt-based verification gate: `advisory` \| `standard` \| `strict`. |
| `lunacode.testFirstFixes` | `true` | Prefer writing a failing regression test before fixing a production bug. |
| `lunacode.durableQueue` | `true` | Persist queued work and interrupted-run recovery markers. |
| `lunacode.worktreeMode` | `false` | Run the main agent in a separate git worktree (implementer sub-agents always get their own worktrees). |

Tools the agent can use

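| Tool | Mutating | Purpose |
| --- | --- | --- |
| `read_file` | no | Read a file (with offset/limit paging). |
| `file_outline` | no | List a file's language-server symbols with line ranges. |
| `list_dir` | no | List a directory. |
| `glob` | no | Find files by glob pattern. |
| `grep` | no | Regex search across the workspace. |
| `get_diagnostics` | no | Read language-server errors/warnings. |
| `explore` | no | Delegate bounded repository research to a sub-agent. |
| `tournament` | no | Produce two independent candidates for parent-model judgment. |
| `implement` | yes | Run a scoped implementer in an isolated worktree and merge its patch. |
| `write_file` | yes | Create or overwrite a file. |
| `edit_file` | yes | Targeted exact-string edit. |
| `apply_patch` | yes | Preflighted, atomic batch of replacements or line-range edits across one or more files. |
| `run_command` | yes | Run a shell command (PowerShell on Windows, sh elsewhere). |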

Mutating tools are hidden entirely in Plan mode and gated by approval in Standard mode.
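The approval rules spread across the three modes, the command allow/deny settings, and the sentence above can be combined into a single decision function. This is a sketch under stated assumptions: the prefix-matching rule for command lists and all names are hypothetical, and it follows the rule that Plan mode hides every mutating tool, including run_command.

```typescript
type Mode = "standard" | "auto" | "plan";
type Decision = "hidden" | "deny" | "ask" | "allow";

// Mutating tools named in this README.
const MUTATING = new Set(["implement", "write_file", "edit_file", "apply_patch", "run_command"]);

interface Policy {
  autoApproveCommands: string[]; // cf. lunacode.autoApproveCommands
  alwaysDenyCommands: string[]; // cf. lunacode.alwaysDenyCommands
}

// Hypothetical rule: a command matches an entry if it equals it or starts with "entry ".
function matches(command: string, entries: string[]): boolean {
  const c = command.trim();
  return entries.some((e) => c === e || c.startsWith(e + " "));
}

function decide(mode: Mode, tool: string, policy: Policy, command?: string): Decision {
  const mutating = MUTATING.has(tool);
  if (mode === "plan" && mutating) return "hidden"; // never offered to the model
  if (tool === "run_command" && command !== undefined) {
    if (matches(command, policy.alwaysDenyCommands)) return "deny"; // blocked in every mode
    if (matches(command, policy.autoApproveCommands)) return "allow";
  }
  if (!mutating || mode === "auto") return "allow";
  return "ask"; // Standard mode: the user approves each mutation
}
```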

How cache optimization works

OpenRouter forwards cache_control breakpoints to providers that support prompt caching. Luna Code:

Keeps the system prompt + tool definitions byte-stable across a session and marks the end of the system prompt as a cache breakpoint.
Places a rolling breakpoint on the latest message each request so the entire accumulated conversation becomes a cached prefix for the next call.
Only ever appends to the message list, never reorders, so prefixes stay valid for cache reuse.

For OpenAI models, caching is automatic and these hints are safely ignored.
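A sketch of the breakpoint placement described above, using the content-part form of cache_control that OpenRouter accepts (`{ type: "ephemeral" }` on a text part). The function name is hypothetical, and stripping breakpoints from older messages (to stay within provider breakpoint limits) is an assumption about how the rolling breakpoint is maintained.

```typescript
interface TextPart {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface Message {
  role: "system" | "user" | "assistant";
  content: TextPart[];
}

const BREAKPOINT: { type: "ephemeral" } = { type: "ephemeral" };

// Builds the request's message list: a breakpoint after the stable system
// prompt and a rolling one on the latest message. `history` is never mutated
// or reordered, so the previously sent prefix stays byte-identical.
function withCacheBreakpoints(systemPrompt: string, history: readonly Message[]): Message[] {
  const system: Message = {
    role: "system",
    content: [{ type: "text", text: systemPrompt, cache_control: BREAKPOINT }],
  };
  // Fresh copies without markers, so old rolling breakpoints do not accumulate.
  const messages: Message[] = history.map((m) => ({
    role: m.role,
    content: m.content.map((p) => ({ type: p.type, text: p.text })),
  }));
  const last = messages[messages.length - 1];
  if (last !== undefined && last.content.length > 0) {
    const i = last.content.length - 1;
    last.content[i] = { ...last.content[i], cache_control: BREAKPOINT };
  }
  return [system, ...messages];
}
```

Because history is only ever appended to, each request's messages up to the previous breakpoint match what the provider cached on the last call.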

Project structure

```
src/
  extension.ts            activation + commands
  config.ts               settings + SecretStorage for the API key
  modes.ts                Standard / Auto / Plan definitions
  openrouter/
    client.ts             streaming Chat Completions client
    types.ts              message + cache_control types
  agent/
    agent.ts              the agentic tool loop
    systemPrompt.ts       system prompt (stable cache prefix)
    contextManager.ts     cache breakpoints + context compaction
    tools/                read/write/edit/list/glob/grep/run/diagnostics
  webview/
    provider.ts           webview host + approval bridge
    protocol.ts           host <-> webview message types
    ui/                   the webview front-end (main.ts, markdown.ts)
media/
  webview.css             the dark-purple theme
```

License

MIT

Luna Code — AI Coding Agent

Derek Morris