Skip to content
| Marketplace
Sign in
Visual Studio Code>Machine Learning>Bespoke AI — Autocomplete and One-Click Commit MessagesNew to Visual Studio Code? Get it now.
Bespoke AI — Autocomplete and One-Click Commit Messages

Bespoke AI — Autocomplete and One-Click Commit Messages

Trent McNitt

|
70 installs
| (3) | Free
AI autocomplete for prose and code — use your Claude subscription or bring your own API key (xAI, OpenAI, Anthropic, Gemini, Ollama)
Installation
Launch VS Code Quick Open (Ctrl+P), paste the following command, and press enter.
Copied to clipboard
More Info

Bespoke AI

Bespoke AI

AI autocomplete for prose and code in VS Code — powered by your Claude subscription or API key

VS Code Marketplace License: MIT

Early release — Please bear with the rough edges. 👷 Expect occasional bugs and fluctuating (or poor) quality in some scenarios. Open an issue if you run into any problems.

💻 macOS, Linux, and Windows — Also works in VSCodium.

🖊️ Writing, not just code — Inline completions for prompts, journals, notes, docs, and code. Matches your voice and style in prose; language-aware in code.

🔑 Subscription or API key — Use your Claude subscription through Claude Code (no per-token costs), or bring your own API key from xAI, OpenAI, Anthropic, Google, OpenRouter, or Ollama.

✨ One-click commit messages — Hit the sparkle button in Source Control to generate a commit message from your staged diffs.

✏️ Suggest edits — One-command typo, grammar, and bug fixes for visible text.

🔧 Context menu — Right-click to Explain, Fix, or Do custom actions on selected text. (Requires Claude Code backend.)

Screenshots

Writing — autocompleting a prompt to Claude:

Writing completion demo

Code — filling in a function body:

Code completion demo

Commit messages — the sparkle button in Source Control:

Commit message sparkle button

🚀 Getting Started

  1. Install from the VS Code Marketplace (search for "Bespoke AI")
  2. On first launch, the extension detects whether you have the Claude Code CLI — if not, it offers to help you set it up or switch to API key mode
  3. Start typing — completions appear as gray suggestion text after a ~2-second pause

That's it. The extension walks you through setup. Details for each path below if you need them.

Claude subscription — no per-token costs, uses your existing Claude Pro/Team/Enterprise plan
  1. Install Claude Code: curl -fsSL https://claude.ai/install.sh | bash (macOS/Linux) or npm install -g @anthropic-ai/claude-code (any platform)
  2. Run claude in your terminal and follow the login prompts
  3. Reload VS Code — the extension detects the CLI automatically

The default model is Sonnet — best balance of quality and speed. Switch models anytime via the status bar menu.

API key — pay per token with xAI, OpenAI, Anthropic, Google, OpenRouter, or Ollama
  1. Choose "Use an API key instead" when prompted (or set bespokeAI.backend to api in settings)
  2. Run Bespoke AI: Enter API Key from the Command Palette (Ctrl/Cmd+Shift+P) to store your key securely

The default model is Grok 4.1 Fast by xAI — fast, affordable, and high quality. Change models anytime via the status bar menu. See Available Models for the full list.

Triggering Completions

Completions appear automatically after a ~2-second pause (the relaxed preset). Press Alt+Enter to trigger one instantly at any time.

If Alt+Enter doesn't work, another keybinding is likely intercepting it. Open Keyboard Shortcuts (Ctrl/Cmd+K Ctrl/Cmd+S), search for alt+enter, and rebind any conflicts — Inline Chat is the most common one.

Change trigger behavior via the status bar menu: relaxed (~2s), eager (~800ms), or on-demand (Alt+Enter only).

Modes

Mode Activates for Strategy
Writing markdown, plaintext, latex, restructuredtext, and others Continuation-style. Matches voice, style, and format.
Code All recognized programming languages Uses code before and after your cursor, language-aware.
Auto (default) — Selects Writing or Code based on file type.

Override via settings or the status bar menu.

💡 Why This Exists

I tried every open-source AI autocomplete extension I could find. Most handled code fine but fell apart with prose — breaking paragraphs mid-thought, injecting code syntax into journal entries, producing gibberish outside of source files. Nothing came close to Copilot for non-code text.

So I built my own. And since I was already paying for a Claude subscription, I realized I could wire it up to use Claude Code instead of raw API calls — getting frontier model completions (Haiku, Sonnet, even Opus) at no additional per-request cost. Built on the Claude Agent SDK, it took extensive prompt engineering, but the result handles writing just as well as code.

The Claude Code backend uses your existing subscription (Pro, Team, or Enterprise). Heavy use may be subject to Anthropic's rate limits. The API backend uses standard per-token pricing from your chosen provider.

🧩 Available Models

The API backend includes presets for popular providers. Change the active preset via the status bar menu or the bespokeAI.api.preset setting.

Preset Provider Model
xai-grok (default) xAI grok-4-1-fast-non-reasoning
xai-grok-code xAI grok-code-fast-1
xai-grok-4 xAI grok-4-0709
anthropic-haiku Anthropic claude-haiku-4-5
anthropic-sonnet Anthropic claude-sonnet-4-5
openai-gpt-4.1-nano OpenAI gpt-4.1-nano
openai-gpt-4o-mini OpenAI gpt-4o-mini
google-gemini-flash Google gemini-2.5-flash
openrouter-haiku OpenRouter anthropic/claude-haiku-4.5
openrouter-gpt-4.1-nano OpenRouter openai/gpt-4.1-nano
ollama-default Ollama (local, free) qwen2.5-coder:7b
ollama-qwen3-4b Ollama (local, free) qwen3:4b
ollama-qwen3-8b Ollama (local, free) qwen3:8b

API keys: Store keys via the Enter API Key command (Command Palette → "Enter API Key"). Keys are saved in your OS keychain. As a fallback, the extension also checks environment variables and ~/.creds/api-keys.env:

xAI → XAI_API_KEY · Anthropic → ANTHROPIC_API_KEY · OpenAI → OPENAI_API_KEY · Google → GEMINI_API_KEY · OpenRouter → OPENROUTER_API_KEY · Ollama → no key needed

Custom models: Run Bespoke AI: Add Custom Model from the Command Palette for a guided setup wizard. Any OpenAI-compatible endpoint works (LM Studio, Together, Mistral, etc.), plus Anthropic, Google, and OpenRouter. You can also add presets manually:

"bespokeAI.api.customPresets": [
  { "name": "My Llama", "provider": "openai-compat", "modelId": "llama3.2", "baseUrl": "http://localhost:1234/v1" }
]

⚙️ Configuration

All settings live under bespokeAI.* in VS Code settings.

General
Setting Default Description
enabled true Master on/off toggle
mode "auto" Completion mode (auto-detects)
triggerPreset "relaxed" Trigger preset: relaxed (~2s), eager (~800ms), on-demand (Alt+Enter only)
debounceMs 2000 Override the debounce delay from your trigger preset
logLevel "info" Logging verbosity in Output channel
Backend
Setting Default Description
backend "claude-code" Active backend: claude-code (CLI) or api (HTTP)
api.preset "xai-grok" Active API model (dropdown in settings, or status bar menu)
Model (Claude Code backend)
Setting Default Description
claudeCode.model "sonnet" Active model (sonnet, haiku, opus, etc.)
claudeCode.models ["haiku", "sonnet", "opus"] Available models catalog
Context Windows
Setting Default Description
prose.contextChars 2500 Prefix context (characters) for writing
prose.suffixChars 2000 Suffix context (characters) for writing
prose.fileTypes [] Additional language IDs to treat as writing
code.contextChars 2500 Prefix context (characters) for code
code.suffixChars 2000 Suffix context (characters) for code
Code Override

Route code completions to a different backend or model than prose. For example, use Claude Code CLI for writing and an xAI preset for code — or vice versa.

Setting Default Description
codeOverride.backend "" Backend for code files: claude-code, api, or empty (use global default)
codeOverride.model "" Model for code files. CLI: model name (e.g. haiku). API: preset ID (e.g. xai-grok-code). Empty = default.
Context Menu Permissions
Setting Default Description
contextMenu.permissionMode "default" Permission mode for Explain, Fix, Do

Options:

  • default — Ask before every action (safest)
  • acceptEdits — Auto-approve file reads and edits
  • bypassPermissions — Skip all permission checks (use with caution)

📋 Commands

Command Keybinding Description
Trigger Completion Alt+Enter Manually trigger a completion
Toggle Enabled — Toggle the extension on/off
Cycle Mode — Cycle through auto → prose → code
Clear Completion Cache — Clear the LRU cache
Show Menu — Status bar menu
Generate Commit Message — AI commit message from staged diffs
Suggest Edits — Fix typos/grammar/bugs in visible text
Explain / Fix / Do — Context menu actions on selected text
Enter API Key — Store an API key in the OS keychain
Remove API Key — Remove a stored API key
Add Custom Model — Guided wizard to add a custom API model
Remove Custom Model — Remove a custom API model
Restart Pools — Restart Claude Code subprocesses
Architecture
User types → Mode detection → Context extraction → LRU cache check
  → Debounce → Backend Router → Claude Code CLI or API → Cleanup → Ghost text

A backend router dispatches requests to the active backend. The Claude Code CLI backend uses the Claude Agent SDK and manages subprocesses through a shared pool server — multiple VS Code windows share subprocesses via IPC (Unix sockets on macOS/Linux, named pipes on Windows). The API backend makes direct HTTP calls to Anthropic, OpenAI-compatible (OpenAI, Google Gemini, xAI, OpenRouter), or local Ollama endpoints.

All backends share the same prompt strategy ({{FILL_HERE}} marker, <COMPLETION> tags) with backend-specific extraction (prefill for Anthropic API, preamble stripping for OpenAI-compat).

Key design decisions:

Decision Rationale
Writing-first defaults Unrecognized languages fall back to writing, not code
Dual backend Claude Code CLI for subscribers, API for key-based access
Shared prompts Same system prompt across all backends — only extraction differs
No streaming VS Code's inline completion API requires complete strings
LRU cache (50 entries) 5-minute TTL prevents redundant calls when revisiting positions
Session reuse (CLI) One subprocess serves many requests — avoids cold-start per call

🔍 Troubleshooting

Completions not appearing?

  • Check the status bar — is it showing "AI Off"? Click to re-enable.
  • If it shows "Setup needed", the Claude Code CLI may not be installed or authenticated.
  • Check if trigger preset is "on-demand" — in that mode, press Alt+Enter to trigger.
  • Open the Output panel ("Bespoke AI") and check for errors.
  • Run the "Bespoke AI: Restart Pools" command from the Command Palette.

"Claude Code CLI not found"?

  • Install Claude Code: curl -fsSL https://claude.ai/install.sh | bash (macOS/Linux) or npm install -g @anthropic-ai/claude-code (any platform)
  • Restart VS Code after installation.

"Authentication required"?

  • Run claude in your terminal and follow the login prompts.
  • Ensure you have an active Claude subscription (Pro, Team, or Enterprise).

Windows: Explain/Fix/Do commands garbled?

  • Context menu commands use bash-style shell escaping. On Windows with PowerShell or cmd.exe, text containing $, backticks, or special characters may not pass through correctly.
  • Workaround: set your VS Code terminal to Git Bash or WSL.

Still not working?

  • Check for orphaned processes:
    • macOS/Linux: pkill -f "claude.*dangerously-skip-permissions"
    • Windows: Use Task Manager to end node.exe processes running Claude
  • Check for a stale lockfile at ~/.bespokeai/pool.lock and remove it.
  • Disable and re-enable the extension.

🔒 Privacy

Bespoke AI sends the text surrounding your cursor (prefix and suffix context) to the configured backend for completion. Commit message generation sends your staged diff, and suggest edits sends the visible editor content. No other files or data are transmitted. When using the Claude Code CLI backend, requests go through your local Claude Code installation. When using the API backend, requests go directly to your chosen provider's API endpoint. All API keys are stored in your OS keychain via VS Code SecretStorage.

🛠️ Development

npm install && npm run compile    # Build
npm run watch                     # Watch mode (F5 to launch dev host)
npm run check                     # Lint + type-check
npm run test:unit                 # Unit tests
npm run test:quality              # LLM-as-judge quality tests

Tested Models

These are the models the quality test suite runs against. Contributors should test prompt changes against at least one model per extraction strategy (e.g., CLI sonnet, GPT-4.1 Nano, xAI Grok).

# Backend Preset ID Model
1 CLI (default) sonnet
2 CLI — haiku
3 API xai-grok grok-4-1-fast-non-reasoning
4 API xai-grok-code grok-code-fast-1
5 API anthropic-haiku claude-haiku-4-5-20251001
6 API anthropic-sonnet claude-sonnet-4-5-20250929
7 API openai-gpt-4.1-nano gpt-4.1-nano
8 API google-gemini-flash gemini-2.5-flash
9 API ollama-default qwen2.5-coder:7b

Coverage: all 3 extraction strategies, 6 providers, 3 cost tiers, code-specialized model, local option. See CLAUDE.md for testing commands.

🤝 Contributing

Contributions welcome — fork the repo, create a branch, and open a pull request.

  • Run npm run check before submitting (must pass)
  • See CLAUDE.md for architecture details, testing workflows, and coding conventions
  • Open an issue for bugs or feature ideas

👤 Author

Built by Trent McNitt — AI developer specializing in agent development, prompt engineering, and full-stack applications.

Available for contract work on Upwork →

  • Contact us
  • Jobs
  • Privacy
  • Manage cookies
  • Terms of use
  • Trademarks
© 2026 Microsoft