# Type Ahead

Inline code autocomplete for VS Code, powered by your choice of LLM backend. Get ghost-text suggestions as you type — accept with Tab, dismiss with Escape. Works with Ollama, Anthropic (Claude), vLLM, LM Studio, LiteLLM Gateway, or any server that speaks the OpenAI chat completions protocol.

## Quick Start

### Option 1: Local models with Ollama (free, private, no API key)

Best for: privacy-conscious users, offline work, or trying out the extension for free.
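As a sketch of the Ollama setup: pull a code model with `ollama pull codellama:7b` (any code-capable model works), then point the extension at Ollama's OpenAI-compatible endpoint. The `typeAhead.` settings prefix below is an assumption — check the Settings UI for the exact keys.

```jsonc
// settings.json (sketch — setting names assume a "typeAhead." prefix)
{
  "typeAhead.provider": "OpenAI Compatible",
  "typeAhead.apiBaseUrl": "http://localhost:11434/v1", // Ollama's OpenAI-compatible endpoint
  "typeAhead.model": "codellama:7b"                    // example model; use any you've pulled
}
```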
### Option 2: Anthropic (Claude) via API

Best for: highest-quality completions using Claude models, when you have an Anthropic API key.
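A minimal settings sketch for the Anthropic backend. The `typeAhead.` prefix and the `apiKey` key name are assumptions; the model name comes from this README's troubleshooting section.

```jsonc
// settings.json (sketch — key names are assumptions, check the Settings UI)
{
  "typeAhead.provider": "Anthropic",
  "typeAhead.model": "claude-haiku-4-5",
  "typeAhead.apiKey": "sk-ant-..." // or generate it on demand via apiKeyHelper
}
```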
### Option 3: vLLM or LM Studio

Best for: running larger models on powerful hardware, or using models not available in Ollama. Both vLLM and LM Studio expose an OpenAI-compatible API.

vLLM:
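The original setup snippet isn't shown here; as a sketch, you might start vLLM's OpenAI-compatible server with `vllm serve <model>` (it listens on port 8000 by default) and then point the extension at it. The model name below is an example, and the `typeAhead.` prefix is an assumption.

```jsonc
// settings.json (sketch)
{
  "typeAhead.provider": "OpenAI Compatible",
  "typeAhead.apiBaseUrl": "http://localhost:8000/v1",   // vLLM's default port
  "typeAhead.model": "codellama/CodeLlama-7b-hf"        // example; match your `vllm serve` model
}
```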
LM Studio:
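The original LM Studio snippet isn't shown here either; a sketch, assuming LM Studio's local server is enabled on its default port (1234) and the `typeAhead.` prefix:

```jsonc
// settings.json (sketch)
{
  "typeAhead.provider": "OpenAI Compatible",
  "typeAhead.apiBaseUrl": "http://localhost:1234/v1", // LM Studio's default local server
  "typeAhead.model": "my-loaded-model"                // placeholder: use the model you loaded in LM Studio
}
```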
### Option 4: LiteLLM Gateway

Best for: organizations that route requests through a centralized LLM proxy, or when you need to use models from multiple providers through one endpoint. LiteLLM is a proxy server that translates OpenAI-format requests to 100+ LLM providers.
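A settings sketch for routing through a LiteLLM proxy. The port (LiteLLM's usual default is 4000), the model name, and the `typeAhead.` prefix are all assumptions — use whatever your gateway is configured with.

```jsonc
// settings.json (sketch)
{
  "typeAhead.provider": "LiteLLM Gateway",
  "typeAhead.apiBaseUrl": "http://localhost:4000", // your proxy's URL
  "typeAhead.model": "gpt-4o-mini"                 // example; any model name your gateway routes
}
```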
### Option 5: Any OpenAI-compatible server

The `OpenAI Compatible` provider works with any server that implements the OpenAI chat completions protocol: set `apiBaseUrl` to your server's endpoint.
## Which Backend Should I Use?
## All Settings

Open VS Code Settings (`Cmd+,` on macOS, `Ctrl+,` on Windows/Linux) and search for "Type Ahead".

You can also set these in your `settings.json` directly:
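A `settings.json` sketch covering the settings this README documents. The `typeAhead.` prefix is an assumption, and the values shown are illustrative, not the extension's actual defaults.

```jsonc
// settings.json (sketch — check the Settings UI for exact keys and defaults)
{
  "typeAhead.provider": "OpenAI Compatible",
  "typeAhead.apiBaseUrl": "http://localhost:11434/v1",
  "typeAhead.model": "codellama:7b",
  "typeAhead.debounceMs": 300,    // how long to wait after you stop typing
  "typeAhead.contextLines": 100,  // lines of code sent around the cursor
  "typeAhead.cacheSize": 50       // completion cache entries; 0 disables caching
}
```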
## Excluding Files and Folders

Use exclusion patterns to prevent completions in specific files or folders.

Supported patterns:
## Custom Instructions

Add your own instructions that the LLM should follow when generating completions:
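A sketch of what this might look like in `settings.json` — the key name `typeAhead.customInstructions` is hypothetical, since the exact setting id isn't shown in this README:

```jsonc
// settings.json (sketch — "customInstructions" key name is hypothetical)
{
  "typeAhead.customInstructions": "Never use var; prefer const. Follow the project's ESLint rules. Keep functions under 30 lines."
}
```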
These instructions are appended to the system prompt sent to the model with every completion request. Use them to enforce coding standards, style preferences, or project-specific conventions.

## Dynamic API Keys with `apiKeyHelper`

Set `apiKeyHelper` to a shell command that outputs an API key. The key is only held in memory, never written to settings on disk.
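Before wiring a helper command into `apiKeyHelper`, you can sanity-check its output in a terminal. A sketch, with `echo` standing in for your real helper command:

```shell
# Replace the echo with your actual key-helper command.
key="$(echo 'sk-ant-example-key')"

# Anthropic keys should start with "sk-ant-".
case "$key" in
  sk-ant-*) echo "key format OK" ;;
  *)        echo "unexpected key format: $key" ;;
esac
# prints: key format OK
```

If the command prints a well-formed key here, the extension should be able to use it too.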
## Commands

| Command | Description |
|---|---|
| `Type Ahead: Toggle On/Off` | Quickly enable or disable the extension |
## Status Bar

The extension shows its status in the bottom-right of VS Code:

| Icon | Meaning |
|---|---|
| `$(sparkle) Type Ahead` | Ready — waiting for you to type |
| `$(loading~spin) Type Ahead` | Generating a completion |
| `$(warning) Type Ahead` | Error — click to toggle, check Output panel for details |
| `$(circle-slash) Type Ahead` | Disabled |

Click the status bar item to toggle the extension on/off.
## Troubleshooting

### No completions appear

- Check the status bar — is it showing "Type Ahead" or is it hidden?
- Open the Output panel (`Cmd+Shift+U`) and select **Extension Host** from the dropdown.
- Look for log lines starting with `Type Ahead:` — they show the full request/response flow:

  ```
  Type Ahead: [auth] warming up API key at session start...
  Type Ahead: [auth] API key ready
  Type Ahead: [llm] POST http://localhost:11434/v1/chat/completions (model: codellama:7b)
  Type Ahead: [llm] auth: Bearer token set
  Type Ahead: [llm] response 200 in 342ms
  Type Ahead: [llm] completion: 28 chars
  ```
### Completions are slow

- Increase debounce: set `debounceMs` to 500-1000ms for slow servers. This reduces unnecessary requests while you're still typing.
- Use a faster model: smaller models respond faster. Try `starcoder2:3b` or `codellama:7b` instead of 13B+ models.
- Reduce context: lower `contextLines` from 100 to 30-50. Less context = faster inference.
- The first completion is always slower because there's no cache. Subsequent completions at the same position are instant (cache hit).
### "API error 401" or "API error 403"

- Your API key is invalid or expired.
- If using `apiKeyHelper`, check that the command works: run it in your terminal and verify it outputs a key.
- For Anthropic, make sure the key starts with `sk-ant-`.
### "API error 404" or "model not found"

- The model name doesn't match what the server knows. Check:
  - Ollama: run `ollama list` to see installed models.
  - vLLM: check the model name you used in `vllm serve`.
  - Anthropic: use `claude-haiku-4-5`, `claude-sonnet-4-6`, etc.
### "apiBaseUrl is required"

- You selected `OpenAI Compatible` or `LiteLLM Gateway` but didn't set a URL.
- Set `apiBaseUrl` to your server's URL (e.g., `http://localhost:11434/v1` for Ollama).
### Extension Host shows "request failed: fetch failed"

- The server is not running or not reachable at the configured URL.
- Check that your server is running: `curl http://localhost:11434/v1/models`
## Performance Tips

| Tip | Setting | Effect |
|---|---|---|
| Faster suggestions | `debounceMs: 150` | Triggers sooner after you stop typing (more API calls) |
| Less API usage | `debounceMs: 500` | Waits longer, fewer requests, saves tokens |
| Faster inference | `contextLines: 30` | Sends less code to the model |
| Better completions | `contextLines: 200` | More context = more accurate completions (slower) |
| Disable caching | `cacheSize: 0` | Every request goes to the server (useful for testing) |
## Language Support
The extension works with all programming languages supported by VS Code. The model receives the file name and language identifier along with the surrounding code, so it can adapt its completions to the language you're working in.
## Privacy

- Local models (Ollama, vLLM, LM Studio): your code never leaves your machine.
- Anthropic / LiteLLM / remote servers: code context (up to `contextLines` lines around your cursor) is sent to the configured API endpoint. No data is stored by the extension itself.
- API keys: stored in VS Code settings (on disk). For sensitive environments, use `apiKeyHelper` to generate keys dynamically — they are only held in memory.
- No telemetry: the extension does not collect or send any usage data.