# AI Autocomplete for VS Code

AI-powered inline code completions for JavaScript, TypeScript, and Dart (bring your own API key).
## Features
- **Multiple AI Providers**: Claude (Anthropic), Gemini (Google), or Ollama (local models)
- **Inline Completions**: Get AI-powered code suggestions as you type
- **Language Support**: JavaScript, TypeScript, JSX, TSX, and Dart
- **Local AI Support**: Run models offline with Ollama - no API costs!
- **Smart Caching**: Reduces API calls with an LRU cache
- **Throttling**: Prevents API spam with a configurable debounce
- **Error Handling**: Graceful degradation with informative messages
- **Language-Aware**: Context-aware completions for each language
## Supported Languages

- JavaScript (`.js`, `.jsx`, `.mjs`)
- TypeScript (`.ts`, `.tsx`)
- Dart (`.dart`) - great for Flutter development!
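As a sketch of how the extension might decide which files to handle, a hypothetical helper (illustrative names, not the extension's actual code) could map file extensions to VS Code language IDs:

```typescript
// Hypothetical helper (not the extension's actual code): map the supported
// file extensions to VS Code language IDs.
const SUPPORTED_EXTENSIONS: Record<string, string> = {
  ".js": "javascript",
  ".jsx": "javascriptreact",
  ".mjs": "javascript",
  ".ts": "typescript",
  ".tsx": "typescriptreact",
  ".dart": "dart",
};

// Returns the language ID for a filename, or undefined if unsupported.
function languageFor(filename: string): string | undefined {
  const dot = filename.lastIndexOf(".");
  return dot === -1 ? undefined : SUPPORTED_EXTENSIONS[filename.slice(dot)];
}
```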
## Setup

Choose your preferred AI provider:

### Option 1: Local Models with Ollama (Recommended for Privacy & Cost)

**Why Ollama?**

- ✅ 100% Free - no API costs
- ✅ Privacy - your code never leaves your machine
- ✅ Offline - works without internet
- ✅ Fast - 2-5 second response times on modern hardware

**Setup Steps:**
1. **Install Ollama**
   - Download from [ollama.ai](https://ollama.ai)
   - Windows: run the installer
   - Mac: `brew install ollama`
   - Linux: `curl -fsSL https://ollama.ai/install.sh | sh`
2. **Pull a Model**

   ```bash
   # Lightweight model (recommended for autocomplete)
   ollama pull gemma3:1b

   # OR a more powerful model
   ollama pull deepseek-r1:8b
   ```

3. **Verify Ollama is Running**

   ```bash
   ollama list
   ```

   You should see your downloaded model(s).
4. **Configure the Extension**
   - Open the VS Code Command Palette (`Ctrl+Shift+P`)
   - Run `AI Autocomplete: Select Model`
   - Choose `Ollama (Local)`
   - Run `AI Autocomplete: Set Ollama Model`
   - Enter your model name (e.g., `gemma3:1b`)
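Under the hood, a non-streaming completion request against Ollama's local REST API (`POST /api/generate`) looks roughly like the sketch below. The request fields follow the public Ollama API; the helper names themselves are illustrative, not the extension's code.

```typescript
// Request body for Ollama's POST /api/generate endpoint.
interface GenerateRequest {
  model: string;   // e.g. "gemma3:1b"
  prompt: string;  // code context to complete
  stream: boolean; // false = return the whole completion at once
}

function buildGenerateRequest(model: string, prompt: string): GenerateRequest {
  return { model, prompt, stream: false };
}

// Illustrative call against a local Ollama server (default port 11434).
async function complete(baseUrl: string, req: GenerateRequest): Promise<string> {
  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  const data = (await res.json()) as { response: string };
  return data.response; // Ollama returns the generated text in `response`
}
```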
**Recommended Models:**

- `gemma3:1b` - fast, lightweight (1 GB RAM)
- `qwen2.5-coder:1.5b` - optimized for code (2 GB RAM)
- `deepseek-r1:8b` - more accurate, slower (8 GB RAM)
### Option 2: Claude API (Anthropic)

1. Go to the [Anthropic Console](https://console.anthropic.com)
2. Sign up or log in
3. Navigate to the "API Keys" section
4. Create a new API key
5. In VS Code, run `AI Autocomplete: Set API Key`
6. Run `AI Autocomplete: Select Model` and choose a Claude model
### Option 3: Gemini API (Google)

1. Go to [Google AI Studio](https://aistudio.google.com)
2. Create an API key
3. In VS Code, run `AI Autocomplete: Set API Key`
4. Run `AI Autocomplete: Select Model` and choose a Gemini model
## Usage

### Automatic Completions

Completions appear automatically as you type in supported languages. The extension:

- Extracts context around your cursor
- Sends it to the selected provider
- Displays suggestions inline
- Caches results to minimize API calls
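The context-extraction step can be sketched as taking a bounded window of text before and after the cursor. The window size and function names below are assumptions for illustration, not the extension's actual implementation:

```typescript
// Illustrative sketch: extract a bounded window of text around the cursor.
// `offset` is the cursor's character offset into the document text.
function extractContext(
  text: string,
  offset: number,
  windowSize = 200
): { prefix: string; suffix: string } {
  const prefix = text.slice(Math.max(0, offset - windowSize), offset);
  const suffix = text.slice(offset, offset + windowSize);
  return { prefix, suffix };
}
```

The prefix/suffix pair is what gets sent to the model as the completion prompt.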
### Manual Trigger

Press `Ctrl+Shift+Space` (or `Cmd+Shift+Space` on Mac) to manually request a completion.
## Commands

Available commands in the Command Palette:

- `AI Autocomplete: Set API Key` - Update your Claude/Gemini API key
- `AI Autocomplete: Select Model` - Choose an AI provider (Claude, Gemini, or Ollama)
- `AI Autocomplete: Set Ollama Base URL` - Configure the Ollama server URL (default: `http://localhost:11434`)
- `AI Autocomplete: Set Ollama Model` - Choose which local model to use
- `AI Autocomplete: Toggle Autocomplete` - Enable/disable the extension
- `AI Autocomplete: Set Temperature` - Adjust creativity (0 = deterministic, 1 = creative)
- `AI Autocomplete: Set Debounce Delay` - Change the wait time before requesting (100-2000 ms)
- `AI Autocomplete: Clear Completion Cache` - Clear cached completions
- `AI Autocomplete: Trigger Completion Manually` - Request a completion manually
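The idea behind the debounce/throttle settings can be sketched as a gate that allows at most one request per interval. This is a minimal illustration with an injectable clock for testing, not the extension's actual logic:

```typescript
// Minimal throttle sketch: allow at most one request per `minIntervalMs`.
// The `now` parameter is injectable so the behavior can be tested
// without real timers.
function createThrottle(
  minIntervalMs: number,
  now: () => number = Date.now
): () => boolean {
  let lastRequest = -Infinity;
  return function shouldSend(): boolean {
    const t = now();
    if (t - lastRequest < minIntervalMs) return false; // too soon: skip
    lastRequest = t;
    return true;
  };
}
```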
## Configuration

Open Settings and search for "AI Autocomplete" to customize:

| Setting | Default | Description |
|---------|---------|-------------|
| `aiAutocomplete.enabled` | `true` | Enable/disable completions |
| `aiAutocomplete.model` | `claude-3-5-haiku-20241022` | Model to use (or `ollama` for local) |
| `aiAutocomplete.ollamaBaseUrl` | `http://localhost:11434` | Ollama server URL |
| `aiAutocomplete.ollamaModel` | `gemma3:1b` | Local Ollama model name |
| `aiAutocomplete.temperature` | `0.2` | Determinism (0 = exact, 1 = creative) |
| `aiAutocomplete.debounceDelay` | `300` | Wait time (ms) before requesting (100-2000) |
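For example, a `settings.json` configured for local Ollama completions might look like this (values taken from the defaults above):

```json
{
  "aiAutocomplete.enabled": true,
  "aiAutocomplete.model": "ollama",
  "aiAutocomplete.ollamaBaseUrl": "http://localhost:11434",
  "aiAutocomplete.ollamaModel": "gemma3:1b",
  "aiAutocomplete.temperature": 0.2,
  "aiAutocomplete.debounceDelay": 300
}
```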
## Cost Estimation

Based on Claude Sonnet 4 pricing:

- Input: $3 / million tokens
- Output: $15 / million tokens

Typical usage: ~$0.10-0.50 per hour of active coding.
Example:
- 50 completions/hour
- 500 input tokens average = 25,000 total input tokens
- 100 output tokens average = 5,000 total output tokens
- Cost: (25,000 × $3/M) + (5,000 × $15/M) ≈ $0.15
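The arithmetic above can be reproduced with a small helper (illustrative, using the prices listed):

```typescript
// Token counts × per-million-token prices, per the pricing above.
const INPUT_PRICE_PER_M = 3;   // $ per million input tokens
const OUTPUT_PRICE_PER_M = 15; // $ per million output tokens

function hourlyCost(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * INPUT_PRICE_PER_M +
    (outputTokens / 1_000_000) * OUTPUT_PRICE_PER_M
  );
}

// 50 completions/hour × 500 input tokens, × 100 output tokens:
const cost = hourlyCost(50 * 500, 50 * 100); // = $0.15 per hour
```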
## Performance

- Request Throttling: max 1 request per 300 ms to prevent API spam
- Smart Caching: 100-entry LRU cache with 5-minute TTL
- Aggressive Filtering: only completes in code (not comments/strings)
- Timeouts: 5-second timeout per request to avoid blocking
- Memory: <50 MB in typical usage
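An LRU cache with TTL, as described above (100 entries, 5-minute TTL), can be sketched on top of `Map`, which preserves insertion order. This is a minimal illustration with an injectable clock, not the extension's actual implementation:

```typescript
// Sketch of an LRU cache with per-entry TTL. `Map` preserves insertion
// order, so the first key is always the least recently used.
class LruCache<V> {
  private map = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private maxEntries = 100,
    private ttlMs = 5 * 60 * 1000,
    private now: () => number = Date.now // injectable for testing
  ) {}

  get(key: string): V | undefined {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (this.now() > entry.expiresAt) {
      this.map.delete(key); // expired: drop it
      return undefined;
    }
    // Re-insert to mark the entry as most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.map.size > this.maxEntries) {
      // Evict the least recently used entry (first key in the Map).
      const oldest = this.map.keys().next().value as string;
      this.map.delete(oldest);
    }
  }
}
```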
## Limitations

- Only works in supported languages (JS/TS/Dart)
- No completions in comments or string literals
- Requires a network connection (not needed with Ollama)
- Your API key is stored in VS Code settings
## Troubleshooting

### Missing API key

- Run the `AI Autocomplete: Set API Key` command
- Or add your key to VS Code settings
### "Rate limited by API"

- Increase the `debounceDelay` setting to reduce request frequency
- Wait a moment before continuing to type
### "Request timeout"

For Ollama:

- Ensure Ollama is running: `ollama list`
- Check that the model is loaded: `ollama run <model-name> "test"`
- Verify the base URL in settings (should be `http://localhost:11434` or `http://127.0.0.1:11434`)

For Claude/Gemini:

- The API took too long to respond
- Check your internet connection
- Try again
### No completions appearing

- Ensure the language is supported (JS/TS/Dart)
- Check that you're not in a comment or string
- Verify your API key is valid
- Check the VS Code output channel: `AI Autocomplete`
## Privacy & Security

- Your API key is stored locally in VS Code settings
- Code context is sent to your selected provider's API (Claude or Gemini); with Ollama, it never leaves your machine
- Results are cached locally (not stored externally)
- No telemetry or usage tracking
## License

MIT

## Support

For issues or questions:

- Check the troubleshooting section
- View logs in the VS Code output channel: `AI Autocomplete`
- File an issue here
Made with ❤️ for developers who love AI-assisted coding