Voice Agent
Voice input for AI coding assistants. Speak your prompts instead of typing.
Works with Claude Code, Cursor, GitHub Copilot, and any other VS Code AI tool. Press Ctrl+Alt+V, speak naturally, and your words appear in the chat input automatically. No clicking, no manual stop. Just speak and pause.
Features
- OpenAI Whisper API: fastest and most accurate option — just add your API key and go
- Fully offline fallback: local Whisper model, speech never leaves your machine
- Accent-aware: handles Indian, Chinese, Japanese, and other non-native English accents well
- Smart auto-stop: automatically detects when you stop speaking
- Works everywhere: Claude Code, Cursor, Copilot Chat, any terminal or editor
- No subscriptions: pay only for what you use (~$0.006/min with OpenAI, roughly $0.06 for a 10-minute session), or use fully offline for free
Setup
Step 1: Install Python dependencies
pip install faster-whisper sounddevice numpy webrtcvad-wheels openai
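To confirm the install worked, you can check that each package's module can be found by the Python interpreter the extension will use. This is a minimal sketch, not part of the extension. The module names are the import names these pip packages normally provide (for example, webrtcvad-wheels installs the `webrtcvad` module), and the helper name `missing_modules` is made up for illustration.

```python
import importlib.util

# Import names assumed for the pip packages listed in Step 1.
REQUIRED_MODULES = ["faster_whisper", "sounddevice", "numpy", "webrtcvad", "openai"]


def missing_modules(names):
    """Return the module names that cannot be found in this Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All Voice Agent dependencies were found.")
```

Run it with the same interpreter you installed the packages into. If you use a virtual environment, that is also the path to put in voice-agent.pythonPath.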
Step 2: Add your OpenAI API key (recommended for best results)
Create an API key at platform.openai.com/api-keys. Creating a key is free; transcription itself is billed per minute of audio.
Open VS Code Settings (Ctrl+,), search for Voice Agent, and set:
- voice-agent.provider to openai
- voice-agent.openaiApiKey to your key (sk-...)
OpenAI Whisper gives significantly better accuracy and faster results than local models. At ~$0.006/min, $0.10 covers roughly 16 minutes of recorded speech, so a day of short dictated prompts typically costs a few cents. Strongly recommended over local mode.
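The cost arithmetic is easy to check for your own usage. The sketch below assumes the approximate rate of $0.006 per minute of audio quoted in this README; actual billing may differ, and the function names are illustrative.

```python
# Approximate OpenAI Whisper rate quoted in this README, in USD per minute of audio.
OPENAI_RATE_PER_MIN = 0.006


def estimate_cost(minutes, rate_per_min=OPENAI_RATE_PER_MIN):
    """Estimated transcription cost in USD for the given minutes of audio."""
    return minutes * rate_per_min


def minutes_for_budget(budget_usd, rate_per_min=OPENAI_RATE_PER_MIN):
    """How many minutes of audio a given budget in USD would cover."""
    return budget_usd / rate_per_min


if __name__ == "__main__":
    for minutes in (1, 10, 60):
        print(f"{minutes:>3} min of audio: ~${estimate_cost(minutes):.3f}")
    print(f"$0.10 covers ~{minutes_for_budget(0.10):.1f} min of audio")
```

Only recorded speech is billed, so the relevant number is the total length of your dictated prompts, not the time spent in the editor.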
Usage
- Press Ctrl+Alt+V (Mac: Cmd+Alt+V) or click the 🎤 Voice button in the status bar
- Speak your prompt naturally
- Pause for ~1 second — transcription starts automatically
- Your words are pasted directly into the active input
Press the button again at any time to cancel.
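The auto-stop in the steps above can be pictured as counting silent audio frames once speech has started. The sketch below is an assumption about how such a detector might work, not the extension's actual code. It takes one speech/silence flag per fixed-length frame (in practice the flags could come from a voice activity detector such as webrtcvad) and reports the frame at which about one second of continuous silence has accumulated. The function name, the 30 ms frame length, and the 1000 ms threshold are illustrative.

```python
def find_stop_frame(speech_flags, frame_ms=30, pause_ms=1000):
    """Return the index of the frame where recording would stop, or None.

    speech_flags: iterable of booleans, True if the frame contains speech.
    Recording stops once speech has been heard and is then followed by at
    least pause_ms of uninterrupted silence.
    """
    heard_speech = False
    silent_ms = 0
    for index, is_speech in enumerate(speech_flags):
        if is_speech:
            heard_speech = True
            silent_ms = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_ms += frame_ms
            if silent_ms >= pause_ms:
                return index
        # Silence before the first speech frame is ignored.
    return None


if __name__ == "__main__":
    # 5 silent frames, 10 speech frames, then a long pause.
    flags = [False] * 5 + [True] * 10 + [False] * 40
    print("Stop at frame:", find_stop_frame(flags))
```

Ignoring leading silence means the recording does not end before you start talking, and resetting the counter on every speech frame keeps short mid-sentence pauses from cutting you off.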
Settings
| Setting | Default | Description |
| --- | --- | --- |
| voice-agent.provider | local | openai for cloud (recommended), local for offline |
| voice-agent.openaiApiKey | (empty) | OpenAI API key for cloud transcription |
| voice-agent.model | base | Local model size: tiny, base, small, medium |
| voice-agent.language | en | Spoken language code (e.g. ja, zh, hi, fr) |
| voice-agent.pythonPath | (empty) | Custom Python path if needed |
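These settings can also be edited directly in VS Code's settings.json. As a hedged illustration, the helper below builds such a fragment from the setting names and defaults listed above and prints it as JSON. The helper itself (`voice_agent_settings`) is hypothetical and not part of the extension.

```python
import json


def voice_agent_settings(provider="local", api_key="", model="base",
                         language="en", python_path=""):
    """Build a settings.json fragment using the setting names listed above."""
    if provider not in ("openai", "local"):
        raise ValueError("provider must be 'openai' or 'local'")
    return {
        "voice-agent.provider": provider,
        "voice-agent.openaiApiKey": api_key,
        "voice-agent.model": model,
        "voice-agent.language": language,
        "voice-agent.pythonPath": python_path,
    }


if __name__ == "__main__":
    # Example: offline mode with the small model and Japanese speech.
    print(json.dumps(voice_agent_settings(model="small", language="ja"), indent=2))
```

Paste the printed keys inside the top-level object of your settings.json. Avoid committing a file that contains your API key.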
Provider Comparison
| Provider | Quality | Speed | Cost | Privacy |
| --- | --- | --- | --- | --- |
| OpenAI API | Excellent | ~1s | ~$0.006/min | Cloud |
| Local (small) | Better | ~4s | Free | 100% offline |
| Local (base) | Good | ~2s | Free | 100% offline |
| Local (tiny) | Basic | ~1s | Free | 100% offline |
Status Bar
| State | Meaning |
| --- | --- |
| 🎤 Voice (highlighted) | Ready, using OpenAI API |
| 🎤 Voice (plain) | Ready, using local Whisper |
| ⟳ Loading model... | First-time model load (local only) |
| ⏺ Listening... | Recording: speak now |
| ⟳ Transcribing... | Processing your speech |
Troubleshooting
- Nothing happens: make sure the Python dependencies from Step 1 are installed (pip install faster-whisper sounddevice numpy webrtcvad-wheels openai)
- OpenAI error: check your API key is correct and has credits
- Slow first use: local Whisper downloads a model on first run (~150MB for base), cached after that
- No speech detected: speak closer to the mic; check microphone permissions in Windows Settings → Privacy → Microphone
License
MIT