Voice Agent

Voice input for AI coding assistants. Speak your prompts instead of typing.

Works with Claude Code, Cursor, GitHub Copilot, and any other VS Code AI tool. Press Ctrl+Alt+V, speak naturally, and your words appear in the chat input automatically. No clicking, no manual stop. Just speak and pause.

Features

OpenAI Whisper API: fastest and most accurate option — just add your API key and go
Fully offline fallback: local Whisper model, speech never leaves your machine
Accent-aware: handles Indian, Chinese, Japanese, and other non-native English accents well
Smart auto-stop: automatically detects when you stop speaking
Works everywhere: Claude Code, Cursor, Copilot Chat, any terminal or editor
No subscriptions: pay only for what you use (~$0.006/min with OpenAI, roughly $0.06 for a 10-minute session), or use fully offline for free
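To make the pricing concrete, here is a small sketch of the arithmetic. It assumes the ~$0.006/min rate quoted above is current; the function names are illustrative and are not part of the extension.

```python
# Assumed rate: ~$0.006 per minute of audio, the figure quoted in this README.
RATE_USD_PER_MIN = 0.006


def estimate_cost(minutes: float, rate: float = RATE_USD_PER_MIN) -> float:
    """Return the approximate cost in USD for `minutes` of transcribed audio."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return round(minutes * rate, 4)


def minutes_for_budget(budget_usd: float, rate: float = RATE_USD_PER_MIN) -> float:
    """Return roughly how many minutes of audio a budget covers."""
    return round(budget_usd / rate, 1)


# Print a few reference points: one prompt, a short session, an hour of speech.
for m in (1, 10, 60):
    print(f"{m:>3} min -> ${estimate_cost(m):.3f}")
```

Only time spent speaking is transcribed, so actual cost depends on audio length rather than on how long the editor is open.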

Setup

Step 1: Install Python dependencies

pip install faster-whisper sounddevice numpy webrtcvad-wheels openai
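If you are unsure whether the install succeeded, a quick check like the following can help. It is a sketch, not something shipped with the extension, and it assumes the usual import names for these packages (faster-whisper imports as faster_whisper, webrtcvad-wheels as webrtcvad). Run it with the same Python interpreter the extension uses.

```python
import importlib.util

# Import names for the packages installed in Step 1.
# Note: the pip package "faster-whisper" imports as "faster_whisper",
# and "webrtcvad-wheels" imports as "webrtcvad".
REQUIRED_MODULES = ["faster_whisper", "sounddevice", "numpy", "webrtcvad", "openai"]


def missing_modules(names):
    """Return the subset of `names` that the import system cannot locate."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All Voice Agent dependencies are importable.")
```

find_spec only locates the modules without importing them, so the check stays fast and does not touch audio devices.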

Step 2: Add your OpenAI API key (recommended for best results)

Create an API key at platform.openai.com/api-keys. Creating the key is free; transcription itself is billed per minute of audio.

Open VS Code Settings (Ctrl+,), search Voice Agent, and set:

voice-agent.provider to openai
voice-agent.openaiApiKey to your key (sk-...)

OpenAI Whisper gives better accuracy than the local models and returns results in about a second. At ~$0.006/min, $0.10 covers roughly 16 minutes of speech. Strongly recommended over local mode.

Usage

Press Ctrl+Alt+V (Mac: Cmd+Alt+V) or click the 🎤 Voice button in the status bar
Speak your prompt naturally
Pause for ~1 second — transcription starts automatically
Your words are pasted directly into the active input

Press the button again at any time to cancel.
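The pause-based auto-stop can be pictured as a small state machine over per-frame speech decisions. The sketch below is an assumption about how such logic could work, not the extension's actual code. It takes one boolean per audio frame, which in practice could come from webrtcvad's Vad.is_speech(frame, sample_rate); the 30 ms frame length and the function name are illustrative.

```python
FRAME_MS = 30      # assumed frame length; webrtcvad accepts 10, 20, or 30 ms frames
SILENCE_MS = 1000  # the ~1 second pause described in the usage steps


def find_stop_frame(speech_flags, frame_ms=FRAME_MS, silence_ms=SILENCE_MS):
    """Return the index of the frame at which recording would stop, or None.

    speech_flags: iterable of booleans, one per audio frame (True = speech).
    Recording stops once speech has been heard and is then followed by
    at least `silence_ms` of consecutive non-speech frames.
    """
    needed = -(-silence_ms // frame_ms)  # ceiling division: silent frames required
    heard_speech = False
    silent_run = 0
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return i
    return None
```

Silence before the first spoken frame is ignored, so the recording does not end while you are still gathering your thoughts; only a pause after speech triggers the stop.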

Settings

Setting	Default	Description
`voice-agent.provider`	`local`	`openai` for cloud (recommended), `local` for offline
`voice-agent.openaiApiKey`	(empty)	OpenAI API key for cloud transcription
`voice-agent.model`	`base`	Local model size: `tiny`, `base`, `small`, `medium`
`voice-agent.language`	`en`	Spoken language code (e.g. `ja`, `zh`, `hi`, `fr`)
`voice-agent.pythonPath`	(empty)	Custom Python path if needed

Provider Comparison

Provider	Quality	Speed	Cost	Privacy
OpenAI API	Excellent	~1s	~$0.006/min	Cloud
Local (`small`)	Better	~4s	Free	100% offline
Local (`base`)	Good	~2s	Free	100% offline
Local (`tiny`)	Basic	~1s	Free	100% offline

Status Bar

State	Meaning
`🎤 Voice` (highlighted)	Ready, using OpenAI API
`🎤 Voice` (plain)	Ready, using local Whisper
`⟳ Loading model...`	First-time model load (local only)
`⏺ Listening...`	Recording — speak now
`⟳ Transcribing...`	Processing your speech

Troubleshooting

Nothing happens: make sure the Python dependencies from Step 1 are installed (pip install faster-whisper sounddevice numpy webrtcvad-wheels openai)
OpenAI error: check your API key is correct and has credits
Slow first use: local Whisper downloads a model on first run (~150MB for base), cached after that
No speech detected: speak closer to the mic; check microphone permissions in Windows Settings → Privacy → Microphone

License

MIT

Voice Agent

Aditya Kangune
