Voice Typing — VS Code / Cursor Extension

Speak into your microphone, get a clean coding prompt inserted into your editor. Everything runs locally — no cloud services required.

Flow: Mic → whisper.cpp (local STT) → Ollama (local LLM rewrite) → Insert into editor

Platforms: macOS · Linux · Windows — works in both VS Code and Cursor

Install

From Marketplace: Voice Typing on VS Code Marketplace

Or from VSIX: Download the latest .vsix from GitHub Releases, then in VS Code/Cursor: Cmd+Shift+P → "Install from VSIX"

Quick Start (macOS)

brew install whisper-cpp sox       # STT engine + audio recorder
brew install ollama                # optional: LLM rewrite
ollama pull llama3.2:3b            # optional: pull rewrite model

Install the extension from the VS Code Marketplace, or build from source:

git clone https://github.com/bread22/voice-typing.git
cd voice-typing && npm install && npm run package
# Then: Cmd+Shift+P → "Install from VSIX" → select voice-prompt-*.vsix

Press Option+V to start recording. The whisper model (~75 MB) downloads automatically on first use.

Cross-Platform Installation

Step 1: Install whisper-cpp (speech-to-text engine)

macOS

brew install whisper-cpp

Installs the whisper-cli binary to your PATH. Verify with:

whisper-cli --help

Linux (Ubuntu / Debian)

Option A — Build from source (recommended):

sudo apt install build-essential libsdl2-dev
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -B build
cmake --build build --config Release
sudo cp build/bin/whisper-cli /usr/local/bin/

Option B — Download pre-built binary:

Download from whisper.cpp releases, extract, and place whisper-cli somewhere in your PATH.

Verify with:

whisper-cli --help

Windows

Option A — Build with CMake + Visual Studio:

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
cmake -B build
cmake --build build --config Release

The binary will be at build\bin\Release\whisper-cli.exe. Add it to your PATH or set voicePrompt.stt.whisperCppPath in settings.

Option B — Download pre-built binary:

Download from whisper.cpp releases. Place whisper-cli.exe in a directory on your PATH.

Verify with:

whisper-cli.exe --help

Step 2: Install an audio recorder

The extension auto-detects the best available recorder for your platform:

Platform	Recommended	Install
macOS	SoX (auto-detected first)	`brew install sox`
Linux	arecord (usually built-in)	`sudo apt install alsa-utils` (if not present)
Windows	FFmpeg	`winget install ffmpeg` or ffmpeg.org

Detection priority:

Platform	Tries in order
macOS	SoX → FFmpeg
Linux	arecord → SoX → FFmpeg
Windows	FFmpeg → SoX

If no recorder is found, the extension shows the appropriate install command.

Step 3: Install Ollama (optional — LLM rewrite)

Ollama rewrites your raw voice transcript into a clean, structured prompt. Without it, the raw transcript is inserted directly (still useful!).

macOS

brew install ollama
ollama pull llama3.2:3b

Ollama starts automatically via Homebrew services. Verify: curl http://127.0.0.1:11434

Linux

curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2:3b

Start the server: ollama serve (or configure as a systemd service).

Windows

Download from ollama.com and install. Then:

ollama pull llama3.2:3b

Step 4: Install the extension

git clone https://github.com/bread22/voice-typing.git
cd voice-typing
npm install
npm run package

This creates a voice-prompt-*.vsix file. Install it:

VS Code: Ctrl+Shift+P → "Extensions: Install from VSIX..." → select the file
Cursor: Cmd+Shift+P (Mac) / Ctrl+Shift+P (Win/Linux) → "Extensions: Install from VSIX..." → select the file

Reload the editor when prompted.

Step 5: First run

Press Alt+V (or Option+V on Mac)
On first use, the whisper model (ggml-tiny.en.bin, ~75 MB) downloads automatically
Grant microphone access if prompted by your OS
Speak naturally — the extension auto-stops when you pause

Usage

Start recording

Method	Shortcut / Action
Keyboard	`Alt+V` (Windows/Linux) · `Option+V` (Mac)
Command palette	`Ctrl+Shift+P` → Voice Prompt: Start Recording
Status bar	Click the $(mic) Voice Prompt button (bottom bar)

Recording flow

Press shortcut → status bar shows "Recording..."
Speak naturally — e.g. "add a function that validates email addresses using regex"
Pause speaking → silence detection (VAD) auto-stops recording
Status bar progress: Listening → Transcribing → Rewriting → Done
Result inserted at your cursor position (cursor moves to end, space appended)

Example

You say:

"um so I need a function that uh validates email addresses and it should use regex and return true or false"

Gets inserted as:

Add a function that validates email addresses using regex. It should return true for valid emails and false otherwise.

Without Ollama (no rewrite)

If Ollama isn't running, the raw transcript is inserted directly. You can also disable rewrite explicitly by setting voicePrompt.rewrite.provider to none.

Settings Reference

Open settings (Ctrl+,) and search for voicePrompt.

Speech-to-Text

Setting	Default	Description
`voicePrompt.stt.provider`	`whisper-cpp`	`whisper-cpp` (local CLI) or `http` (custom server)
`voicePrompt.stt.model`	`tiny.en`	Model size: `tiny.en`, `base.en`, `small.en`, etc.
`voicePrompt.stt.whisperCppPath`	(auto-detect)	Custom path to `whisper-cli` binary
`voicePrompt.stt.modelPath`	(auto-download)	Custom path to GGML model file
`voicePrompt.stt.httpEndpoint`	`http://127.0.0.1:8765/transcribe`	HTTP endpoint (when provider is `http`)
`voicePrompt.stt.timeoutMs`	`30000`	STT timeout in milliseconds
`voicePrompt.stt.language`	`en`	Language code for speech recognition

Model sizes (auto-downloaded):

Model	Size	Speed	Accuracy
`tiny.en`	~75 MB	Fastest	Good for commands and short prompts
`base.en`	~142 MB	Fast	Better accuracy
`small.en`	~466 MB	Moderate	Best accuracy for English

Rewrite (LLM)

Setting	Default	Description
`voicePrompt.rewrite.provider`	`ollama`	`ollama`, `cloud`, or `none`
`voicePrompt.rewrite.model`	`llama3.2:3b`	LLM model for rewriting
`voicePrompt.rewrite.ollamaBaseUrl`	`http://127.0.0.1:11434`	Ollama server URL
`voicePrompt.rewrite.cloudBaseUrl`	`https://api.openai.com/v1/chat/completions`	Cloud API endpoint (OpenAI-compatible)
`voicePrompt.rewrite.timeoutMs`	`20000`	Rewrite timeout in milliseconds
`voicePrompt.rewrite.style`	`engineering`	`concise`, `detailed`, `engineering`, `debugging`

Behavior

Setting	Default	Description
`voicePrompt.previewBeforeInsert`	`false`	Show editable preview before inserting
`voicePrompt.autoFallbackToCloud`	`false`	Auto-fallback to cloud if local rewrite fails
`voicePrompt.noRewriteBehavior`	`stt_passthrough`	`stt_passthrough` = insert raw text · `disable_plugin` = block insertion
`voicePrompt.showStatusBarButton`	`true`	Show mic button in status bar
`voicePrompt.insertTrailingSpace`	`true`	Append space after each insertion for easier consecutive inputs

Voice Activity Detection (VAD)

Setting	Default	Range	Description
`voicePrompt.vad.enabled`	`true`	—	Auto-stop recording after silence
`voicePrompt.vad.silenceMs`	`1500`	600–3000 ms	Silence window before auto-stop
`voicePrompt.vad.minSpeechMs`	`300`	100–1000 ms	Minimum speech duration to accept

VAD tuning tips:

Getting cut off mid-sentence? Increase silenceMs by 100–200 ms
Feels slow to respond? Decrease silenceMs by 100 ms
Ambient noise triggering? Increase minSpeechMs to 400–500 ms

Cloud Rewrite (Optional)

Use OpenAI or another cloud LLM instead of (or as fallback to) local Ollama:

Set API key: Ctrl+Shift+P → Voice Prompt: Set Cloud API Key (stored securely in VS Code SecretStorage)
Switch provider: Set voicePrompt.rewrite.provider to cloud
Or use as fallback: Keep ollama as provider, enable voicePrompt.autoFallbackToCloud

Troubleshooting

Problem	Solution
"whisper-cpp not found"	Install whisper-cpp (see platform instructions above)
"No audio recorder found"	Install SoX, arecord, or FFmpeg (see Step 2 above)
"Transcription failed"	Check that `whisper-cli` works: `whisper-cli -m <model> -f test.wav`
Model download fails	Set `voicePrompt.stt.modelPath` to a manually downloaded `.bin` file
Ollama not rewriting	Check `ollama ps` — model should be loaded. Try `ollama run llama3.2:3b`
Cursor position jumps on insert	Ensure `voicePrompt.insertTrailingSpace` is `true` (default)
VAD cuts off too early	Increase `voicePrompt.vad.silenceMs` (try 2000–2500)
Extension not responding to hotkey	Reload window: `Ctrl+Shift+P` → "Developer: Reload Window"
Microphone permission denied (macOS)	System Preferences → Privacy & Security → Microphone → allow VS Code/Cursor

Architecture

┌─────────────┐    ┌──────────┐    ┌─────────────┐    ┌───────────┐    ┌─────────┐
│  Microphone  │ →  │ Platform │ →  │ whisper-cpp │ →  │  Ollama   │ →  │ Editor  │
│  (Alt+V)     │    │ Recorder │    │ (local STT) │    │ (rewrite) │    │ Insert  │
└─────────────┘    │ WAV file │    │ Raw text    │    │ Clean text│    │ @ cursor│
                   └──────────┘    └─────────────┘    └───────────┘    └─────────┘

All processing happens locally on your machine. No audio or text is sent to any cloud service unless you explicitly configure a cloud rewrite provider.

Project Structure

src/
  extension.ts           — command registration and activation
  audio/
    audioCaptureService  — cross-platform mic capture (SoX / arecord / FFmpeg) + VAD
  stt/
    whisperCppSttProvider — shells out to whisper-cli binary
    httpSttProvider        — HTTP STT for custom servers (opt-in)
    modelManager           — auto-downloads GGML models from HuggingFace
  rewrite/
    ollamaRewriteProvider  — local Ollama LLM rewrite
    cloudRewriteProvider   — cloud LLM rewrite (OpenAI-compatible)
  inject/
    cursorInputInjector    — inserts text at cursor, manages position
  config/
    settings               — reads VS Code configuration
    secrets                — manages API keys via SecretStorage
  types/
    contracts              — shared TypeScript interfaces
  orchestration/
    voicePromptOrchestrator — pipeline: record → transcribe → rewrite → inject

Development

npm install          # install dependencies
npm run build        # compile TypeScript
npm run bundle:dev   # esbuild development bundle (with source maps)
npm run bundle       # esbuild production bundle (minified)
npm run package      # create .vsix for distribution
npm run dev          # build + package dev vsix
npm run watch        # watch mode for development

Press F5 in VS Code/Cursor to launch the Extension Development Host with the debugger attached.

License

MIT

Voice Typing

Qingyi Wu