L33t Speak - Voice to Text for VS Code

A VS Code extension that provides voice-to-text dictation using OS-native speech recognition. Speak into your microphone and have your words inserted at your cursor — in the editor or the terminal.

Demo

Built for developers who want hands-free text input while coding, writing commit messages, or interacting with CLI tools like Claude Code.

Features

Voice dictation — press a hotkey to start/stop recording, text appears where your cursor is
Works in editors and terminals — dictation targets whichever panel is focused when you start
Claude Code integration — speak commands directly into a Claude Code terminal session
Microphone selection — pick your input device from the status bar
Audio feedback — short tones on start, stop, and cancel so you know the state
Cancel with ESC — discard what you said without inserting anything

Keybindings

Key	Action
`Cmd+Shift+D` (macOS) / `Ctrl+Shift+D`	Toggle dictation on/off
`Cmd+Shift+Space` / `Ctrl+Shift+Space`	Start/stop dictation (alternative)
`Escape`	Cancel dictation (discards buffered text)

How It Works

The extension uses macOS native speech recognition (SFSpeechRecognizer) on Mac and System.Speech on Windows. A hidden WebView fallback using the Web Speech API is available when native engines aren't present.

On macOS, a compiled Swift helper binary handles the actual speech recognition and communicates with the extension via JSON over stdin/stdout. This means:

No API keys required
No internet required (on macOS)
Low latency, high accuracy via Apple's on-device models

Requirements

macOS 10.15+ or Windows 10/11
Microphone access must be granted to VS Code in System Settings > Privacy & Security
Speech Recognition access must be granted (macOS will prompt on first use)

Extension Settings

Setting	Default	Description
`l33t-speak.preferredEngine`	`auto`	Engine to use: `auto`, `macos`, `windows`, or `webview`
`l33t-speak.microphone`	`null`	Device ID of preferred microphone
`l33t-speak.language`	`en-US`	BCP-47 language code for recognition
`l33t-speak.insertMode`	`final`	Insert on final result only, or stream interim results

Development

# Install dependencies
npm install

# Build the extension
npm run build

# Build the macOS Swift helper
./helpers/macos/build.sh

# Run tests (requires Node 20)
npm test

# Launch in VS Code
# Press F5 to open an Extension Development Host

Architecture

Extension Host (TypeScript)
    ├── extension.ts        ← commands, wiring
    ├── recognizer.ts       ← engine selection, lifecycle
    ├── engines/
    │   ├── macos.ts        ← spawns Swift helper
    │   ├── windows.ts      ← spawns PowerShell helper
    │   └── webview.ts      ← Web Speech API fallback
    ├── statusBar.ts        ← UI state
    ├── insertion.ts        ← routes text to editor/terminal
    └── tones.ts            ← audio feedback

Native Helpers (communicate via JSON lines over stdin/stdout)
    ├── bin/darwin/dictation-helper    ← compiled Swift binary
    └── bin/win32/dictation-helper.ps1 ← PowerShell script

Known Limitations

Linux is not supported (no native speech engine; WebView fallback requires internet)
Push-to-talk is implemented as press-to-start/press-to-stop (VS Code doesn't support keyup events)
The WebView fallback requires an internet connection (Chrome's speech recognition uses Google servers)
First use on macOS requires granting both Microphone and Speech Recognition permissions to VS Code

L33t Speak - Voice to Text

Christopher Connolly