STP — Hands-Free Voice Assistant for Claude Code and Antigravity

Wake it with a phrase. Dictate. Send by voice. STP works the way "hey Siri" works — except instead of opening apps or setting timers, it talks to your AI coding assistant for you. Works with Antigravity or Claude Code. You pick the target, you pick the chat.
student working on a side project. Once the voice pipeline was reliable enough to use, I started building the rest of the extension with the extension.
Internally the package is named VTP (Voice to Prompt). It's the same project — STP is the user-facing name, VTP is the codename / repo / package id.
Watch the demo on YouTube.
How It Works
The full loop — start, dictate, send, repeat — happens by voice. No mouse. No keyboard. No focus required.
Say your wake phrase and STP starts listening. The default is "hey antigravity" — change it in Settings. Then dictate naturally and end with something like "send the prompt." STP routes it into either Antigravity or Claude Code, whichever you've selected as the target.
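Here is a minimal sketch of the wake-phrase gate and the trailing send command. It assumes transcripts arrive as plain-text chunks. The function names and the normalization (lower-case, punctuation stripped) are hypothetical and are not taken from STP's CommandDetector. A real implementation would likely keep the prompt's original casing.

```typescript
// Hypothetical sketch: wake-phrase gating and send-command detection.
// Names are illustrative; this is not STP's CommandDetector.

/** Lower-cases, replaces punctuation with spaces, and collapses whitespace. */
function normalizeSpeech(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

/**
 * Returns the text spoken after the wake phrase ("" if nothing followed it),
 * or null when the chunk does not contain the wake phrase at all.
 */
function afterWakePhrase(chunk: string, wakePhrase = "hey antigravity"): string | null {
  // Pad with spaces so "they antigravity" does not match "hey antigravity".
  const text = ` ${normalizeSpeech(chunk)} `;
  const phrase = ` ${normalizeSpeech(wakePhrase)} `;
  const idx = text.indexOf(phrase);
  if (idx === -1) return null;
  return text.slice(idx + phrase.length).trim();
}

/** Splits a trailing "send it" / "send the prompt" off the dictated text. */
function splitSendCommand(text: string): { prompt: string; send: boolean } {
  const norm = normalizeSpeech(text);
  // Anchored at the end, so the phrase only counts as a command when it closes the chunk.
  const match = /\bsend (?:it|the prompt)$/.exec(norm);
  if (!match) return { prompt: norm, send: false };
  return { prompt: norm.slice(0, match.index).trim(), send: true };
}
```

Anchoring the send command at the end of the chunk means "resend the prompt" or a mid-sentence "send it over" stays part of the dictation.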
Two listening modes
| Mode | What happens after a prompt is sent |
|---|---|
| Voice Activated (default) | Mic idles. Say your wake phrase to start the next prompt. |
| Continuous | Mic auto-restarts immediately. Just keep talking. |
Switch modes in the Settings menu inside the panel.
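The two modes can be sketched as small state functions. The sketch assumes only two mic states matter around a send. The type and function names are hypothetical. In STP the mode comes from the panel's Settings menu rather than being passed as a parameter.

```typescript
// Hypothetical sketch: mic behaviour after a send in each listening mode.
type ListeningMode = "voice-activated" | "continuous";
type MicState = "awaiting-wake" | "listening";

function micStateAfterSend(mode: ListeningMode): MicState {
  // Voice Activated idles until the wake phrase is heard again;
  // Continuous restarts capture immediately so the next prompt can begin.
  return mode === "continuous" ? "listening" : "awaiting-wake";
}

function micStateOnTranscript(state: MicState, heardWakePhrase: boolean): MicState {
  // While idle, only the wake phrase moves the mic back to listening.
  if (state === "awaiting-wake") return heardWakePhrase ? "listening" : "awaiting-wake";
  return state;
}
```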
Pick your target — Antigravity or Claude Code
A button in the panel header (→ AG / → CC) toggles which AI chat receives your prompts. The voice flow doesn't change — only where the prompt lands.
When the target is Claude Code, the context card below the button becomes a chat lock. You can have five Claude Code conversations open and only the one you locked will receive the prompt — the others get nothing. Switch back to Antigravity and the lock is preserved (so the next time you flip to Claude, it remembers).
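The routing rule above can be sketched as a filter over open panels. The sketch assumes each Claude Code panel is identified by its title, which matches the title tagging described under Claude Code Support. `ChatPanel`, `RoutingState`, and both functions are hypothetical. An empty lock fans out to every panel, mirroring the documented default of vtp.claudeCodeLockedTitle.

```typescript
// Hypothetical sketch of Claude Code routing with a chat lock.
type Target = "antigravity" | "claude-code";

interface ChatPanel {
  id: string;
  title: string;
}

interface RoutingState {
  target: Target;
  lockedTitle: string; // "" means no lock
}

/** Panels that should receive the prompt: all of them, or only the locked one. */
function resolveClaudePanels(panels: ChatPanel[], lockedTitle: string): ChatPanel[] {
  if (lockedTitle === "") return panels;
  return panels.filter((p) => p.title === lockedTitle);
}

/** Flips the target; the lock is carried over untouched so it survives the round trip. */
function toggleTarget(state: RoutingState): RoutingState {
  return {
    target: state.target === "antigravity" ? "claude-code" : "antigravity",
    lockedTitle: state.lockedTitle,
  };
}
```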
Send while you're tabbed out
Injection doesn't need focus. Start a prompt by voice, alt-tab to a browser to look at docs or scroll through something, say "send the prompt" — it lands. The chat is in your IDE, not your active window.
What You Can Say
Basic flow — dictate and send
"Hey antigravity. Build a staff availability page. Each staff member sets their weekly schedule with a calendar UI. Use the existing dashboard card styling. Send the prompt."
STP hears "send the prompt", runs an auto-clean pass to strip filler words, and injects the cleaned prompt into the target you picked.
Enhance flow — let Gemini rewrite it
"Hey antigravity. I need a modal that blocks staff from setting availability until they've saved their service category. Enhance this prompt."
Gemini rewrites it into a more specific prompt using workspace context (open files, recent conversation history). Review it inline:
"Approve." → fires into the target.
"Reject." → restores your original.
"Try again." → Gemini takes another pass.
Clean flow — review the cleanup before sending
"Hey antigravity. Add rate limiting to the API — hold on, sorry — add a 429 response with a Retry-After header. Clean up and review."
"Clean up and review" runs the same cleanup pass that runs silently before every send, but shows you the result first so you can approve, reject, or regenerate it. "Clean it up" does the same cleanup but applies it without a preview.
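The enhance flow and "clean up and review" both end in the same three-way review. This is a minimal sketch of that step. It assumes review commands arrive as standalone utterances. The types and function names are hypothetical. Restarting "try again" from the original dictation is an assumption, not documented behaviour.

```typescript
// Hypothetical sketch of the inline review step for enhanced or cleaned prompts.
type ReviewAction = "approve" | "reject" | "try again";

interface PendingReview {
  original: string;  // what you dictated
  candidate: string; // the rewritten or cleaned version
}

type ReviewOutcome =
  | { kind: "inject"; prompt: string }
  | { kind: "restore"; prompt: string }
  | { kind: "regenerate"; from: string };

/** Recognises a spoken review command, ignoring case and trailing punctuation. */
function parseReviewAction(utterance: string): ReviewAction | null {
  const m = /^\s*(approve|reject|try again)\s*[.!]?\s*$/i.exec(utterance);
  return m ? (m[1].toLowerCase() as ReviewAction) : null;
}

function resolveReview(review: PendingReview, action: ReviewAction): ReviewOutcome {
  switch (action) {
    case "approve":
      return { kind: "inject", prompt: review.candidate };
    case "reject":
      return { kind: "restore", prompt: review.original };
    default:
      // Assumption: the retry starts again from the original dictation.
      return { kind: "regenerate", from: review.original };
  }
}
```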
Pause and resume — mid-session
"Hey antigravity. Build a login page with Google OAuth and — pause."
Mic mutes. Buffer preserved. Later:
"Resume." → mic wakes exactly where you left off.
Why This Saves Tokens
Every prompt runs through an automatic cleanup before it's injected. The cleanup strips filler words ("uh", "um", "like"), collapses immediate word repeats ("the the navbar" → "the navbar"), and tidies whitespace.
For long dictated prompts that's roughly 5–30% fewer input tokens on every send. The cleanup is regex-only by default, so most sends cost zero extra API calls. It escalates to a small Gemini pass only when the buffer has obvious self-corrections (you said "actually" or "I mean" mid-sentence) or heavy filler density.
Auto-cleanup is skipped when the prompt has already been enhanced or cleaned via "clean it up" or "clean up and review", so you never pay the cleanup cost twice.
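This is a minimal sketch of the regex-first cleanup and its escalation check. Several details are assumptions, not STP's actual values in PromptCleaner:

- the filler list;
- the self-correction cues ("actually" and "I mean" from above, plus "hold on" and "sorry" from the clean-flow example);
- the 15% filler-density threshold.

A bare "like" is treated as filler only when set off by commas, so "I'd like a modal" survives.

```typescript
// Hypothetical sketch of a regex-first cleanup pass. Filler list, cues,
// and the 15% density threshold are illustrative assumptions.

interface PromptBuffer {
  text: string;
  enhanced: boolean; // already rewritten by "enhance this prompt"
  cleaned: boolean;  // already cleaned via "clean it up" / "clean up and review"
}

const FILLER = /\b(?:uh+|um+|erm)\b[,.]?\s*/gi;
// Only comma-delimited "like" counts as filler.
const FILLER_LIKE = /,\s*like,\s*/gi;
// An immediately repeated word: "the the navbar" becomes "the navbar".
const REPEAT = /\b(\w+)(?:\s+\1\b)+/gi;
const SELF_CORRECTION = /\b(?:actually|i mean|hold on|sorry)\b/i;

/** Local, zero-API-call cleanup: fillers, immediate repeats, whitespace. */
function regexClean(text: string): string {
  return text
    .replace(FILLER_LIKE, " ")
    .replace(FILLER, "")
    .replace(REPEAT, "$1")
    .replace(/\s+([,.!?])/g, "$1") // no space before punctuation
    .replace(/\s+/g, " ")
    .trim();
}

/** True when the buffer looks noisy enough to justify a Gemini cleanup pass. */
function needsModelPass(text: string, maxFillerRatio = 0.15): boolean {
  if (SELF_CORRECTION.test(text)) return true;
  const words = text.split(/\s+/).filter((w) => w.length > 0).length;
  const fillers = (text.match(FILLER) || []).length + (text.match(FILLER_LIKE) || []).length;
  return words > 0 && fillers / words > maxFillerRatio;
}

/** Auto-clean run right before every send; skipped if already enhanced or cleaned. */
function autoClean(buf: PromptBuffer): { text: string; escalate: boolean } {
  if (buf.enhanced || buf.cleaned) return { text: buf.text, escalate: false };
  return { text: regexClean(buf.text), escalate: needsModelPass(buf.text) };
}
```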
What's Under the Hood
- Audio pipeline — local FFmpeg capture, denoise, and chunking. Audio is sent only to Gemini for transcription, or to Deepgram if you opt in.
- Real-time transcription mode — Deepgram drops voice-command latency from ~5s (Gemini chunked) to ~300ms. Free tier, optional, opt-in.
- Auto-clean before send — regex strips fillers locally; Gemini cleanup only runs when the buffer is genuinely noisy.
- Prompt enhancement — Gemini rewrites your dictation as a spec-style prompt using your open files and recent conversation as context.
- Approve / Reject / Try Again — review enhancements with buttons or by voice.
- Pause / resume — pause mutes the mic, resume brings it back, buffer kept intact across the gap.
- Target switching — one button toggles between Antigravity and Claude Code. Settings, lock state, and wake phrase persist across switches.
- Per-conversation chat lock (Claude Code) — pick a single Claude Code chat tab as the destination so prompts don't fan out to every open conversation.
- Focus-independent injection — works while you're tabbed into another window or scrolling somewhere else.
- Workspace context — reads open files, active conversation, and workspace name for smarter enhancements.
- Multi-conversation context — layer past Antigravity conversations as supplementary read-only context.
- Secure key storage — Gemini and Deepgram keys live in VS Code SecretStorage (the OS keychain). Never in a file, never in source.
- Built-in noise suppression — a 3-stage FFmpeg filter chain strips low-frequency rumble (high-pass) and steady background noise (FFT denoiser) before audio reaches Gemini, then detects silence for VAD.
Voice Command Reference
Starting a session
| Say | What happens |
|---|---|
| hey antigravity (default) | Starts listening, no click needed |
| Any custom wake phrase | Configurable in Settings |
While dictating
Most commands work mid-sentence — say them naturally as part of your dictation.
| Say | What happens |
|---|---|
| (just talk) | Appends to your prompt buffer |
| send it / send the prompt | Auto-cleans, then injects into the current target |
| enhance this prompt | Gemini rewrites it; review inline by voice |
| clean it up / scrub that | Cleans the buffer in place, silently, with no preview |
| clean up and review | Cleans and shows a preview; Approve / Reject / Try Again inline |
| approve / reject / try again | Controls the enhancement or clean-review preview |
Auto-cleanup runs silently before every send unless the buffer is already enhanced or cleaned. You don't have to ask for it.
Pause and resume
| Say | What happens |
|---|---|
| pause / mute / stop listening | Mutes mic immediately; buffer preserved |
| resume / I'm back | Wakes back up exactly where you left off |
| pause and pull up [url] | Pauses and opens the URL in one command |
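This is a sketch of pause and resume that keeps the buffer intact. It assumes transcript chunks still reach the session while the mic is muted, so that "resume" can be heard. `DictationSession` is a hypothetical class. It handles only the pause, mute, stop listening, resume, and I'm back phrases from the table; "pause and pull up [url]" is left out.

```typescript
// Hypothetical sketch: pause/resume that keeps the dictation buffer intact.
class DictationSession {
  private chunks: string[] = [];
  private paused = false;

  /** Feeds one transcript chunk into the session. */
  hear(chunk: string): void {
    const text = chunk.trim();
    const isResume = /^(?:resume|i'?m back)[.!]?$/i.test(text);
    if (this.paused) {
      // While muted, only a resume command is honoured; everything else is dropped.
      if (isResume) this.paused = false;
      return;
    }
    if (isResume) return; // already listening; nothing to do
    const pause = /\b(?:pause|mute|stop listening)[.!]?$/i.exec(text);
    if (pause) {
      // Keep whatever was dictated before the pause command in the same chunk.
      const before = text.slice(0, pause.index).trim();
      if (before) this.chunks.push(before);
      this.paused = true;
      return;
    }
    if (text) this.chunks.push(text);
  }

  get isPaused(): boolean {
    return this.paused;
  }

  /** The accumulated prompt, unchanged by any pause/resume cycles. */
  get prompt(): string {
    return this.chunks.join(" ");
  }
}
```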
Side commands
These trigger actions without touching your prompt buffer.
| Say | What happens |
|---|---|
| pull up [url or topic] | Opens URL or fires into Antigravity chat |
| search for [query] / look up [query] | Searches via Antigravity MCP |
| navigate to [url] / browse to [url] | Opens the URL directly |
Clear
| Say | What happens |
|---|---|
| clear transcript / clear that | Discards the current buffer; must be said alone |
| start over | Resets the buffer |
Commands fire more than 10x faster in Deepgram mode (~300ms) than in Gemini chunked mode (~5s).
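This sketch shows how a single transcript chunk might be mapped to one of these intents. The phrases come from the tables above. The precedence order, the regexes, and treating "start over" as a standalone command are assumptions, not the patterns in CommandDetector.ts.

```typescript
// Hypothetical sketch of classifying one transcript chunk into an intent.
type Intent =
  | { kind: "clear" }
  | { kind: "clean"; review: boolean }
  | { kind: "enhance" }
  | { kind: "send" }
  | { kind: "pause" }
  | { kind: "dictation" };

function classifyChunk(chunk: string): Intent {
  const t = chunk.toLowerCase().replace(/[^a-z0-9'\s]/g, " ").replace(/\s+/g, " ").trim();
  // Clear commands must be the whole utterance, so "clear that field" stays dictation.
  if (/^(?:clear transcript|clear that|start over)$/.test(t)) return { kind: "clear" };
  // The remaining commands may trail the dictation ("... send the prompt").
  // "clean up and review" is checked before "clean it up" so the longer phrase wins.
  if (/\bclean up and review$/.test(t)) return { kind: "clean", review: true };
  if (/\b(?:clean it up|scrub that)$/.test(t)) return { kind: "clean", review: false };
  if (/\benhance this prompt$/.test(t)) return { kind: "enhance" };
  if (/\bsend (?:it|the prompt)$/.test(t)) return { kind: "send" };
  if (/\b(?:pause|mute|stop listening)$/.test(t)) return { kind: "pause" };
  return { kind: "dictation" };
}
```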
Getting Started
1. Install FFmpeg
Windows (recommended via winget):
winget install ffmpeg
Or download from ffmpeg.org and add to your system PATH. Restart your IDE after installing.
2. Get a Gemini API Key
Get one at aistudio.google.com — no credit card required. The free tier (~15 req/min) is enough for normal use.
3. Add your key to STP
Open the STP panel from the Activity Bar, click KEY, and paste your Gemini API key. Stored in VS Code SecretStorage.
4. Pick your target
Click the → AG button in the panel header to flip to → CC (Claude Code) or back. Default is Antigravity.
5. Start talking
Click the microphone button and dictate. In Gemini mode, audio processes in 3-second chunks and the transcript updates live.
6. (Recommended) Enable Deepgram for real-time transcription
Click LIVE in the panel header and follow the onboarding. ~300ms latency for voice commands instead of ~5s. Free tier covers normal use.
Claude Code Support — How It Works
Claude Code's chat panel is a webview. To inject prompts into it without an official extension API, STP hot-patches the locally installed Claude Code extension on first activation. The patch:
- Adds four small commands to Claude Code's IPC (injectPromptVTP, submitVTP, getPanelTitlesVTP, plus internal helpers).
- Tags each chat panel with its title so STP can route to the right conversation when locked.
- Is fully reversible — run "VTP: Restore Claude Code" from the Command Palette to undo it.
- Re-applies automatically if Claude Code is updated. The previous patch is restored first, then the new patch is applied to the new bundle.
A backup of every patched file is stored next to the patched extension (.vtp-backups/), so you can always roll back to the fully clean extension.
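Here is a minimal in-memory sketch of an idempotent, reversible patch. It assumes each patch replaces a known anchor string in the bundle and leaves a marker comment behind. The marker format, the anchor text, and the function names are hypothetical. STP's real definitions live in patches.ts and patcher.ts and work on files on disk, with backups in .vtp-backups/.

```typescript
// Hypothetical sketch of a reversible, idempotent string patch with a backup.
interface Patch {
  id: string;
  find: string;    // anchor text expected in the clean bundle
  replace: string; // text inserted in place of the anchor
}

interface PatchResult {
  content: string;
  backup: string | null; // clean original, or null if nothing was applied
  appliedIds: string[];
}

const markerFor = (id: string): string => `/*VTP:${id}*/`;

function applyPatches(source: string, patches: Patch[]): PatchResult {
  // Idempotence: never patch (or back up) a file that already carries our markers.
  if (patches.some((p) => source.indexOf(markerFor(p.id)) !== -1)) {
    return { content: source, backup: null, appliedIds: [] };
  }
  let content = source;
  for (const p of patches) {
    if (content.indexOf(p.find) === -1) {
      throw new Error(`Anchor for patch "${p.id}" not found; the bundle may have changed.`);
    }
    // A replacer function avoids special handling of "$" in the replacement text.
    content = content.replace(p.find, () => markerFor(p.id) + p.replace);
  }
  return { content, backup: source, appliedIds: patches.map((p) => p.id) };
}

/** Undo: hand back the clean original when we have one. */
function restorePatched(result: PatchResult): string {
  return result.backup !== null ? result.backup : result.content;
}
```

Refusing to re-patch a marked file matters for the update path. The backup must always be a clean bundle, never an already-patched one.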
If you don't want any patching at all, leave the target on Antigravity. The Claude Code path is never touched.
Deepgram — Real-Time Transcription (Optional)
Deepgram is an opt-in third-party service that drops transcription latency to ~300ms via real-time streaming. A free API key is all you need.
Click the LIVE button in the panel and follow the onboarding. Your key is stored in VS Code SecretStorage and only sent to Deepgram during recording.
Deepgram Data Usage
Based on publicly available Deepgram documentation (early 2026). See deepgram.com/privacy for the authoritative source.
| Feature | Default | Opt-out |
|---|---|---|
| Transcription | Audio sent to Deepgram's API during recording | None |
| Model training | May use audio to improve models (50% discount applied) | Add mip_opt_out=true to API requests |
| Data selling | No; Deepgram does not sell your voice data | None |
| Data retention | Logs retained ~90 days | Deleted after processing |
| Compliance | HIPAA, GDPR, CCPA, SOC-2 Type 2, TLS 1.3, AES-256 | None |
Claude Code mode requires Deepgram. Voice commands need ~300ms response time to feel like an assistant; Gemini chunked mode is too slow for that loop.
Memory Context — Multi-Conversation Selection
STP automatically detects your current Antigravity conversation (the most recently modified chat) and feeds it into every enhancement as the primary context.
If you need to pull in knowledge from other past chats — a previous session on the same project, a different workspace's conversation — click the context card at the top of the panel.
| What | How |
|---|---|
| Primary context | Auto-detected (most recently modified conversation). Always active. |
| Extra context | Past Antigravity conversations you manually check in the picker. |
| Adding extras | Click the context card, tick checkboxes, press Enter. Multiple selections OK. |
| Removing extras | Click again, untick, press Enter. |
| Effect | Extra messages are appended (read-only) to the primary context when Gemini elaborates your prompt. |
| Badge | A purple +N pill appears on the card when extras are active. |
Extras are read-only — they inform the elaboration but cannot be modified or replace the primary auto-detected chat.
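This sketch shows how the primary and extra conversations could be combined into one context string. The `Conversation` shape and the section header are hypothetical. `depth` mirrors the vtp.contextDepth setting (default 20). Applying the same cap to extras is an assumption.

```typescript
// Hypothetical sketch of assembling enhancement context from conversations.
interface ChatMessage {
  role: "user" | "assistant";
  text: string;
}

interface Conversation {
  id: string;
  modifiedAt: number; // e.g. last-modified time in ms
  messages: ChatMessage[];
}

/** Primary context = the most recently modified conversation. */
function pickPrimary(conversations: Conversation[]): Conversation | undefined {
  let best: Conversation | undefined = undefined;
  for (const c of conversations) {
    if (!best || c.modifiedAt > best.modifiedAt) best = c;
  }
  return best;
}

const lastN = (msgs: ChatMessage[], n: number): ChatMessage[] => (n > 0 ? msgs.slice(-n) : []);

/** Primary messages first, then user-picked extras appended as read-only sections. */
function buildContext(conversations: Conversation[], extraIds: string[], depth = 20): string {
  const primary = pickPrimary(conversations);
  if (!primary) return "";
  const lines = lastN(primary.messages, depth).map((m) => `${m.role}: ${m.text}`);
  for (const extra of conversations) {
    // The primary chat is never duplicated as an extra.
    if (extra.id === primary.id || extraIds.indexOf(extra.id) === -1) continue;
    lines.push(`--- supplementary, read-only: ${extra.id} ---`);
    for (const m of lastN(extra.messages, depth)) lines.push(`${m.role}: ${m.text}`);
  }
  return lines.join("\n");
}
```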
Platform Support
| Platform | Status |
|---|---|
| Windows | Fully supported (DirectShow via FFmpeg) |
| macOS | Planned (AVFoundation) |
| Linux | Planned (ALSA) |
Settings
| Setting | Default | Description |
|---|---|---|
| vtp.injectionTarget | antigravity | Which AI chat receives prompts: antigravity or claude-code |
| vtp.claudeCodeLockedTitle | "" | Title of the Claude Code conversation locked as the destination (empty = fan out to all open Claude chats) |
| vtp.vadMode | false | Always-on VAD: auto-pauses after silence, restarts automatically |
| vtp.contextDepth | 20 | Recent conversation messages passed as context to Gemini |
| vtp.elaborationModel | gemini-2.5-flash | Gemini model used for prompt enhancement |
| vtp.transcriptionEngine | gemini | gemini (default, uses Gemini key) or deepgram (real-time, requires separate Deepgram key) |
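This sketch resolves these settings over their defaults with basic type checks. In the extension the raw values would come from `vscode.workspace.getConfiguration("vtp")`. A plain object is used here so the example runs outside VS Code. The warning mirrors the note that Claude Code mode requires Deepgram. The function names are hypothetical.

```typescript
// Hypothetical sketch: resolve STP settings over their documented defaults.
interface VtpSettings {
  injectionTarget: "antigravity" | "claude-code";
  claudeCodeLockedTitle: string;
  vadMode: boolean;
  contextDepth: number;
  elaborationModel: string;
  transcriptionEngine: "gemini" | "deepgram";
}

const VTP_DEFAULTS: VtpSettings = {
  injectionTarget: "antigravity",
  claudeCodeLockedTitle: "",
  vadMode: false,
  contextDepth: 20,
  elaborationModel: "gemini-2.5-flash",
  transcriptionEngine: "gemini",
};

/** Overlays user values onto the defaults, ignoring wrongly typed or unknown values. */
function resolveVtpSettings(raw: { [key: string]: unknown }): VtpSettings {
  const s: VtpSettings = { ...VTP_DEFAULTS };
  const target = raw["injectionTarget"];
  if (target === "antigravity" || target === "claude-code") s.injectionTarget = target;
  const title = raw["claudeCodeLockedTitle"];
  if (typeof title === "string") s.claudeCodeLockedTitle = title;
  const vad = raw["vadMode"];
  if (typeof vad === "boolean") s.vadMode = vad;
  const depth = raw["contextDepth"];
  if (typeof depth === "number" && Number.isInteger(depth) && depth >= 0) s.contextDepth = depth;
  const model = raw["elaborationModel"];
  if (typeof model === "string" && model.length > 0) s.elaborationModel = model;
  const engine = raw["transcriptionEngine"];
  if (engine === "gemini" || engine === "deepgram") s.transcriptionEngine = engine;
  return s;
}

/** Claude Code mode requires Deepgram, per the note in the Deepgram section. */
function settingsWarnings(s: VtpSettings): string[] {
  const warnings: string[] = [];
  if (s.injectionTarget === "claude-code" && s.transcriptionEngine !== "deepgram") {
    warnings.push('Claude Code mode requires vtp.transcriptionEngine = "deepgram".');
  }
  return warnings;
}
```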
Audio Pipeline
Gemini Mode (Default)
Mic → FFmpeg → [highpass=f=80] (remove rumble) → [afftdn=nf=-25] (FFT denoiser) → [silencedetect] (VAD trigger) → WAV chunks → Gemini 2.5 Flash (transcribe) → Live transcript UI
- highpass=f=80 — strips sub-80Hz rumble (HVAC, desk vibration, electrical hum)
- afftdn=nf=-25 — FFmpeg's FFT denoiser; estimates noise floor in first ~0.4s and subtracts it every frame
- silencedetect=noise=-40dB:d=2.5 — triggers VAD after 2.5s of true silence (on the cleaned signal)
- Audio is segmented into 3-second WAV chunks and sent to Gemini for verbatim transcription
- Each chunk's transcript is appended live to the panel as it comes back
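This sketch builds the FFmpeg arguments for this chain and parses silencedetect's log output. The filter values match the list above. The device name, mono 16 kHz output, and the chunk_%03d.wav pattern are assumptions. The array would be handed to something like Node's `child_process.spawn("ffmpeg", args)`, and the parser applied to each stderr line. silencedetect logs `silence_start` only once the configured duration (d=2.5) of silence has passed, which is what makes that line usable as a VAD trigger.

```typescript
// Hypothetical sketch: FFmpeg args for the Gemini-mode chain plus a
// silencedetect log parser. Device, output format, and pattern are assumptions.
function geminiCaptureArgs(device: string, outPattern = "chunk_%03d.wav"): string[] {
  const filters = ["highpass=f=80", "afftdn=nf=-25", "silencedetect=noise=-40dB:d=2.5"].join(",");
  return [
    "-f", "dshow", "-i", `audio=${device}`, // Windows DirectShow capture
    "-af", filters,
    "-ac", "1", "-ar", "16000",
    "-f", "segment", "-segment_time", "3", // 3-second WAV chunks
    outPattern,
  ];
}

type SilenceEvent =
  | { kind: "start"; at: number }
  | { kind: "end"; at: number; duration: number };

/** Parses lines like "[silencedetect @ 0x...] silence_start: 12.5" from FFmpeg's stderr. */
function parseSilenceLine(line: string): SilenceEvent | null {
  const start = /silence_start:\s*(-?\d+(?:\.\d+)?)/.exec(line);
  if (start) return { kind: "start", at: parseFloat(start[1]) };
  const end = /silence_end:\s*(-?\d+(?:\.\d+)?)\s*\|\s*silence_duration:\s*(\d+(?:\.\d+)?)/.exec(line);
  if (end) return { kind: "end", at: parseFloat(end[1]), duration: parseFloat(end[2]) };
  return null;
}
```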
Deepgram Mode (Optional)
Mic → FFmpeg → raw s16le PCM @ 16kHz → WebSocket → Deepgram nova-2 → interim results (~300ms) → final words committed → Live transcript UI
No WAV files written. No chunks. Audio streams directly in real-time for near-instant feedback.
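This sketch shows opening the stream and folding results into a live transcript. `model`, `encoding`, `sample_rate`, `channels`, `interim_results`, and `mip_opt_out` are Deepgram query parameters; the last one is the opt-out from the data-usage table. The helper names, defaults, and the `LiveTranscript` class are hypothetical. The WebSocket connection and authentication are left out. Deepgram sends `Results` messages whose `is_final` flag marks a segment as settled. The sketch displays interim text until a final result replaces it, then commits the final text.

```typescript
// Hypothetical sketch: Deepgram streaming URL plus interim/final result merging.
interface ListenOptions {
  model?: string;
  sampleRate?: number;
  interimResults?: boolean;
  mipOptOut?: boolean; // opt out of Deepgram's model-improvement data use
}

function deepgramListenUrl(opts: ListenOptions = {}): string {
  const params: Array<[string, string]> = [
    ["model", opts.model ?? "nova-2"],
    ["encoding", "linear16"], // raw signed 16-bit little-endian PCM
    ["sample_rate", String(opts.sampleRate ?? 16000)],
    ["channels", "1"],
    ["interim_results", String(opts.interimResults ?? true)],
  ];
  if (opts.mipOptOut) params.push(["mip_opt_out", "true"]);
  const query = params.map(([k, v]) => `${k}=${encodeURIComponent(v)}`).join("&");
  return `wss://api.deepgram.com/v1/listen?${query}`;
}

// The subset of a Deepgram "Results" message that this sketch reads.
interface DeepgramResults {
  type?: string;
  is_final?: boolean;
  channel?: { alternatives?: Array<{ transcript?: string }> };
}

class LiveTranscript {
  private committed: string[] = [];
  private interim = "";

  /** Interim results replace each other; final results are committed. */
  handle(raw: string): void {
    const msg = JSON.parse(raw) as DeepgramResults;
    if (msg.type !== "Results") return;
    const alt = msg.channel && msg.channel.alternatives ? msg.channel.alternatives[0] : undefined;
    const text = alt && alt.transcript ? alt.transcript : "";
    if (msg.is_final) {
      if (text) this.committed.push(text);
      this.interim = "";
    } else {
      this.interim = text;
    }
  }

  /** Committed words followed by the current interim guess. */
  get text(): string {
    return this.committed.concat(this.interim ? [this.interim] : []).join(" ");
  }
}
```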
Privacy
STP has no backend and does not collect or store personal data. Audio and API keys leave your machine only as described below.
| What | Where it goes |
|---|---|
| Audio (Gemini mode) | Captured locally by FFmpeg, sent to Gemini for transcription, then discarded. Never written to disk permanently. |
| Audio (Deepgram mode) | Streamed to Deepgram's API in real time during recording. See Deepgram Data Usage above. |
| Transcripts | Held in the panel session only. Gone when you close STP. |
| Prompts | Sent to Antigravity or Claude Code on your local machine. Not stored by STP. |
| API keys | Stored in VS Code SecretStorage (your OS keychain). Never in a file; only sent to Gemini or Deepgram to authenticate their API requests. |
| Claude Code patches | Backups of the original files are kept locally in .vtp-backups/ next to the patched extension. The restore command rolls them back. |
No telemetry. No analytics. No STP backend.
Codebase Map
A quick map of the source for anyone who wants to contribute, fork, or just understand how it works.
src/
extension.ts Entry point — registers the webview provider, commands, and Claude Code auto-patch
types.ts Shared TypeScript interfaces used across modules
panel/
VTPPanel.ts State machine, UI orchestration, API key management
CommandDetector.ts Voice trigger regex patterns (send, enhance, clean, pause, etc.)
audio/
AudioCapture.ts FFmpeg process management — mic capture, PCM chunking
DeepgramTranscriber.ts Raw WebSocket client for Deepgram nova-2 streaming
pipeline/
IntentProcessor.ts Classifies each transcript chunk (send / enhance / clean / clear / pause)
PromptElaborator.ts Gemini pass that rewrites a rough transcript into a production-ready prompt
PromptCleaner.ts Hybrid filler/repetition cleaner — regex first, Gemini only when noisy
CommandExecutor.ts Executes resolved intents — triggers enhance, clean-up, or send flows
ChatInjector.ts Routes the final prompt to Antigravity or Claude Code based on the active target
integrations/claudeCode/
patches.ts Hot-patch definitions for Claude Code's extension.js + webview/index.js
patcher.ts Apply / restore lifecycle, schema upgrade detection, backup management
conversations.ts Enumerates open Claude chat panels and the conversation lock
context/
WorkspaceContextCollector.ts Gathers open files, cursor position, and project structure for context
ConversationMatcher.ts Pulls recent Antigravity conversation history to ground the prompt
commands/
CommandRegistry.ts Maps VS Code command IDs to handler functions
config/
SecretManager.ts Wraps VS Code SecretStorage for Gemini and Deepgram API keys
tools/
patch-claude-code.js CLI patcher (alternative to the auto-patch on activate)
unpatch-claude-code.js CLI restore tool
claude-code-patch.js Shared patch definitions used by both the extension and the CLI
Data flow: Mic audio flows through AudioCapture into either Gemini (chunked) or DeepgramTranscriber (streamed). Transcripts hit IntentProcessor for classification. If the user said "enhance", PromptElaborator rewrites it with workspace context. Right before injection, PromptCleaner runs the auto-cleanup unless the buffer is already enhanced or cleaned. The final prompt goes to ChatInjector, which dispatches to either Antigravity (via the native command) or Claude Code (via the patched IPC, optionally filtered by the locked conversation title).
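The data flow above can be sketched as one function, with the Gemini and injection steps stubbed as synchronous callbacks. The names are hypothetical stand-ins for PromptElaborator, PromptCleaner, and ChatInjector. The sketch assumes the approve step of the enhance review has already happened.

```typescript
// Hypothetical sketch of the send path with stubbed, synchronous steps.
type ChatTarget = "antigravity" | "claude-code";

interface SendRequest {
  transcript: string;
  enhance: boolean;        // user asked for (and approved) an enhancement
  alreadyCleaned: boolean; // "clean it up" / "clean up and review" already ran
  target: ChatTarget;
  lockedTitle: string;
}

interface PipelineSteps {
  elaborate: (text: string) => string;                         // PromptElaborator stand-in
  clean: (text: string) => string;                             // PromptCleaner stand-in
  injectAntigravity: (prompt: string) => void;                 // native command stand-in
  injectClaude: (prompt: string, lockedTitle: string) => void; // patched IPC stand-in
}

function runSend(req: SendRequest, steps: PipelineSteps): string {
  let prompt = req.transcript;
  if (req.enhance) prompt = steps.elaborate(prompt);
  // Auto-cleanup right before injection, unless the text was already enhanced or cleaned.
  if (!req.enhance && !req.alreadyCleaned) prompt = steps.clean(prompt);
  if (req.target === "antigravity") steps.injectAntigravity(prompt);
  else steps.injectClaude(prompt, req.lockedTitle);
  return prompt;
}
```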
License
MIT © 2026