VoicePrompt Dev
VS Code extension for recording speech locally, transcribing it, and moving the result directly
into the active terminal or active VS Code chat tab for agent workflows.
Status
- Repo: https://github.com/voicepromptdev/voiceprompt
- Marketplace publisher: voicepromptdev
- npm scope: @voicepromptdev
- VS Code extension ID: voicepromptdev.voiceprompt-dev
- Current packaged build: voiceprompt-dev-0.2.9.vsix
- License: MIT
- Current focus: marketplace release readiness
Identity
VoicePrompt Dev uses two separate package identity systems:
- VS Code Marketplace identity: publisher voicepromptdev plus extension name voiceprompt-dev, which produces the extension ID voicepromptdev.voiceprompt-dev
- npm identity: scoped packages under @voicepromptdev, such as @voicepromptdev/core
These should not be conflated. The VS Code extension name stays voiceprompt-dev; npm scope ownership is a separate namespace for future JavaScript packages.
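As a small illustration of the two naming schemes above, the sketch below composes each identifier from its parts. The helper names are hypothetical and exist only for this example; they are not part of the extension.

```python
def extension_id(publisher: str, name: str) -> str:
    """Join a Marketplace publisher and extension name as '<publisher>.<name>'."""
    return f"{publisher}.{name}"


def npm_package(scope: str, name: str) -> str:
    """Build a scoped npm package name as '@<scope>/<name>'."""
    return f"@{scope.lstrip('@')}/{name}"


# The Marketplace ID and the npm package name come from separate namespaces.
print(extension_id("voicepromptdev", "voiceprompt-dev"))
print(npm_package("@voicepromptdev", "core"))
```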
What It Does
- Opens a voice input panel inside VS Code
- Records audio locally on Linux with parec or arecord
- Records audio locally on Windows with a bundled WinMM recorder path, with PowerShell fallback
- Falls back to browser microphone capture when no native recorder is available, including typical macOS setups
- Transcribes through the OpenAI audio API or a custom shell command
- Pastes directly into the currently active VS Code chat tab, including Codex when that tab is active
- Supports explicit chat provider routing for MVP use: currently Copilot and Codex
- Supports direct chat open with prompt injection when the VS Code chat API is available
- Auto-submits transcripts sent to the active chat panel when requested
- Sends the transcript to the active VS Code terminal without pressing Enter
- Auto-submits transcripts sent to the terminal when requested
- Sends approval keys and Enter to the active terminal while the panel stays focused
- Can jump focus from the panel back to the active CLI terminal
- Pastes the transcript into the active editor at the current cursor
- Supports push-to-talk with Space
- Writes debug logs to the VoicePrompt Dev output channel
- Shows an in-panel cost monitor for VoicePrompt-owned transcription usage
- Adapts the panel to VS Code light and dark themes, including theme-specific logo treatment
Security and Privacy
- Records audio locally and writes temporary capture files to the OS temp directory only for the active recording session
- Removes temporary audio files after transcription or cancellation completes
- Uses the OpenAI transcription API only when the OpenAI backend is selected
- Supports a machine-scoped custom transcription command for private or local pipelines
- Restricts sensitive configuration to machine scope so workspace settings cannot inject transcription commands or chat command IDs
- Avoids passing the OpenAI API key on a subprocess command line
- Windows: bundled recorder binary, with PowerShell fallback
- Linux: parec or arecord
- macOS and other environments without a native recorder: browser microphone fallback inside the VoicePrompt panel
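The per-platform recorder choice listed above can be pictured as a small selection function. This is a minimal sketch in Python, not the extension's actual JavaScript logic: the function name, the returned labels, the `bundled_available` flag, and the order in which parec and arecord are tried are all assumptions made for illustration.

```python
import shutil
import sys
from typing import Callable, Optional


def pick_recorder(
    platform: str = sys.platform,
    which: Callable[[str], Optional[str]] = shutil.which,
    bundled_available: bool = True,
) -> str:
    """Return a label for the capture path to use on this platform."""
    if platform.startswith("win"):
        # Bundled recorder first, PowerShell as the fallback (labels are hypothetical).
        return "bundled-winmm" if bundled_available else "powershell"
    if platform.startswith("linux"):
        # Assumed order: try parec, then arecord.
        for tool in ("parec", "arecord"):
            if which(tool):
                return tool
    # macOS and anything without a native recorder uses the browser microphone.
    return "browser"


print(pick_recorder())
```

Passing `which` as a parameter keeps the function testable on machines where neither tool is installed.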
Feature Inventory
Capture
- Local audio capture instead of browser microphone capture inside the webview
- Linux recorder detection for parec and arecord
- Windows packaged recorder support, with PowerShell recorder fallback
- Browser microphone fallback for platforms without a bundled/native recorder path
- Tap-to-record and hold-to-record flows in the same panel
- Cancelable recording session handling with local temp file cleanup
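The temp file cleanup on completion or cancellation described above follows a familiar pattern: create the capture file, hand it to the recorder, and remove it in a `finally` step no matter how the session ends. The sketch below shows that pattern in Python under stated assumptions; `capture_file` and the `voiceprompt-` prefix are hypothetical names, and the extension's own implementation may differ.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def capture_file(suffix: str = ".wav") -> Iterator[str]:
    """Yield a temp file path in the OS temp directory and always delete it afterwards."""
    fd, path = tempfile.mkstemp(prefix="voiceprompt-", suffix=suffix)
    os.close(fd)  # the recorder process would reopen the path itself
    try:
        yield path
    finally:
        # Runs after normal completion and after cancellation (an exception).
        try:
            os.remove(path)
        except FileNotFoundError:
            pass


with capture_file() as wav_path:
    with open(wav_path, "wb") as handle:
        handle.write(b"RIFF")  # stand-in for recorder output
    # hand wav_path to the transcription backend here

print(os.path.exists(wav_path))  # False: the file is removed on exit
```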
Transcription
- OpenAI audio transcription support from inside the extension
- Custom command transcription backend for local or private pipelines
- Configurable OpenAI model and optional language hint
- Windows API key lookup from VS Code settings, process environment, or persisted registry environment
Routing
- Send transcript into the active VS Code terminal
- Send transcript into the active VS Code chat surface
- Paste transcript into the active editor at the current cursor
- Automatic clipboard fallback when the target surface cannot be focused or written to reliably
- Serialized routing queue to reduce overlapping terminal/chat send conflicts
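A serialized routing queue, as mentioned in the last item above, means that a second send does not start until the first has finished. The sketch below illustrates the idea with an `asyncio.Lock`; it is an illustration in Python only. The extension is JavaScript and may well chain promises instead, and the `RoutingQueue` name is hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List


class RoutingQueue:
    """Run send jobs one at a time, in the order they were submitted."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()

    async def submit(self, job: Callable[[], Awaitable[None]]) -> None:
        # asyncio.Lock wakes waiters in FIFO order, so jobs do not overlap or reorder.
        async with self._lock:
            await job()


async def demo() -> List[str]:
    log: List[str] = []
    queue = RoutingQueue()

    async def send(target: str, delay: float) -> None:
        log.append(f"start {target}")
        await asyncio.sleep(delay)  # stand-in for focusing and writing to the target
        log.append(f"end {target}")

    # The chat send is faster but still waits for the terminal send to finish.
    await asyncio.gather(
        queue.submit(lambda: send("terminal", 0.02)),
        queue.submit(lambda: send("chat", 0.0)),
    )
    return log


print(asyncio.run(demo()))
```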
Agent Workflow Controls
- Auto-route target switch: terminal or chat
- Chat provider selector for Copilot and Codex
- Auto-submit toggle for chat or terminal workflows
- Replace vs append toggle for reusing an in-progress prompt or draft
- Clear Terminal / Chat control for wiping the current draft before the next send
- Terminal approval hotkeys for Y, N, 1, 2, 3, and Enter without leaving the panel
- One-key jump from the panel to the active terminal with T
- Global Ctrl+Shift+Q shortcut to re-focus VoicePrompt from anywhere in VS Code
UX Details
- Dedicated in-editor voice panel instead of a floating browser workflow
- Theme-aware light/dark styling built to sit inside current VS Code themes
- Status messaging and quick hints inside the panel
- Collapsible transcription cost monitor with per-session VoicePrompt spend tracking
- Subtle panel focus and recording-state borders so recording remains visible while scrolled
- Automatic refocus back to VoicePrompt after terminal approval actions
- Focus animation when VoicePrompt is restored
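The per-session spend tracking mentioned in the list above can be thought of as a running total over recordings. The sketch below assumes duration-based billing at a caller-supplied rate; the `SessionCost` class is hypothetical, and no real price is implied by the numbers in the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionCost:
    """Accumulate transcription spend for one panel session."""

    usd_per_minute: float  # supplied by the caller; not a real published price
    seconds: List[float] = field(default_factory=list)

    def add_recording(self, duration_seconds: float) -> None:
        if duration_seconds < 0:
            raise ValueError("duration must be non-negative")
        self.seconds.append(duration_seconds)

    @property
    def total_usd(self) -> float:
        # Only VoicePrompt-owned transcription is counted, not downstream chat usage.
        return sum(self.seconds) / 60.0 * self.usd_per_minute


session = SessionCost(usd_per_minute=0.5)  # made-up rate for illustration
session.add_recording(30)
session.add_recording(90)
print(f"{session.total_usd:.2f}")
```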
Why It Exists
On the author's Linux setup, browser-style microphone capture inside a VS Code webview was unreliable.
The extension is therefore built around local audio capture, with browser-based recording kept only as a fallback.
Files
- extension.js: VS Code extension entrypoint and webview UI
- package.json: extension metadata and command contribution
- build_vsix.py: no-dependency packager for creating the .vsix
- LICENSE: MIT license for public reuse
- codex-config.toml.snippet: feature flags to enable hidden Codex voice-related capabilities
- voiceprompt-dev-0.2.9.vsix: packaged extension artifact
- RECOVERY.md: setup, recovery, and known caveats
- GITHUB_SETUP.md: one-time GitHub machine setup and publish flow
- NEXT_STEPS.md: deferred notes for commercialization and marketplace publishing
- packages/core/: placeholder npm package for reserving the @voicepromptdev scope
- publish.sh: helper for wiring origin and pushing to GitHub
Install
- Add the contents of codex-config.toml.snippet to ~/.codex/config.toml.
- Restart VS Code.
- In VS Code, open Extensions.
- Open the ... menu and choose Install from VSIX....
- Pick the newest bundled .vsix file, for example voiceprompt-dev-0.2.9.vsix.
- Configure OpenAI transcription in user-level VS Code settings.
OpenAI API key setup
VoicePrompt Dev records audio locally, then sends that audio to OpenAI's speech-to-text API
when voicePromptDev.transcriptionBackend is set to openai. This is the default setup.
- Create or sign in to an OpenAI platform account at https://platform.openai.com/.
- Add billing or credits in the OpenAI platform billing area. A ChatGPT subscription does not
automatically cover API usage.
- Create a secret API key at https://platform.openai.com/api-keys.
- Copy the key once. Treat it like a password.
- In VS Code, open Command Palette -> Preferences: Open User Settings (JSON).
- Add these user-level settings:
{
  "voicePromptDev.transcriptionBackend": "openai",
  "voicePromptDev.openaiApiKey": "paste-your-openai-api-key-here"
}
- Run Command Palette -> Developer: Reload Window.
- Open VoicePrompt Dev and try a short recording.
Do not put voicePromptDev.openaiApiKey in workspace settings or commit it to git. It should
live only in your personal VS Code user settings. If the extension says OpenAI transcription
needs setup, the key is missing from the VS Code process that is running the extension.
If you prefer not to use the OpenAI API directly from the extension, switch to the command
backend and provide a command that prints only the transcript to stdout. This setting is
machine-scoped for safety, so set it in your user settings and do not commit it in workspace settings:
{
  "voicePromptDev.transcriptionBackend": "command",
  "voicePromptDev.transcriptionCommand": "my-transcriber --input {input}"
}
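To make the "prints only the transcript to stdout" contract concrete, here is a minimal sketch of what a script behind my-transcriber could look like. It assumes that {input} is replaced with the path of the recorded audio file, which the settings example suggests but does not state. `placeholder_transcribe` is a hypothetical stand-in for your own speech-to-text call.

```python
import argparse
import sys
from pathlib import Path
from typing import Callable, List


def placeholder_transcribe(audio: Path) -> str:
    """Hypothetical stand-in: replace with a call into your own pipeline."""
    return f"(transcript of {audio.name})"


def transcribe_cli(
    argv: List[str],
    transcribe: Callable[[Path], str] = placeholder_transcribe,
) -> str:
    """Parse '--input PATH' and return the transcript with surrounding whitespace removed."""
    parser = argparse.ArgumentParser(prog="my-transcriber")
    parser.add_argument("--input", required=True, type=Path)
    args = parser.parse_args(argv)
    if not args.input.is_file():
        raise FileNotFoundError(f"no such audio file: {args.input}")
    return transcribe(args.input).strip()


def main(argv: List[str]) -> int:
    try:
        text = transcribe_cli(argv)
    except FileNotFoundError as exc:
        print(exc, file=sys.stderr)  # diagnostics go to stderr, never stdout
        return 1
    print(text)  # stdout carries the transcript and nothing else
    return 0
```

When saved as a script, the entry point would be `sys.exit(main(sys.argv[1:]))`. Keeping progress messages and warnings on stderr is what keeps stdout clean for the caller.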
On Windows, if voicePromptDev.openaiApiKey is empty, the extension first checks the
current VS Code process environment and then falls back to the persisted OPENAI_API_KEY
stored in the user or machine environment registry. If you just changed the variable, restart
VS Code so integrated terminals and other extensions inherit the updated environment too.
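The lookup order described above (setting, then process environment, then persisted registry value) can be sketched as follows. This is an illustration in Python rather than the extension's JavaScript; the function names are hypothetical, and checking the user registry hive before the machine hive is an assumption. The two registry paths are the standard Windows locations for persisted environment variables.

```python
import os
import sys
from typing import Callable, Mapping, Optional


def read_registry_env(name: str) -> Optional[str]:
    """Read a persisted environment variable from the Windows registry, if any."""
    if sys.platform != "win32":
        return None
    import winreg  # only available on Windows

    locations = (
        (winreg.HKEY_CURRENT_USER, r"Environment"),
        (
            winreg.HKEY_LOCAL_MACHINE,
            r"SYSTEM\CurrentControlSet\Control\Session Manager\Environment",
        ),
    )
    for hive, subkey in locations:
        try:
            with winreg.OpenKey(hive, subkey) as key:
                value, _value_type = winreg.QueryValueEx(key, name)
        except OSError:
            continue  # key or value missing in this hive
        if value:
            return str(value)
    return None


def resolve_api_key(
    setting_value: str,
    env: Mapping[str, str] = os.environ,
    registry: Callable[[str], Optional[str]] = read_registry_env,
) -> Optional[str]:
    """Return the first non-empty key: setting, process environment, then registry."""
    if setting_value.strip():
        return setting_value.strip()
    from_env = env.get("OPENAI_API_KEY", "").strip()
    if from_env:
        return from_env
    return registry("OPENAI_API_KEY") or None
```

The registry step matters because a variable set after VS Code started is persisted there but not yet present in the running process environment.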
If your preferred chat provider does not follow VS Code's generic shared chat focus behavior,
you can optionally override the command IDs the extension uses for chat targeting:
{
  "voicePromptDev.chatProviderCommand": "workbench.view.extension.codexSecondaryViewContainer",
  "voicePromptDev.chatOpenCommand": "workbench.chat.action.focusLastFocused",
  "voicePromptDev.chatFocusCommand": "workbench.action.chat.focusInput"
}
chatProviderCommand is for provider-specific surface activation. chatOpenCommand restores the
shared chat host, and chatFocusCommand places the caret in the input box.
Other useful behavior defaults:
- Auto-route defaults to Active Terminal
- Terminal auto-submit defaults to on when terminal auto-route is selected
- Replace mode defaults to on
- Chat auto-submit can be enabled, but Codex submit reliability still depends on the currently available VS Code commands
Use
- Open the Command Palette.
- Run VoicePrompt Dev: Open.
- Focus the VS Code terminal you want transcripts sent to.
- In the panel, either:
  - click Start Recording, speak, then click the same button again to stop and transcribe
  - or hold Space while speaking, then release it
- Pick the auto-send target in the panel: Active terminal or Active VS Code chat tab.
- Use the listed hotkeys for recording control and terminal approvals.
- By default, new transcripts replace the existing buffered transcript, auto-send to the active terminal, and auto-submit terminal sends when that auto-send destination is selected.
Landing Page Notes
If you are building a landing page or sales page from this repo, the most important current positioning points are:
- Voice prompting for AI coding workflows inside VS Code
- Local recording instead of browser mic capture
- Direct routing into terminal, chat, or editor
- Built for Codex-heavy and CLI-heavy developer loops
- Supports both hosted OpenAI transcription and bring-your-own transcription backend
For current product/website messaging, also reflect these recent UX additions:
- Theme-aware VS Code-native panel styling for both dark and light themes
- Recording-state visibility improvements, including panel-level recording border feedback
- Theme-specific logo swap between dark and light surfaces
- Transcription-only cost monitoring inside the panel
- Collapsible monitoring/details surfaces so the recording workflow stays compact
Shortcuts
Inside the panel:
- R: start recording
- Space: hold to record
- E: send Enter to the active terminal while the panel stays focused
- T: focus the active terminal
- Ctrl+Shift+Q: focus VoicePrompt (global; also works from anywhere in VS Code)
- 1 / 2 / 3: send numbered choices to the active terminal
- Y / N: send approval keys to the active terminal
Current Known Limits
- The direct chat paste flow targets the currently active VS Code chat tab first, then falls back to opening a chat panel when needed.
- Chat and terminal auto-submit rely on the available VS Code commands in the current environment, so fallback behavior may vary by extension version.
- Browser microphone fallback is used only when a supported local recorder is not available.
- The cost monitor tracks VoicePrompt transcription usage only. Downstream chat model usage and OpenAI wallet balance are not tracked in the panel.
- Publishing to the VS Code Marketplace is not done yet; see NEXT_STEPS.md.
Rebuild
python3 build_vsix.py
Notes
If you want the repo handoff for later productization work, start with NEXT_STEPS.md.