VoicePrompt Dev

VS Code extension for recording speech locally, transcribing it, and sending the result directly into the active terminal, the active VS Code chat tab, or the active editor for agent workflows.

Status

Repo: https://github.com/voicepromptdev/voiceprompt
Marketplace publisher: voicepromptdev
NPM scope: @voicepromptdev
VS Code extension ID: voicepromptdev.voiceprompt-dev
Current packaged build: voiceprompt-dev-0.2.9.vsix
License: MIT
Current focus: marketplace release readiness

Identity

VoicePrompt Dev uses two separate package identity systems:

VS Code Marketplace identity: publisher voicepromptdev plus extension name voiceprompt-dev, which produces the extension ID voicepromptdev.voiceprompt-dev
npm identity: scoped packages under @voicepromptdev, such as @voicepromptdev/core

These should not be conflated. The VS Code extension name stays voiceprompt-dev; npm scope ownership is a separate namespace for future JavaScript packages.
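The two identities are built differently. The following minimal Python sketch makes that concrete: the Marketplace ID joins the publisher and name fields of package.json with a dot, and the npm name prefixes a package with the scope. The helper names extension_id and npm_package_name are hypothetical, and the package.json shown is trimmed to the two identity fields.

```python
import json


def extension_id(package_json_text: str) -> str:
    """Build the Marketplace extension ID as '<publisher>.<name>'."""
    manifest = json.loads(package_json_text)
    return f"{manifest['publisher']}.{manifest['name']}"


def npm_package_name(scope: str, package: str) -> str:
    """Build a scoped npm package name such as '@voicepromptdev/core'."""
    return f"@{scope}/{package}"


# Hypothetical, trimmed package.json containing only the identity fields.
manifest_text = '{"publisher": "voicepromptdev", "name": "voiceprompt-dev"}'
print(extension_id(manifest_text))                # voicepromptdev.voiceprompt-dev
print(npm_package_name("voicepromptdev", "core"))  # @voicepromptdev/core
```

Renaming the npm scope therefore never changes the extension ID, and vice versa.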

What It Does

Opens a voice input panel inside VS Code
Records audio locally on Linux with parec or arecord
Records audio locally on Windows with a bundled WinMM-based recorder, falling back to PowerShell
Falls back to browser microphone capture when no native recorder is available, including typical macOS setups
Transcribes through the OpenAI audio API or a custom shell command
Pastes directly into the currently active VS Code chat tab, including Codex when that tab is active
Supports explicit chat provider routing; the MVP currently covers Copilot and Codex
Opens chat directly with the transcript prefilled as the prompt when the VS Code chat API is available
Optionally auto-submits prompts sent to the active chat panel
Sends the transcript to the active VS Code terminal, pressing Enter only when terminal auto-submit is on
Sends approval keys and Enter to the active terminal while the panel stays focused
Can jump focus from the panel back to the active CLI terminal
Pastes the transcript into the active editor at the current cursor
Supports push-to-talk with Space
Writes debug logs to the VoicePrompt Dev output channel
Shows an in-panel cost monitor for VoicePrompt-owned transcription usage
Adapts the panel to VS Code light and dark themes, including theme-specific logo treatment

Security and Privacy

Records audio locally and writes temporary capture files to the OS temp directory only for the active recording session
Removes temporary audio files after transcription or cancellation completes
Uses the OpenAI transcription API only when the OpenAI backend is selected
Supports a machine-scoped custom transcription command for private or local pipelines
Restricts sensitive configuration to machine scope so workspace settings cannot inject transcription commands or chat command IDs
Avoids passing the OpenAI API key on a subprocess command line
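The temp-file lifecycle and the command-line rule above can be sketched in Python. This is an illustrative model, not the extension's JavaScript code: the function name transcribe_capture, the .wav suffix, and the stand-in transcriber are assumptions. It writes the capture into the OS temp directory, hands the key to the child process through its environment instead of its argument list, and deletes the file in a finally block so cleanup also happens on failure or cancellation.

```python
import os
import subprocess
import sys
import tempfile


def transcribe_capture(audio_bytes: bytes, command: list, api_key: str) -> str:
    """Write audio to a temp file, run a transcriber on it, and always clean up.

    The key travels in the child's environment rather than its argv, so it
    does not appear in argv-based process listings such as `ps`.
    """
    fd, path = tempfile.mkstemp(suffix=".wav")  # lives in the OS temp directory
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio_bytes)
        env = dict(os.environ, OPENAI_API_KEY=api_key)
        result = subprocess.run(
            command + [path], env=env, capture_output=True, text=True, check=True
        )
        return result.stdout.strip()
    finally:
        # Runs on success, on a failed command, and on cancellation
        # (for example KeyboardInterrupt), so no capture file is left behind.
        if os.path.exists(path):
            os.remove(path)


# Hypothetical stand-in transcriber: reports whether the key arrived and the file size.
stand_in = [
    sys.executable,
    "-c",
    "import os, sys; print('key set' if os.environ.get('OPENAI_API_KEY') else 'no key',"
    " os.path.getsize(sys.argv[1]))",
]
print(transcribe_capture(b"\x00" * 16, stand_in, "sk-example"))  # key set 16
```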

Platform Support

Windows: bundled recorder binary, with PowerShell fallback
Linux: parec or arecord
macOS and other environments without a native recorder: browser microphone fallback inside the VoicePrompt panel
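The platform order above amounts to a small selection function. The Python sketch below is a hypothetical model: the bundled_recorder_ok flag stands in for whatever health check the extension performs, and trying parec before arecord on Linux is an assumption, since the README lists both without stating a preference.

```python
import shutil
import sys


def pick_recorder(platform: str = sys.platform, bundled_recorder_ok: bool = True) -> str:
    """Choose a capture path in the order described under Platform Support."""
    if platform.startswith("win"):
        if bundled_recorder_ok:  # bundled WinMM-based recorder binary
            return "bundled-recorder"
        if shutil.which("powershell") or shutil.which("pwsh"):
            return "powershell"
    elif platform.startswith("linux"):
        # Assumed preference: parec (PulseAudio/PipeWire) before arecord (ALSA).
        for tool in ("parec", "arecord"):
            if shutil.which(tool):
                return tool
    # macOS and any other environment without a native recorder.
    return "browser-microphone"


print(pick_recorder("darwin"))  # browser-microphone
print(pick_recorder())          # depends on the current machine
```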

Feature Inventory

Capture

Native local audio capture preferred over browser microphone capture inside the webview
Linux recorder detection for parec and arecord
Windows packaged recorder support, with PowerShell recorder fallback
Browser microphone fallback for platforms without a bundled/native recorder path
Tap-to-record and hold-to-record flows in the same panel
Cancelable recording session handling with local temp file cleanup

Transcription

OpenAI audio transcription support from inside the extension
Custom command transcription backend for local or private pipelines
Configurable OpenAI model and optional language hint
Windows API key lookup from VS Code settings, then the process environment, then the persisted registry environment

Routing

Send transcript into the active VS Code terminal
Send transcript into the active VS Code chat surface
Paste transcript into the active editor at the current cursor
Automatic clipboard fallback when the target surface cannot be focused or written to reliably
Serialized routing queue to reduce overlapping terminal/chat send conflicts
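A serialized queue with a clipboard fallback can be modeled with a single worker thread. This Python sketch is an assumption about the shape of the mechanism, not the extension's implementation: RoutingQueue and the send callables are hypothetical. Because one worker drains the queue, sends never overlap, and a send that raises is redirected to the clipboard writer.

```python
import queue
import threading
from typing import Callable


class RoutingQueue:
    """Deliver transcripts one at a time; fall back to the clipboard on failure."""

    def __init__(self, clipboard_write: Callable[[str], None]) -> None:
        self._jobs = queue.Queue()
        self._clipboard_write = clipboard_write
        # A single worker thread is what serializes the sends.
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, send: Callable[[str], None], text: str) -> None:
        self._jobs.put((send, text))

    def wait(self) -> None:
        self._jobs.join()

    def _worker(self) -> None:
        while True:
            send, text = self._jobs.get()
            try:
                send(text)  # e.g. write into the terminal or chat input
            except Exception:
                self._clipboard_write(text)  # target could not be written to
            finally:
                self._jobs.task_done()


delivered, clipboard = [], []


def terminal_send(text: str) -> None:
    delivered.append(text)


def unfocusable_chat_send(text: str) -> None:
    raise RuntimeError("chat input could not be focused")


router = RoutingQueue(clipboard.append)
for send, text in ((terminal_send, "first"), (unfocusable_chat_send, "second"), (terminal_send, "third")):
    router.submit(send, text)
router.wait()
print(delivered, clipboard)  # ['first', 'third'] ['second']
```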

Agent Workflow Controls

Auto-route target switch: terminal or chat
Chat provider selector for Copilot and Codex
Auto-submit toggle for chat or terminal workflows
Replace vs append toggle for reusing an in-progress prompt or draft
Clear Terminal / Chat control for wiping the current draft before the next send
Terminal approval hotkeys for Y, N, 1, 2, 3, and Enter without leaving the panel
One-key jump from the panel to the active terminal with T
Global Ctrl+Shift+Q shortcut to re-focus VoicePrompt from anywhere in VS Code

UX Details

Dedicated in-editor voice panel instead of a floating browser workflow
Theme-aware light/dark styling that matches the active VS Code theme
Status messaging and quick hints inside the panel
Collapsible transcription cost monitor with per-session VoicePrompt spend tracking
Subtle panel focus and recording-state borders so recording remains visible while scrolled
Automatic refocus back to VoicePrompt after terminal approval actions
Focus animation when VoicePrompt is restored
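Per-session spend tracking is simple arithmetic if transcription is billed by audio duration. The Python sketch below assumes duration-based pricing. The class name SessionCostMonitor is hypothetical, and the 0.01 USD per minute rate is purely illustrative, not a real OpenAI price, so check current pricing for the model you configure.

```python
from dataclasses import dataclass


@dataclass
class SessionCostMonitor:
    """Accumulates estimated transcription spend for one panel session."""

    usd_per_minute: float  # illustrative input; not a published price
    seconds: float = 0.0
    recordings: int = 0

    def add_recording(self, duration_seconds: float) -> None:
        self.seconds += duration_seconds
        self.recordings += 1

    @property
    def estimated_usd(self) -> float:
        # cost = (total seconds / 60) * rate per minute
        return self.seconds / 60.0 * self.usd_per_minute


monitor = SessionCostMonitor(usd_per_minute=0.01)
for duration in (30.0, 45.0, 105.0):
    monitor.add_recording(duration)
print(monitor.recordings, monitor.seconds, round(monitor.estimated_usd, 4))  # 3 180.0 0.03
```

Three clips totaling 180 seconds are 3 minutes, so at the illustrative rate the estimate is 3 × 0.01 = 0.03 USD.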

Why It Exists

On the Linux setup this extension was developed on, browser-style microphone capture inside a VS Code webview was unreliable, so the extension is built around local audio capture instead of browser-based recording inside VS Code.

Files

extension.js: VS Code extension entrypoint and webview UI
package.json: extension metadata and command contributions
build_vsix.py: no-dependency packager for creating the .vsix
LICENSE: MIT license for public reuse
codex-config.toml.snippet: feature flags to enable hidden Codex voice-related capabilities
voiceprompt-dev-0.2.9.vsix: packaged extension artifact
RECOVERY.md: setup, recovery, and known caveats
GITHUB_SETUP.md: one-time GitHub machine setup and publish flow
NEXT_STEPS.md: deferred notes for commercialization and marketplace publishing
packages/core/: placeholder npm package for reserving the @voicepromptdev scope
publish.sh: helper for wiring origin and pushing to GitHub

Install

Add the contents of codex-config.toml.snippet into ~/.codex/config.toml.
Restart VS Code.
In VS Code, open Extensions.
Open the ... (Views and More Actions) menu and choose Install from VSIX....
Pick the newest bundled .vsix file, for example voiceprompt-dev-0.2.9.vsix.
Configure OpenAI transcription in your user-level VS Code settings, as described under OpenAI API key setup.
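The first install step can be made repeatable with a small script. This Python sketch is hypothetical (merge_snippet is not part of the repo) and treats the snippet as opaque text, because its contents are not reproduced here. It appends the snippet only if it is not already present. If your config.toml already defines the same TOML tables, merge the keys by hand instead, since TOML does not allow a table to be defined twice.

```python
from pathlib import Path


def merge_snippet(snippet_path: Path, config_path: Path) -> bool:
    """Append the snippet to the config once; return True if it was added."""
    snippet = snippet_path.read_text(encoding="utf-8").strip()
    config_path.parent.mkdir(parents=True, exist_ok=True)
    existing = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    if snippet in existing:
        return False  # already merged; appending again would duplicate tables
    separator = "" if not existing or existing.endswith("\n") else "\n"
    config_path.write_text(existing + separator + snippet + "\n", encoding="utf-8")
    return True
```

For example: merge_snippet(Path("codex-config.toml.snippet"), Path.home() / ".codex" / "config.toml").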

OpenAI API key setup

VoicePrompt Dev records audio locally, then sends that audio to OpenAI's speech-to-text API when voicePromptDev.transcriptionBackend is set to openai, which is the default backend.

Create or sign in to an OpenAI platform account at https://platform.openai.com/.
Add billing or credits in the OpenAI platform billing area. A ChatGPT subscription does not automatically cover API usage.
Create a secret API key at https://platform.openai.com/api-keys.
Copy the key once. Treat it like a password.
In VS Code, open Command Palette -> Preferences: Open User Settings (JSON).
Add these user-level settings:

{
  "voicePromptDev.transcriptionBackend": "openai",
  "voicePromptDev.openaiApiKey": "paste-your-openai-api-key-here"
}

Run Command Palette -> Developer: Reload Window.
Open VoicePrompt Dev and try a short recording.

Do not put voicePromptDev.openaiApiKey in workspace settings or commit it to git; keep it only in your personal VS Code user settings. If the extension reports that OpenAI transcription needs setup, it could not find a key in any of the places it checks: user settings, the environment of the VS Code process running the extension, and, on Windows, the persisted registry environment.
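A quick pre-commit check can catch a key that slipped into workspace settings. This Python sketch is a hypothetical helper, not shipped with the repo. Because VS Code's settings.json allows comments and trailing commas, it matches the quoted key textually instead of parsing strict JSON. A commented-out key is still flagged, which errs on the safe side.

```python
import re
from pathlib import Path

# Keys that must never be committed to a workspace's settings.json.
SECRET_KEYS = ("voicePromptDev.openaiApiKey",)


def committed_secrets(workspace_root: Path) -> list:
    """Return secret VoicePrompt keys present in <root>/.vscode/settings.json."""
    settings = workspace_root / ".vscode" / "settings.json"
    if not settings.exists():
        return []
    text = settings.read_text(encoding="utf-8")
    # Match '"<key>":' anywhere in the JSONC text.
    return [k for k in SECRET_KEYS if re.search('"' + re.escape(k) + r'"\s*:', text)]


hits = committed_secrets(Path.cwd())
print("OK" if not hits else "Move to user settings: " + ", ".join(hits))
```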

If you prefer not to send audio to the OpenAI API from the extension, switch to the command backend and provide a command that prints only the transcript to stdout. The command setting is machine-scoped, so VS Code honors it only from user settings and ignores it in workspace settings; keep it out of workspace files and version control:

{
  "voicePromptDev.transcriptionBackend": "command",
  "voicePromptDev.transcriptionCommand": "my-transcriber --input {input}"
}
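The command contract can be sketched in Python. This is an assumed model of how a template like the one above might be run, not the extension's code: it assumes {input} is replaced by the path of the recorded audio file, and run_transcription_command is a hypothetical name. Splitting the template before substituting keeps a path containing spaces as a single argument, and no shell ever re-parses it. shlex.split uses POSIX quoting rules, so Windows-style backslash paths in the template itself may need quoting.

```python
import shlex
import subprocess
import sys


def run_transcription_command(template: str, audio_path: str) -> str:
    """Fill {input} with the audio path and return the command's stdout."""
    # Split first, then substitute per argument, so the path stays one argv entry.
    argv = [arg.replace("{input}", audio_path) for arg in shlex.split(template)]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    transcript = result.stdout.strip()
    if not transcript:
        raise RuntimeError("transcription command printed nothing on stdout")
    return transcript


# Stand-in "transcriber" that just echoes its input path in upper case.
template = shlex.quote(sys.executable) + " -c 'import sys; print(sys.argv[1].upper())' {input}"
print(run_transcription_command(template, "/tmp/my capture.wav"))  # /TMP/MY CAPTURE.WAV
```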

On Windows, if voicePromptDev.openaiApiKey is empty, the extension first checks the environment of the current VS Code process and then falls back to the OPENAI_API_KEY value persisted in the user or machine environment in the registry. If you just changed the variable, restart VS Code so that integrated terminals and other extensions also inherit the updated environment.
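The lookup order can be sketched in Python using the standard winreg module, which exists only on Windows. The function names are hypothetical. Checking the per-user registry environment (HKCU\Environment) before the machine environment is an assumption, because the README only says "user or machine".

```python
import os
import sys
from typing import Mapping, Optional

USER_ENV_KEY = r"Environment"
MACHINE_ENV_KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\Environment"


def _registry_value(name: str) -> Optional[str]:
    """Read a persisted environment variable from HKCU, then HKLM (Windows only)."""
    if sys.platform != "win32":
        return None
    import winreg

    for hive, subkey in ((winreg.HKEY_CURRENT_USER, USER_ENV_KEY),
                         (winreg.HKEY_LOCAL_MACHINE, MACHINE_ENV_KEY)):
        try:
            with winreg.OpenKey(hive, subkey) as key:
                value, _value_type = winreg.QueryValueEx(key, name)
                if value:
                    return str(value)
        except OSError:  # key or value missing, or access denied
            continue
    return None


def resolve_openai_key(setting_value: str = "",
                       environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Settings first, then the process environment, then the registry."""
    if setting_value.strip():
        return setting_value.strip()
    if environ.get("OPENAI_API_KEY"):
        return environ["OPENAI_API_KEY"]
    return _registry_value("OPENAI_API_KEY")


print(resolve_openai_key("", {"OPENAI_API_KEY": "sk-from-env"}))  # sk-from-env
```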

If your preferred chat provider does not follow VS Code's generic shared chat focus behavior, you can optionally override the command IDs the extension uses for chat targeting:

{
  "voicePromptDev.chatProviderCommand": "workbench.view.extension.codexSecondaryViewContainer",
  "voicePromptDev.chatOpenCommand": "workbench.chat.action.focusLastFocused",
  "voicePromptDev.chatFocusCommand": "workbench.action.chat.focusInput"
}

chatProviderCommand activates a provider-specific chat surface, such as the Codex view container in the example above. chatOpenCommand restores the shared chat host, and chatFocusCommand places the caret in the chat input box.
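The three settings describe an ordered sequence of command IDs. The Python sketch below models that ordering. It is a hypothetical sketch: the default_open and default_focus parameters stand in for the extension's built-in defaults, which this README does not list, and stopping at the first failed command is an assumed policy. Unset steps are skipped, so a provider command only runs when you configure one.

```python
from typing import Callable, List, Mapping, Optional

PREFIX = "voicePromptDev."


def chat_focus_sequence(settings: Mapping[str, str],
                        default_open: Optional[str] = None,
                        default_focus: Optional[str] = None) -> List[str]:
    """Command IDs to run, in order, before pasting a transcript into chat."""
    steps = [
        settings.get(PREFIX + "chatProviderCommand"),                # provider view
        settings.get(PREFIX + "chatOpenCommand") or default_open,    # shared chat host
        settings.get(PREFIX + "chatFocusCommand") or default_focus,  # caret in input box
    ]
    return [command for command in steps if command]  # skip unset steps


def run_sequence(commands: List[str], execute: Callable[[str], bool]) -> bool:
    """Run commands in order, stopping at the first one that fails."""
    return all(execute(command) for command in commands)


overrides = {
    "voicePromptDev.chatProviderCommand": "workbench.view.extension.codexSecondaryViewContainer",
    "voicePromptDev.chatFocusCommand": "workbench.action.chat.focusInput",
}
print(chat_focus_sequence(overrides, default_open="workbench.chat.action.focusLastFocused"))
```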

Other behavior defaults:

Auto-route defaults to Active Terminal
Terminal auto-submit defaults to on when terminal auto-route is selected
Replace mode defaults to on
Chat auto-submit can be enabled, but whether it submits reliably in Codex depends on which VS Code commands are available in your environment

Use

Open the Command Palette.
Run VoicePrompt Dev: Open.
Pick the auto-send target in the panel: Active terminal or Active VS Code chat tab.
Focus the VS Code terminal or chat tab you want transcripts sent to.
In the panel, either:
- click Start Recording, speak, then click the same button again to stop and transcribe
- or hold Space while speaking, then release it
Use the hotkeys listed under Shortcuts for recording control and terminal approvals.
By default, new transcripts replace the existing buffered transcript, auto-send to the active terminal, and auto-submit terminal sends when that auto-send destination is selected.
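Tap-to-record and hold-to-record can share one small state machine. The Python sketch below is a hypothetical model of the panel logic (RecordingControl is not the extension's class). A click toggles recording. Holding Space starts a recording that ends on key release. Repeated keydown events, which browsers flag with event.repeat while a key is held, are ignored.

```python
from typing import List


class RecordingControl:
    """Click toggles recording; holding Space records until the key is released."""

    def __init__(self) -> None:
        self.recording = False
        self.space_session = False  # True while a Space-held recording is active
        self.events: List[str] = []

    def click(self) -> None:
        if self.recording:
            self._stop()
        else:
            self._start()

    def space_down(self, is_repeat: bool = False) -> None:
        if is_repeat or self.recording:  # ignore key auto-repeat
            return
        self.space_session = True
        self._start()

    def space_up(self) -> None:
        if self.space_session:
            self._stop()

    def _start(self) -> None:
        self.recording = True
        self.events.append("start")

    def _stop(self) -> None:
        self.recording = False
        self.space_session = False  # a click during a hold also ends the hold
        self.events.append("stop+transcribe")


control = RecordingControl()
control.click()                   # tap flow: start
control.click()                   # tap flow: stop
control.space_down()              # hold flow: start
control.space_down(is_repeat=True)
control.space_up()                # hold flow: stop
print(control.events)  # ['start', 'stop+transcribe', 'start', 'stop+transcribe']
```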

Landing Page Notes

If you are building a landing page or sales page from this repo, the most important current positioning points are:

Voice prompting for AI coding workflows inside VS Code
Local recording instead of browser mic capture
Direct routing into terminal, chat, or editor
Built for Codex-heavy and CLI-heavy developer loops
Supports both hosted OpenAI transcription and bring-your-own transcription backend

For current product/website messaging, also reflect these recent UX additions:

Theme-aware VS Code-native panel styling for both dark and light themes
Recording-state visibility improvements, including panel-level recording border feedback
Theme-specific logo swap between dark and light surfaces
Transcription-only cost monitoring inside the panel
Collapsible monitoring/details surfaces so the recording workflow stays compact

Shortcuts

Inside the panel:

R: start recording
Space: hold to record
E: send Enter to the active terminal while the panel stays focused
T: focus the active terminal
Ctrl+Shift+Q: focus VoicePrompt
1 / 2 / 3: send numbered choices to the active terminal
Y / N: send approval keys to the active terminal
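The in-panel shortcuts amount to a dispatch table. The Python sketch below is illustrative only. The exact characters sent to the terminal ("y", "n", "1" to "3", and a carriage return for E) are assumptions, not documented behavior. Ctrl+Shift+Q and the Space hold are omitted because they are handled outside single-key dispatch.

```python
from typing import Callable, Optional

# Hypothetical mapping; the exact characters the extension writes are assumptions.
TERMINAL_KEYS = {"e": "\r", "y": "y", "n": "n", "1": "1", "2": "2", "3": "3"}
PANEL_ACTIONS = {"r": "start-recording", "t": "focus-terminal"}


def handle_panel_key(key: str, send_to_terminal: Callable[[str], None]) -> Optional[str]:
    """Route one panel keypress; return the action name, or None if unmapped."""
    k = key.lower()
    if k in TERMINAL_KEYS:
        send_to_terminal(TERMINAL_KEYS[k])  # the panel keeps focus
        return "terminal-send"
    return PANEL_ACTIONS.get(k)


sent = []
print(handle_panel_key("Y", sent.append), sent)  # terminal-send ['y']
```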

Current Known Limits

The direct chat paste flow targets the currently active VS Code chat tab first, then falls back to opening a chat panel when needed.
Chat and terminal auto-submit rely on whichever VS Code commands are available in the current environment, so fallback behavior may vary across VS Code and chat extension versions.
Browser microphone fallback is used only when a supported local recorder is not available.
The cost monitor tracks VoicePrompt transcription usage only. Downstream chat model usage and OpenAI wallet balance are not tracked in the panel.
The extension is not yet published to the VS Code Marketplace; see NEXT_STEPS.md.

Rebuild

python3 build_vsix.py
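A .vsix file is a ZIP archive, so a rebuilt package can be sanity-checked with Python's zipfile module. This sketch assumes build_vsix.py produces the same layout vsce does (extension.vsixmanifest and [Content_Types].xml at the root, extension files under extension/). summarize_vsix is a hypothetical helper.

```python
import zipfile


def summarize_vsix(path: str) -> dict:
    """Report the key entries of a .vsix (ZIP) package."""
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    return {
        "has_manifest": "extension.vsixmanifest" in names,
        "has_content_types": "[Content_Types].xml" in names,
        "has_package_json": "extension/package.json" in names,
        "extension_files": sorted(n for n in names if n.startswith("extension/")),
    }
```

For example: print(summarize_vsix("voiceprompt-dev-0.2.9.vsix")).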

Notes

For handoff notes on later productization work, start with NEXT_STEPS.md.