Promptimize
Transform your voice into optimized prompts with AI-powered speech-to-text
A professional VSCode/Cursor extension that captures audio from your microphone, transcribes it using OpenAI Whisper, and intelligently transforms natural speech into structured, optimized prompts ready for LLM agents.

Quick Start
- Install the extension (VSIX or Marketplace when available)
- Run the Setup Wizard: Command Palette → `Promptimize: Setup Wizard`
- Configure OpenAI API key — Required for Whisper voice-to-text
- Optionally choose optimization provider — OpenAI, Anthropic, Google, Azure, Ollama, OpenCode, OpenRouter, or Cursor
- Press `Cmd+Alt+V` (Transcribe) or `Cmd+Alt+P` (Promptimize) and speak
See the full Quick Start Guide and Recording Modes.
Two Services, Clear Roles
| Service | Provider | Required | Credentials |
|---|---|---|---|
| Transcription | OpenAI Whisper | Yes | OpenAI API key |
| Prompt optimization | Your choice | No | Provider-specific credentials, when required |
```mermaid
graph LR
    Voice[Your Voice] --> Whisper[OpenAI Whisper<br/>Transcription]
    Whisper --> RawText[Raw Text]
    RawText --> Choice{Optimization<br/>Enabled?}
    Choice -->|No| Editor[Insert into Editor]
    Choice -->|Yes| Provider[Your Chosen Provider]
    Provider --> OptimizedText[Optimized Prompt]
    OptimizedText --> Editor
```
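The flow in the diagram can be sketched as a small async pipeline. This is a minimal sketch, not the extension's actual code: the `Transcriber`, `PromptOptimizer`, and `runPipeline` names are hypothetical, and the real adapters would call OpenAI Whisper and your chosen provider instead of the in-memory fakes used here.

```typescript
// Hypothetical port for the transcription service (OpenAI Whisper in the extension).
interface Transcriber {
  transcribe(audio: Uint8Array): Promise<string>;
}

// Hypothetical port for the optional prompt-optimization provider.
interface PromptOptimizer {
  optimize(rawText: string): Promise<string>;
}

// Mirrors the diagram: Voice -> Whisper -> Raw Text -> (optional) Provider -> Editor.
async function runPipeline(
  audio: Uint8Array,
  transcriber: Transcriber,
  optimizer: PromptOptimizer | undefined,
  optimizationEnabled: boolean,
): Promise<string> {
  const rawText = await transcriber.transcribe(audio);
  // The "Optimization Enabled?" decision: skip the provider entirely when disabled.
  if (!optimizationEnabled || optimizer === undefined) {
    return rawText;
  }
  return optimizer.optimize(rawText);
}

// Usage with in-memory fakes standing in for the real services.
const fakeWhisper: Transcriber = {
  transcribe: async () => "add retry logic to the http client",
};
const fakeProvider: PromptOptimizer = {
  optimize: async (text) => `## Task\n- ${text}`,
};

runPipeline(new Uint8Array(0), fakeWhisper, fakeProvider, true).then(console.log);
```

Keeping transcription and optimization behind separate interfaces matches the "two services" split: the Whisper step always runs, while the provider is optional and swappable.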
🎯 Vision
Eliminate the friction between thinking and coding.
Developers often have complex architectural ideas, detailed requirements, or intricate technical explanations that are tedious to type but natural to speak. Promptimize bridges this gap by:
- Capturing your spoken thoughts in real-time
- Transcribing them with high accuracy using OpenAI Whisper
- Transforming natural speech into structured, technical prompts
- Inserting them automatically into your editor or Cursor chat
🔥 The Problem We Solve
Before Promptimize:
1. Think about complex architecture requirements
2. Struggle to type everything out
3. Lose train of thought while typing
4. End up with unstructured, verbose prompts
5. LLM misunderstands due to poor formatting
With Promptimize:
1. Press Cmd+Alt+P (Promptimize mode)
2. Speak naturally about your requirements
3. Extension transcribes and optimizes automatically
4. Structured prompt appears in your editor/chat
5. The LLM receives a clear, well-structured request
✨ Features
Current (v0.1.0)
- ✅ Two Recording Modes — Transcribe (raw text) and Promptimize (optimized prompts)
- ✅ One-Click Recording — Dual status bar buttons or keyboard shortcuts
- ✅ High-Quality Transcription — OpenAI Whisper API integration
- ✅ Prompt Transformation — AI-powered optimization via 8 providers
- ✅ Multiple AI Providers — OpenAI, Anthropic, Google, Azure, Ollama, OpenCode, OpenRouter, and Cursor
- ✅ Configuration Webview — Interactive setup panel with provider comparison and system prompt editor
- ✅ Smart Insertion — Chat → editor → clipboard fallback chain
- ✅ Visual Feedback — Status bar states and progress notifications
- ✅ Secure Configuration — API keys stored in VSCode SecretStorage
- ✅ Cross-Platform — Works on macOS, Windows, and Linux
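The Smart Insertion fallback chain amounts to trying each target in order until one accepts the text. A minimal sketch, assuming a hypothetical `InsertionTarget` interface and `insertWithFallback` helper; the real chat, editor, and clipboard adapters would wrap Cursor and VS Code APIs.

```typescript
// Hypothetical abstraction over one place text can be inserted.
interface InsertionTarget {
  name: string;
  // Resolves to true if the text was inserted, false if this target is unavailable.
  tryInsert(text: string): Promise<boolean>;
}

// Try targets in priority order (chat -> editor -> clipboard); return the one that succeeded.
async function insertWithFallback(
  text: string,
  targets: InsertionTarget[],
): Promise<string | undefined> {
  for (const target of targets) {
    try {
      if (await target.tryInsert(text)) {
        return target.name;
      }
    } catch {
      // Treat an error like "unavailable" and fall through to the next target.
    }
  }
  return undefined; // Nothing accepted the text.
}

// Usage: no chat input is focused and no editor is open, so the clipboard wins.
const clipboardContents: string[] = [];
const targets: InsertionTarget[] = [
  { name: "chat", tryInsert: async () => false },
  { name: "editor", tryInsert: async () => { throw new Error("no active editor"); } },
  { name: "clipboard", tryInsert: async (t) => { clipboardContents.push(t); return true; } },
];

insertWithFallback("hello", targets).then((used) => console.log(used)); // logs: clipboard
```

Putting the clipboard last means the text is never lost: even when nothing is focused, it can be pasted manually.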
Coming Soon
- 🔄 Real-time Streaming — See transcription as you speak
- 🔄 Custom Vocabulary UI — Project-specific terms in configuration webview
- 🔄 Recording History — Review and re-use past transcriptions
- 🔄 Planned settings: `audioQuality`, `maxRecordingDuration`, `showNotifications` (defined but not yet applied)
🏗️ Architecture
Promptimize follows Clean/Hexagonal Architecture for maximum maintainability, testability, and scalability.
```
┌─────────────────────────────────────────────────────────┐
│                   Presentation Layer                    │
│                 (Commands, Status Bar)                  │
└────────────┬────────────────────────────────────────────┘
             │
┌────────────▼────────────────────────────────────────────┐
│                    Application Layer                    │
│           (Use Cases, Ports/Interfaces, DTOs)           │
└────────────┬────────────────────────────────────────────┘
             │
┌────────────▼────────────────────────────────────────────┐
│                      Domain Layer                       │
│        (Entities, Value Objects, Business Logic)        │
└────────────┬────────────────────────────────────────────┘
             │
┌────────────▼────────────────────────────────────────────┐
│                  Infrastructure Layer                   │
│ (OpenAI Whisper, Native Audio Capture, Config, Storage) │
└─────────────────────────────────────────────────────────┘
```
See docs/architecture/ for detailed architecture documentation.
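As an illustration of the layering (not the project's actual classes), an application-layer use case depends only on a port interface, and infrastructure supplies the adapter at composition time. `Transcript`, `TranscriptionPort`, `TranscribeRecordingUseCase`, and `FakeTranscriptionAdapter` are hypothetical names; the fake adapter shows why this structure lets use cases be unit-tested without a microphone or network.

```typescript
// Domain layer: a plain value object with no framework dependencies.
class Transcript {
  constructor(readonly text: string) {
    if (text.trim().length === 0) {
      throw new Error("Transcript cannot be empty");
    }
  }
}

// Application layer: the port the use case needs. It knows nothing about OpenAI.
interface TranscriptionPort {
  transcribe(audio: Uint8Array): Promise<string>;
}

// Application layer: the use case receives its dependency via the constructor (DI).
class TranscribeRecordingUseCase {
  constructor(private readonly transcription: TranscriptionPort) {}

  async execute(audio: Uint8Array): Promise<Transcript> {
    const text = await this.transcription.transcribe(audio);
    return new Transcript(text);
  }
}

// Infrastructure would provide a Whisper-backed adapter; tests can inject a fake instead.
class FakeTranscriptionAdapter implements TranscriptionPort {
  constructor(private readonly cannedText: string) {}

  async transcribe(_audio: Uint8Array): Promise<string> {
    return this.cannedText;
  }
}

// Composition root: wire the use case to whichever adapter is appropriate.
const useCase = new TranscribeRecordingUseCase(new FakeTranscriptionAdapter("refactor auth"));
useCase.execute(new Uint8Array(0)).then((t) => console.log(t.text)); // logs: refactor auth
```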
🛠️ Technology Stack
Core
- TypeScript 5.4+ - Type-safe development
- VSCode Extension API 1.120+ - Extension foundation
- Node.js 22 LTS - Runtime environment
- Webpack 5 - Bundling and optimization
Integrations
- OpenAI API - Whisper for transcription; GPT models (default `gpt-4o`) for optional prompt transformation
- @kstonekuan/audio-capture - Native cross-platform microphone capture
- VSCode SecretStorage - Secure credential management
Quality
- Jest - Unit testing
- ESLint + Prettier - Code quality and formatting
- Husky - Git hooks for pre-commit checks
📦 Installation
From Marketplace (Coming Soon)
- Open VSCode/Cursor
- Go to Extensions (`Cmd+Shift+X` / `Ctrl+Shift+X`)
- Search for "Promptimize"
- Click Install
Manual Installation (Current)
- Download the latest `.vsix` file from Releases
- Open VSCode/Cursor
- Go to Extensions
- Click "..." menu → "Install from VSIX..."
- Select the downloaded file
Upgrading from Cursor Whisper
The extension was renamed to Promptimize (vypdev publisher). If you previously installed cursor-whisper:
- Uninstall the old Cursor Whisper extension
- Install `promptimize-*.vsix` (or the new Marketplace listing when available)
- Re-enter API keys (SecretStorage keys changed to `promptimize.apiKey.*`)
- Update `settings.json`: replace `cursorWhisper.*` with `promptimize.*`
- Update custom keybindings that reference `cursor-whisper.*` commands
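If you have many settings to migrate, the rename is mechanical. A minimal sketch, assuming you have already parsed `settings.json` into an object (VS Code allows comments in that file, so a plain `JSON.parse` may fail on it) and that each `cursorWhisper.*` key keeps the same suffix under `promptimize.*`; `migrateSettings` is a hypothetical helper, not part of the extension.

```typescript
type Settings = Record<string, unknown>;

const OLD_PREFIX = "cursorWhisper.";
const NEW_PREFIX = "promptimize.";

// Returns a copy with every cursorWhisper.* key renamed to promptimize.*.
// An existing promptimize.* value is kept rather than overwritten.
function migrateSettings(settings: Settings): Settings {
  const result: Settings = {};
  // Copy non-legacy keys first so they take precedence on conflicts.
  for (const [key, value] of Object.entries(settings)) {
    if (!key.startsWith(OLD_PREFIX)) {
      result[key] = value;
    }
  }
  for (const [key, value] of Object.entries(settings)) {
    if (key.startsWith(OLD_PREFIX)) {
      const newKey = NEW_PREFIX + key.slice(OLD_PREFIX.length);
      if (!(newKey in result)) {
        result[newKey] = value;
      }
    }
  }
  return result;
}

const migrated = migrateSettings({
  "cursorWhisper.transcriptionLanguage": "es",
  "promptimize.enablePromptTransformation": false,
  "editor.fontSize": 14,
});
console.log(migrated);
```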
⚙️ Configuration
First-Time Setup
- After installation, run Promptimize: Setup Wizard (opens automatically on first launch)
- Enter your OpenAI API key — required for Whisper transcription
- Choose whether to enable prompt optimization and select a provider
- Provide provider credentials when prompted (Anthropic, Google, Azure, etc.)
- Test your configuration with Promptimize: Test Configuration
Note: Whisper transcription always uses OpenAI. Prompt optimization is optional and can use a different provider with its own API key.
Manual Configuration
Open Settings (`Cmd+,` / `Ctrl+,`) and search for "Promptimize", or add the keys to `settings.json` directly. `audioQuality`, `maxRecordingDuration`, and `showNotifications` are accepted but not yet applied.
```json
{
  "promptimize.transcriptionLanguage": "en",
  "promptimize.enablePromptTransformation": true,
  "promptimize.transformationProvider": "openai",
  "promptimize.transformationModel": "gpt-4o",
  "promptimize.audioQuality": "high",
  "promptimize.maxRecordingDuration": 120,
  "promptimize.showNotifications": true
}
```
Transcription (Required — OpenAI Whisper)
| Setting | Description |
|---|---|
| OpenAI API key | Required for voice-to-text. Configure via Setup Wizard or Configure OpenAI API Key (Whisper) |
| `transcriptionLanguage` | Language for transcription (`en`, `es`, `auto`, etc.) |
Cost: ~$0.006/minute of audio
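For reference, this is roughly how the transcription settings could map onto an OpenAI audio transcription request. A sketch under stated assumptions: it builds only the text form fields (the audio itself goes in a separate `file` field), assumes the `whisper-1` model, and treats `"auto"` as "omit `language` so Whisper detects it". Passing the optional `transcriptionHint` setting as Whisper's `prompt` field is also an assumption about how the extension uses it; `buildWhisperFields` is hypothetical.

```typescript
interface TranscriptionSettings {
  transcriptionLanguage: string; // e.g. "en", "es", or "auto"
  transcriptionHint: string; // optional vocabulary hint, "" when unset
}

// Build the non-file form fields for POST /v1/audio/transcriptions.
function buildWhisperFields(settings: TranscriptionSettings): Record<string, string> {
  const fields: Record<string, string> = { model: "whisper-1" };

  const language = settings.transcriptionLanguage.trim().toLowerCase();
  // Whisper auto-detects the language when the field is omitted.
  if (language !== "" && language !== "auto") {
    fields["language"] = language;
  }

  const hint = settings.transcriptionHint.trim();
  if (hint !== "") {
    // Assumption: the hint is sent as Whisper's `prompt`, which can bias spelling of terms.
    fields["prompt"] = hint;
  }
  return fields;
}

console.log(buildWhisperFields({ transcriptionLanguage: "auto", transcriptionHint: "" }));
console.log(buildWhisperFields({ transcriptionLanguage: "es", transcriptionHint: "Kubernetes, gRPC" }));
```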
Prompt Optimization (Optional)
Prompt optimization converts transcribed speech into structured prompts. Choose a provider and supply credentials when required.
| Setting | Description |
|---|---|
| `enablePromptTransformation` | Enable/disable optimization |
| `transformationProvider` | `openai`, `anthropic`, `google`, `azure`, `ollama`, `opencode`, `openrouter`, `cursor` |
| `transformationModel` | OpenAI model (when provider is `openai`) |
| `anthropicModel` | Claude model (when provider is `anthropic`) |
| `googleModel` | Gemini model (when provider is `google`) |
| `azureEndpoint` / `azureDeployment` | Azure OpenAI resource settings |
| `ollamaBaseUrl` / `ollamaModel` | Local Ollama server settings |
| `openCodeBaseUrl` / `openCodeModel` | Local OpenCode proxy settings |
| `openRouterModel` | OpenRouter model (when provider is `openrouter`) |
| `cursorModel` | Cursor model (when provider is `cursor`) |
Use Promptimize: Configure Prompt Optimization Provider to set up interactively. See docs/configuration/ for provider setup.
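The table above implies that each provider reads its model (and, for some, connection) settings from different keys. A sketch of that lookup, with the keys taken from the table and the `promptimize.` prefix from the Manual Configuration example; `settingsForProvider` is a hypothetical helper, and the interactive command remains the supported way to configure providers.

```typescript
type Provider =
  | "openai" | "anthropic" | "google" | "azure"
  | "ollama" | "opencode" | "openrouter" | "cursor";

// Setting keys consulted for each provider, as listed in the Prompt Optimization table.
const PROVIDER_SETTINGS: Record<Provider, readonly string[]> = {
  openai: ["transformationModel"],
  anthropic: ["anthropicModel"],
  google: ["googleModel"],
  azure: ["azureEndpoint", "azureDeployment"],
  ollama: ["ollamaBaseUrl", "ollamaModel"],
  opencode: ["openCodeBaseUrl", "openCodeModel"],
  openrouter: ["openRouterModel"],
  cursor: ["cursorModel"],
};

function isProvider(value: string): value is Provider {
  return Object.prototype.hasOwnProperty.call(PROVIDER_SETTINGS, value);
}

// Fully qualified keys for the configured provider, e.g. for a settings.json lookup.
function settingsForProvider(transformationProvider: string): string[] {
  if (!isProvider(transformationProvider)) {
    throw new Error(`Unknown transformationProvider: ${transformationProvider}`);
  }
  return PROVIDER_SETTINGS[transformationProvider].map((key) => `promptimize.${key}`);
}

console.log(settingsForProvider("azure").join(", "));
// logs: promptimize.azureEndpoint, promptimize.azureDeployment
```

Typing the map as `Record<Provider, ...>` makes the compiler flag any provider from the union that has no entry.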
Configuration Options
| Setting | Type | Default | Description |
|---|---|---|---|
| `transcriptionLanguage` | string | `"auto"` | Language for transcription (`en`, `es`, `fr`, `de`, `auto`) |
| `enablePromptTransformation` | boolean | `true` | Transform transcription into optimized prompts |
| `transformationProvider` | string | `"openai"` | LLM provider for transformation (`openai`, `anthropic`, `google`, `azure`, `ollama`, `opencode`, `openrouter`, `cursor`) |
| `transformationModel` | string | `"gpt-4o"` | OpenAI model for transformation |
| `transcriptionHint` | string | `""` | Optional Whisper vocabulary hint (Settings only) |
| `audioQuality` | string | `"high"` | Planned; not yet applied (always 16 kHz mono) |
| `maxRecordingDuration` | number | `120` | Planned; not yet applied |
| `showNotifications` | boolean | `true` | Planned; not yet applied |
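The defaults in this table can be expressed as a typed object that user values are layered over. A sketch only: the extension reads these through VS Code's configuration API, and `PromptimizeSettings`, `DEFAULTS`, and `resolveSettings` are hypothetical names. The planned settings are included because they are already defined, even though they have no effect yet.

```typescript
interface PromptimizeSettings {
  transcriptionLanguage: string;
  enablePromptTransformation: boolean;
  transformationProvider: string;
  transformationModel: string;
  transcriptionHint: string;
  audioQuality: string; // planned, not yet applied
  maxRecordingDuration: number; // planned, not yet applied
  showNotifications: boolean; // planned, not yet applied
}

// Defaults from the Configuration Options table.
const DEFAULTS: PromptimizeSettings = {
  transcriptionLanguage: "auto",
  enablePromptTransformation: true,
  transformationProvider: "openai",
  transformationModel: "gpt-4o",
  transcriptionHint: "",
  audioQuality: "high",
  maxRecordingDuration: 120,
  showNotifications: true,
};

// User values override defaults key by key; keys the user did not set keep the default.
function resolveSettings(user: Partial<PromptimizeSettings>): PromptimizeSettings {
  return { ...DEFAULTS, ...user };
}

const effective = resolveSettings({ transcriptionLanguage: "en" });
console.log(effective.transcriptionLanguage, effective.transformationModel); // logs: en gpt-4o
```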
🧪 Development & Testing
Prerequisites
- Node.js 22+ installed (via nvm; see `.nvmrc`)
- VSCode or Cursor IDE
- OpenAI API key
Setup Development Environment
```bash
# Clone the repository
git clone https://github.com/vypdev/promptimize
cd promptimize
# Install dependencies (requires Node 22; run `nvm use` first)
pnpm install
# Compile TypeScript
pnpm run compile
```
Debug the Extension
- Open the project in VSCode/Cursor
- Press `F5` to start debugging
- A new "Extension Development Host" window will open
- The extension will be loaded in this window
- In the Extension Development Host window:
  - Open the Command Palette (`Cmd/Ctrl+Shift+P`)
  - Type: "Promptimize: Configure OpenAI API Key (Whisper)"
  - Paste your OpenAI API key (starts with `sk-...`)
  - The key is securely stored in your system's Keychain/Credential Manager
Test the Extension
Start Recording:
- Press `Cmd/Ctrl+Alt+V` (or click "Transcribe" in the status bar)
- Recording starts immediately in the background
Record Audio:
- Speak clearly into your microphone
- Ensure VSCode/Cursor has microphone access in System Settings (macOS) or Privacy settings (Windows)
Stop Recording:
- Click the "Recording..." status bar item (or run Promptimize: Stop Transcribe Recording) when done
Wait for Processing:
- Audio is transcribed (~5-10 seconds)
- In Promptimize mode (`Cmd/Ctrl+Alt+P`), text is optimized by your chosen provider
- Text is automatically inserted into the Cursor chat or active editor (clipboard as fallback)
Check Status:
- Status bar shows current state
- Notifications show progress and errors
Build Status
```bash
# Compile TypeScript
pnpm run compile
# Run linter
pnpm run lint
# Run tests
pnpm test
# Package extension (includes all platform native binaries)
pnpm run package
# Verify VSIX contains all platform binaries
pnpm run package:verify
```
Packaging for Distribution
To create a VSIX that works across all platforms (macOS, Linux, Windows):
```bash
pnpm run package
```
This will:
- Install all platform-specific native binaries (`darwin-arm64`, `darwin-x64`, `linux-x64-gnu`, `win32-x64-msvc`)
- Bundle them into the VSIX (~2.5 MB total)
- Create `promptimize-X.X.X.vsix`
To verify all binaries are included:
```bash
pnpm run package:verify
```
Expected output:
```
audio-capture-darwin-arm64
audio-capture-darwin-x64
audio-capture-linux-x64-gnu
audio-capture-win32-x64-msvc
```
Current Build: ✅ SUCCESS (577 KB bundle)
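The four binaries correspond to Node's `process.platform` and `process.arch` values. A sketch of that mapping, assuming the `audio-capture-<platform>-<arch>[-<abi>]` naming shown in the expected output; how `@kstonekuan/audio-capture` actually resolves its native module may differ, and `nativeBinaryFor` is a hypothetical helper. In Node you would call it as `nativeBinaryFor(process.platform, process.arch)`. The sketch does not distinguish glibc from musl Linux, and platforms outside the list (for example Linux on ARM) have no bundled binary.

```typescript
// Bundled binaries, keyed by `${platform}-${arch}` as reported by Node.
const BUNDLED_BINARIES: Record<string, string> = {
  "darwin-arm64": "audio-capture-darwin-arm64",
  "darwin-x64": "audio-capture-darwin-x64",
  "linux-x64": "audio-capture-linux-x64-gnu",
  "win32-x64": "audio-capture-win32-x64-msvc",
};

// Returns the expected binary name, or undefined if this platform is not bundled.
function nativeBinaryFor(platform: string, arch: string): string | undefined {
  return BUNDLED_BINARIES[`${platform}-${arch}`];
}

console.log(nativeBinaryFor("linux", "x64")); // logs: audio-capture-linux-x64-gnu
console.log(nativeBinaryFor("linux", "arm64")); // logs: undefined
```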
🚀 Usage
Recording Modes
Promptimize has two modes — see Recording Modes for full details.
| Mode | Shortcut | Output |
|---|---|---|
| Transcribe | `Cmd/Ctrl+Alt+V` | Raw Whisper transcription |
| Promptimize | `Cmd/Ctrl+Alt+P` | Optimized structured prompt |
Quick Start
- Open your editor or Cursor chat
- Press `Cmd+Alt+V` (Transcribe) or `Cmd+Alt+P` (Promptimize)
- Speak naturally about your requirements
- Click the status bar (Recording...) to stop
- Transcribed or optimized text appears automatically
Status Bar
Three items appear in the status bar (right side):
| Item | Idle | Recording |
|---|---|---|
| Transcribe | `$(mic) Transcribe` | `$(record) Recording...` (click to stop) |
| Promptimize | `$(sparkle) Promptimize` | `$(record) Recording...` (click to stop) |
| Settings | `$(gear) Settings` | Available during recording |
During processing, progress appears in notifications (Transcribing..., Optimizing..., Inserting...).
Example Workflow
Spoken Input:
"I need to refactor the authentication service to support JWT tokens instead of sessions. We should maintain backward compatibility with existing session-based auth for 6 months. Also need unit tests for the new JWT validation logic and integration tests for the auth flow."
Optimized Output:
```markdown
## Refactor Authentication Service to JWT
### Context
- Current implementation: session-based authentication
- Target implementation: JWT tokens
### Objectives
1. Implement JWT token generation and validation
2. Maintain backward compatibility with session-based auth
3. Provide 6-month deprecation period for sessions
### Technical Requirements
- JWT library integration
- Token validation middleware
- Session-to-JWT migration path
### Testing Requirements
- Unit tests for JWT validation logic
- Integration tests for complete auth flow
- Backward compatibility tests for sessions
### Timeline
- 6-month deprecation period for session-based auth
```
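Behind an example like this, the optimization step sends the raw transcript to the chosen provider together with a system prompt (editable in the configuration webview). A sketch of assembling such a request in the common system/user chat-message shape; the message format, `buildOptimizationRequest`, and the sample system prompt are assumptions, and each provider adapter would translate this into its own API.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Hypothetical request shape; the model comes from the provider-specific setting.
interface OptimizationRequest {
  model: string;
  messages: ChatMessage[];
}

function buildOptimizationRequest(
  transcript: string,
  systemPrompt: string,
  model: string,
): OptimizationRequest {
  const spoken = transcript.trim();
  if (spoken === "") {
    throw new Error("Nothing to optimize: transcript is empty");
  }
  return {
    model,
    messages: [
      // Instructions describing the desired structure (sections such as Context, Objectives).
      { role: "system", content: systemPrompt },
      // The raw Whisper output, trimmed but otherwise unchanged.
      { role: "user", content: spoken },
    ],
  };
}

const request = buildOptimizationRequest(
  "I need to refactor the authentication service to support JWT tokens instead of sessions.",
  "Rewrite the user's spoken request as a structured Markdown prompt with Context, Objectives, and Testing Requirements sections.",
  "gpt-4o",
);
console.log(JSON.stringify(request, null, 2));
```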
🎨 User Experience
Visual States
The status bar reflects recorder states; fine-grained progress (Transcribing, Optimizing) appears in notifications.
| State | Status Bar | Description |
|---|---|---|
| Idle | `$(mic) Transcribe` / `$(sparkle) Promptimize` | Ready to record |
| Recording | `$(record) Recording...` | Actively recording (click to stop) |
| Processing | `$(sync~spin) Processing...` | Preparing audio after stop |
| Error | Error styling | Something went wrong |
See UX States for the full state reference.
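The states above, plus the cancel action, form a small state machine. A sketch with hypothetical event names: stopping moves Recording to Processing, Escape or Cancel Recording discards audio and returns to Idle, and a failure lands in Error. These transitions are inferred from this README rather than taken from the extension's source, and letting a new start clear the Error state is an assumption.

```typescript
type RecorderState = "idle" | "recording" | "processing" | "error";
type Mode = "transcribe" | "promptimize";

type RecorderEvent =
  | { type: "start"; mode: Mode }
  | { type: "stop" }
  | { type: "cancel" }
  | { type: "done" }
  | { type: "fail" };

// Pure transition function; events that do not apply leave the state unchanged.
function next(state: RecorderState, event: RecorderEvent): RecorderState {
  switch (state) {
    case "idle":
    case "error": // Assumption: a new start clears a previous error.
      return event.type === "start" ? "recording" : state;
    case "recording":
      if (event.type === "stop") return "processing";
      if (event.type === "cancel") return "idle"; // Escape discards the recording.
      if (event.type === "fail") return "error";
      return state;
    case "processing":
      if (event.type === "done") return "idle";
      if (event.type === "fail") return "error";
      return state;
    default: {
      const unreachable: never = state;
      return unreachable;
    }
  }
}

// Usage: a complete Promptimize round trip ends back in Idle.
let s: RecorderState = "idle";
for (const e of [{ type: "start", mode: "promptimize" }, { type: "stop" }, { type: "done" }] as RecorderEvent[]) {
  s = next(s, e);
}
console.log(s); // logs: idle
```

Keeping the transitions in one pure function makes them easy to unit-test separately from the status bar rendering.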
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| `Cmd+Alt+V` / `Ctrl+Alt+V` | Start Transcribe recording |
| `Cmd+Alt+P` / `Ctrl+Alt+P` | Start Promptimize recording |
| `Escape` | Cancel recording (while recording) |
The `Cmd/Ctrl+Alt+V` and `Cmd/Ctrl+Alt+P` shortcuts only start recording; to stop and process, click the status bar, or press `Escape` to discard the recording. See Keyboard Shortcuts.
Commands (Command Palette)
| Command | Purpose |
|---|---|
| Promptimize: Start Transcribe Recording | Start raw transcription |
| Promptimize: Stop Transcribe Recording | Stop and process Transcribe |
| Promptimize: Start Promptimize Recording | Start optimized prompt |
| Promptimize: Stop Promptimize Recording | Stop and process Promptimize |
| Promptimize: Cancel Recording | Discard recording |
| Promptimize: Open Configuration | Configuration webview |
| Promptimize: Configure OpenAI API Key (Whisper) | Set Whisper API key |
| Promptimize: Configure Prompt Optimization Provider | Provider setup wizard |
| Promptimize: Configure OpenAI Optimization Model | Pick GPT model (OpenAI only) |
| Promptimize: Test Configuration | Test setup; opens results webview |
| Promptimize: Setup Wizard | Opens configuration panel |
The legacy commands (Deprecated) Start Recording and (Deprecated) Stop Recording are deprecated; use the mode-specific commands instead.
🔒 Security & Privacy
Data Handling
- Audio is temporary - Recordings are discarded immediately after transcription
- No persistent storage - Audio is never kept on disk
- API keys are encrypted - Stored in VSCode SecretStorage
- No telemetry - Zero analytics or usage tracking
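- HTTPS for hosted APIs - Calls to cloud providers are encrypted in transit; local providers (Ollama, OpenCode) connect to the base URL you configure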
API Key Security
Your API keys (OpenAI and any optimization provider) are:
- Stored in VSCode's secure credential storage (SecretStorage)
- Never exposed in logs or error messages
- Sent only to the corresponding provider's API (your OpenAI key only to OpenAI's official API)
- Accessible only by this extension
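A sketch of keeping keys in secret storage rather than settings, assuming a storage object with the same `get`/`store` shape as VS Code's `SecretStorage` (in an extension this would be `context.secrets`). The `promptimize.apiKey.` prefix comes from the upgrade notes above; the per-provider suffixes such as `openai` and the `SecretStore`, `ApiKeyStore`, and `MemorySecrets` names are hypothetical.

```typescript
// Same shape as the relevant part of VS Code's SecretStorage (get/store).
interface SecretStore {
  get(key: string): PromiseLike<string | undefined>;
  store(key: string, value: string): PromiseLike<void>;
}

const KEY_PREFIX = "promptimize.apiKey.";

class ApiKeyStore {
  constructor(private readonly secrets: SecretStore) {}

  // Hypothetical naming: one secret per provider, e.g. promptimize.apiKey.openai.
  save(provider: string, apiKey: string): PromiseLike<void> {
    return this.secrets.store(KEY_PREFIX + provider, apiKey.trim());
  }

  load(provider: string): PromiseLike<string | undefined> {
    return this.secrets.get(KEY_PREFIX + provider);
  }
}

// In-memory stand-in for SecretStorage, for tests or this example only.
class MemorySecrets implements SecretStore {
  private readonly data = new Map<string, string>();

  async get(key: string): Promise<string | undefined> {
    return this.data.get(key);
  }

  async store(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

const keys = new ApiKeyStore(new MemorySecrets());
keys.save("openai", "  sk-test  ")
  .then(() => keys.load("openai"))
  .then((k) => console.log(k)); // logs: sk-test
```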
Microphone Permissions
The extension requests microphone access:
- macOS: System Settings → Privacy & Security → Microphone
- Windows: Settings → Privacy → Microphone
- Linux: System-dependent, usually automatic
🏗️ Development
Prerequisites
- Node.js 22+ (via nvm; see `.nvmrc`)
- pnpm
- VSCode 1.120+ for testing
Setup
```bash
# Clone the repository
git clone https://github.com/vypdev/promptimize.git
cd promptimize
# Install dependencies (requires Node 22; run `nvm use` first)
pnpm install
# Build the extension
pnpm run compile
# Run tests
pnpm test
# Watch mode for development
pnpm run watch
```
Project Structure
```
promptimize/
├── src/
│   ├── application/     # Use cases and ports
│   ├── domain/          # Business entities
│   ├── infrastructure/  # External integrations
│   ├── presentation/    # UI and commands
│   ├── shared/          # Utilities and constants
│   └── extension.ts     # Entry point
├── docs/                # Comprehensive documentation
├── test/                # Unit and integration tests
└── package.json
```
See docs/architecture/ for detailed structure documentation.
Running Locally
- Open the project in VSCode
- Press `F5` to launch Extension Development Host
- The extension will be active in the new window
- Test recording with `Cmd+Alt+V`
🧪 Testing
Automated tests cover use cases, transformers, and UI components — see docs/testing/strategy.md.
Run Tests
```bash
source scripts/ensure-node.sh && pnpm test
```
Test Strategy
- Unit tests: Use cases and adapters with mocked ports (priority)
- Manual smoke tests: Real recording → transcription → insertion before release
See docs/testing/strategy.md for critical test priorities and manual checklist.
📈 Roadmap
v0.1.0 (Current)
- ✅ Dual recording modes (Transcribe + Promptimize)
- ✅ Whisper transcription
- ✅ Prompt transformation (8 providers)
- ✅ Configuration webview
- ✅ Chat / editor / clipboard insertion
- ✅ API key configuration
v0.2.0 (Next)
- 🔄 Apply planned settings (`audioQuality`, `maxRecordingDuration`, `showNotifications`)
- 🔄 Transformation preview before insert
- 🔄 Transcription language in configuration webview
v0.3.0
- 🔄 Context-aware insertion improvements
- 🔄 Push-to-talk mode
v0.4.0
- 🔄 Real-time streaming transcription
- 🔄 Recording history
- 🔄 Edit before insert
v0.5.0
- 🔄 Custom vocabulary UI
- 🔄 Technical term correction
v1.0.0 (Stable)
- 🔄 Full production release
- 🔄 Performance optimization
- 🔄 Extensive testing
See PROGRESS.md for current project status.
🤝 Contributing
We welcome contributions! See docs/standards/coding-conventions.md for coding standards and development workflow.
Development Philosophy
- Clean Architecture - Maintain clear layer separation
- Type Safety - Strong TypeScript typing everywhere
- Testability - Write testable, pure functions
- Documentation - Document decisions and complex logic
- User Experience - Prioritize UX over technical complexity
📝 Philosophy & Design Principles
Core Principles
- Compatibility First - Real-world compatibility over theoretical solutions
- User Experience - Minimal friction, maximum productivity
- Maintainability - Clean code over clever hacks
- Scalability - Built to grow and evolve
- Privacy - User data is sent only to the providers you configure and is never logged
Why Clean Architecture?
- Testability: Business logic independent of frameworks
- Flexibility: Easy to swap implementations (e.g., different STT providers)
- Maintainability: Clear responsibilities and boundaries
- Scalability: Add features without breaking existing code
Why Dependency Injection?
- Testability: Easy to mock dependencies
- Flexibility: Configure different implementations
- Maintainability: Clear dependency graph
🐛 Troubleshooting
See the full Troubleshooting Guide with decision trees.
Microphone not working
macOS:
- Go to System Settings → Privacy & Security → Microphone
- Ensure VSCode/Cursor is enabled
Windows:
- Go to Settings → Privacy → Microphone
- Ensure VSCode/Cursor has permission
Linux:
- Permissions are usually automatic
- Check `pavucontrol` if using PulseAudio
Transcription fails
- Verify your OpenAI API key is valid
- Check you have credits in your OpenAI account
- Ensure the recording is between 0.1 s and 5 minutes long
- Ensure the audio file does not exceed 25 MB (the Whisper API upload limit)
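Both limits can be checked before uploading. A sketch assuming 16-bit PCM WAV at the 16 kHz mono rate noted under Configuration Options: 16,000 samples × 2 bytes = 32,000 bytes per second, so a five-minute recording (300 s) is 9,600,000 bytes plus a 44-byte header, roughly 9.6 MB and well under 25 MB. The WAV encoding and the `estimateWavBytes`/`validateRecording` helpers are assumptions, not the extension's confirmed implementation.

```typescript
const SAMPLE_RATE_HZ = 16000; // 16 kHz mono, per the audioQuality note
const BYTES_PER_SAMPLE = 2; // assumption: 16-bit PCM
const WAV_HEADER_BYTES = 44; // canonical PCM WAV header size
const MIN_SECONDS = 0.1;
const MAX_SECONDS = 5 * 60;
const MAX_UPLOAD_BYTES = 25000000; // 25 MB, taken conservatively as decimal megabytes

function estimateWavBytes(seconds: number): number {
  return WAV_HEADER_BYTES + Math.ceil(seconds * SAMPLE_RATE_HZ) * BYTES_PER_SAMPLE;
}

// Returns a human-readable problem, or undefined if the recording can be uploaded.
function validateRecording(seconds: number, sizeBytes: number): string | undefined {
  if (seconds < MIN_SECONDS) return `Recording too short (${seconds}s < ${MIN_SECONDS}s)`;
  if (seconds > MAX_SECONDS) return `Recording too long (${seconds}s > ${MAX_SECONDS}s)`;
  if (sizeBytes > MAX_UPLOAD_BYTES) return `File too large (${sizeBytes} bytes)`;
  return undefined;
}

const fiveMinutes = estimateWavBytes(MAX_SECONDS);
console.log(fiveMinutes); // logs: 9600044
console.log(validateRecording(MAX_SECONDS, fiveMinutes)); // logs: undefined
```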
Text not inserting
- Ensure you have an active editor or chat input focused
- Check the status bar for error messages
- Try manually pasting from clipboard (fallback behavior)
Cursor Agents Window issues
Promptimize works best in:
- Classic Mode (`cursor --classic`)
- Editor Window
Debug output and privacy
Transcriptions and optimized prompts are never written to logs. For troubleshooting, use the status bar, progress notifications, and error dialogs. The Promptimize output channel carries only operational messages (timestamps, durations, error types), never user speech content.
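One way to make that guarantee structural is to give log entries a type with no field for text at all. A sketch with hypothetical names (`OperationalLogEntry`, `formatLogLine`); in the extension, the formatted line would be written to the Promptimize output channel.

```typescript
// Only operational metadata: there is deliberately no field for transcript or prompt text.
interface OperationalLogEntry {
  timestamp: Date;
  operation: "transcribe" | "optimize" | "insert";
  durationMs: number;
  errorType?: string; // e.g. an error class name, not a message that might echo user content
}

function formatLogLine(entry: OperationalLogEntry): string {
  const status = entry.errorType === undefined ? "ok" : `error=${entry.errorType}`;
  return `[${entry.timestamp.toISOString()}] ${entry.operation} ${entry.durationMs}ms ${status}`;
}

console.log(
  formatLogLine({ timestamp: new Date(0), operation: "transcribe", durationMs: 4210 }),
); // logs: [1970-01-01T00:00:00.000Z] transcribe 4210ms ok
```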
MIT License - see LICENSE file for details.
🙏 Acknowledgments
- OpenAI - Whisper and GPT-4 APIs
- VSCode Team - Excellent extension API and documentation
- Cursor Team - Innovation in AI-powered development
Made with ❤️ for developers who think faster than they type