Promptimize

Transform your voice into optimized prompts with AI-powered speech-to-text

A professional VSCode/Cursor extension that captures audio from your microphone, transcribes it using OpenAI Whisper, and intelligently transforms natural speech into structured, optimized prompts ready for LLM agents.

Quick Start

Install the extension (VSIX or Marketplace when available)
Run Setup Wizard — Command Palette → Promptimize: Setup Wizard
Configure OpenAI API key — Required for Whisper voice-to-text
Optionally choose optimization provider — OpenAI, Anthropic, Google, Azure, Ollama, OpenCode, OpenRouter, or Cursor
Press Cmd+Alt+V (Transcribe) or Cmd+Alt+P (Promptimize), speak, then click the status bar item to stop (use Ctrl instead of Cmd on Windows/Linux)

See the full Quick Start Guide and Recording Modes.

Two Services, Clear Roles

Service	Provider	Required	Credentials
Transcription	OpenAI Whisper	Yes	OpenAI API key
Prompt optimization	Your choice	No	Provider-specific API key

```mermaid
graph LR
    Voice[Your Voice] --> Whisper[OpenAI Whisper<br/>Transcription]
    Whisper --> RawText[Raw Text]
    RawText --> Choice{Optimization<br/>Enabled?}
    Choice -->|No| Editor[Insert to Editor]
    Choice -->|Yes| Provider[Your Chosen Provider]
    Provider --> OptimizedText[Optimized Prompt]
    OptimizedText --> Editor
```
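The flow in the diagram can be sketched as a small TypeScript function over two ports. `Transcriber`, `PromptOptimizer`, and `runPipeline` are hypothetical names chosen for illustration, not the extension's actual API; the sketch only shows that optimization is an optional second step after Whisper transcription.

```typescript
// Hypothetical port for speech-to-text (in Promptimize this role is played by OpenAI Whisper).
interface Transcriber {
  transcribe(audio: Uint8Array): Promise<string>;
}

// Hypothetical port for the optional optimization step (any configured provider).
interface PromptOptimizer {
  optimize(rawText: string): Promise<string>;
}

// Transcribe first; call the optimizer only when optimization is enabled and a provider exists.
async function runPipeline(
  audio: Uint8Array,
  transcriber: Transcriber,
  optimizer: PromptOptimizer | undefined,
  optimizationEnabled: boolean,
): Promise<string> {
  const rawText = await transcriber.transcribe(audio);
  if (!optimizationEnabled || optimizer === undefined) {
    return rawText; // "No" branch: raw text goes straight to the editor
  }
  return optimizer.optimize(rawText); // "Yes" branch: the optimized prompt goes to the editor
}
```

Keeping both providers behind interfaces is what lets transcription stay fixed on OpenAI while the optimization provider is swapped freely.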

🎯 Vision

Eliminate the friction between thinking and coding.

Developers often have complex architectural ideas, detailed requirements, or intricate technical explanations that are tedious to type but natural to speak. Promptimize bridges this gap by:

Capturing your spoken thoughts in real-time
Transcribing them with high accuracy using OpenAI Whisper
Transforming natural speech into structured, technical prompts
Inserting them automatically into your editor or Cursor chat

🔥 The Problem We Solve

Before Promptimize:

1. Think about complex architecture requirements
2. Struggle to type everything out
3. Lose train of thought while typing
4. End up with unstructured, verbose prompts
5. LLM misunderstands due to poor formatting

With Promptimize:

1. Press Cmd+Alt+P (Promptimize mode)
2. Speak naturally about your requirements
3. Click the status bar to stop; the extension transcribes and optimizes automatically
4. A structured prompt appears in your editor or chat
5. The LLM receives a clear, well-organized request

✨ Features

Current (v0.1.0)

✅ Two Recording Modes — Transcribe (raw text) and Promptimize (optimized prompts)
✅ One-Click Recording — Dual status bar buttons or keyboard shortcuts
✅ High-Quality Transcription — OpenAI Whisper API integration
✅ Prompt Transformation — AI-powered optimization via 8 providers
✅ Multiple AI Providers — OpenAI, Anthropic, Google, Azure, Ollama, OpenCode, OpenRouter, and Cursor
✅ Configuration Webview — Interactive setup panel with provider comparison and system prompt editor
✅ Smart Insertion — Chat → editor → clipboard fallback chain
✅ Visual Feedback — Status bar states and progress notifications
✅ Secure Configuration — API keys stored in VSCode SecretStorage
✅ Cross-Platform — Works on macOS, Windows, and Linux
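The chat → editor → clipboard fallback chain can be expressed as an ordered list of insertion targets, each of which either accepts the text or declines. `InsertionTarget` and `insertWithFallback` are hypothetical names, and the sketch is synchronous to keep the ordering logic visible (real VS Code insertion calls are asynchronous).

```typescript
// Hypothetical insertion target: returns true if it accepted the text.
interface InsertionTarget {
  name: string;
  tryInsert(text: string): boolean;
}

// Try each target in order (e.g. chat, then editor, then clipboard) and report
// which one accepted the text; undefined means every target declined.
function insertWithFallback(text: string, targets: InsertionTarget[]): string | undefined {
  for (const target of targets) {
    if (target.tryInsert(text)) {
      return target.name;
    }
  }
  return undefined;
}
```

Putting the clipboard last means it acts as a catch-all: as long as it never declines, the text is never lost.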

Coming Soon

🔄 Real-time Streaming — See transcription as you speak
🔄 Custom Vocabulary UI — Project-specific terms in configuration webview
🔄 Recording History — Review and re-use past transcriptions
🔄 Planned settings — audioQuality, maxRecordingDuration, showNotifications (defined but not yet applied)

🏗️ Architecture

Promptimize follows Clean/Hexagonal Architecture for maximum maintainability, testability, and scalability.

```text
┌──────────────────────────────────────────────────────────┐
│                    Presentation Layer                    │
│  (Commands, Status Bar)                                  │
└────────────┬─────────────────────────────────────────────┘
             │
┌────────────▼─────────────────────────────────────────────┐
│                    Application Layer                     │
│  (Use Cases, Ports/Interfaces, DTOs)                     │
└────────────┬─────────────────────────────────────────────┘
             │
┌────────────▼─────────────────────────────────────────────┐
│                       Domain Layer                       │
│  (Entities, Value Objects, Business Logic)               │
└────────────┬─────────────────────────────────────────────┘
             │
┌────────────▼─────────────────────────────────────────────┐
│                   Infrastructure Layer                   │
│  (OpenAI Whisper, Native Audio Capture, Config, Storage) │
└──────────────────────────────────────────────────────────┘
```

See docs/architecture/ for detailed architecture documentation.
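To illustrate how the layers fit together, here is a minimal ports-and-adapters sketch. The domain layer owns a value object, the application layer defines a port and a use case that depends only on that port, and an infrastructure adapter implements the port (here an in-memory fake, as a unit test would inject). All names are hypothetical and not taken from the Promptimize source.

```typescript
// Domain layer: a value object that enforces its own invariant (hypothetical).
class Transcript {
  private constructor(readonly text: string) {}

  static create(text: string): Transcript {
    const trimmed = text.trim();
    if (trimmed.length === 0) {
      throw new Error("Transcript cannot be empty");
    }
    return new Transcript(trimmed);
  }
}

// Application layer: the port the use case depends on (hypothetical).
interface SpeechToTextPort {
  transcribe(audio: Uint8Array): Promise<string>;
}

// Application layer: a use case that knows only the port, never a concrete API client.
class TranscribeAudioUseCase {
  constructor(private readonly speechToText: SpeechToTextPort) {}

  async execute(audio: Uint8Array): Promise<Transcript> {
    const text = await this.speechToText.transcribe(audio);
    return Transcript.create(text);
  }
}

// Infrastructure layer: an adapter implementing the port. A real adapter would call
// the Whisper API; this in-memory fake is what a unit test would inject instead.
class FakeSpeechToText implements SpeechToTextPort {
  constructor(private readonly cannedText: string) {}

  async transcribe(_audio: Uint8Array): Promise<string> {
    return this.cannedText;
  }
}
```

Because the use case receives its port through the constructor, swapping Whisper for another speech-to-text provider, or for a fake in tests, requires no change to the use case itself.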

🛠️ Technology Stack

Core

TypeScript 5.4+ - Type-safe development
VSCode Extension API 1.120+ - Extension foundation
Node.js 22 LTS - Runtime environment
Webpack 5 - Bundling and optimization

Integrations

OpenAI API - Whisper for transcription; GPT models (default gpt-4o) for prompt transformation when OpenAI is the selected provider
@kstonekuan/audio-capture - Native cross-platform microphone capture
VSCode SecretStorage - Secure credential management

Quality

Jest - Unit testing
ESLint + Prettier - Code quality and formatting
Husky - Git hooks for pre-commit checks

📦 Installation

From Marketplace (Coming Soon)

Open VSCode/Cursor
Go to Extensions (Cmd+Shift+X / Ctrl+Shift+X)
Search for "Promptimize"
Click Install

Manual Installation (Current)

Download the latest .vsix file from Releases
Open VSCode/Cursor
Go to Extensions
Click "..." menu → "Install from VSIX..."
Select the downloaded file

Upgrading from Cursor Whisper

The extension was renamed to Promptimize (vypdev publisher). If you previously installed cursor-whisper:

Uninstall the old Cursor Whisper extension
Install promptimize-*.vsix (or the new Marketplace listing when available)
Re-enter API keys (SecretStorage keys changed to promptimize.apiKey.*)
Update settings.json: replace cursorWhisper.* with promptimize.*
Update custom keybindings that reference cursor-whisper.* commands
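If you have many `cursorWhisper.*` entries, the `settings.json` rename is a mechanical prefix swap. The helper below is a hypothetical sketch that works on an already-parsed settings object and assumes the setting names after the prefix are unchanged. Note that `settings.json` allows comments, so plain `JSON.parse` can fail on it; editing by hand or with a JSONC-aware parser is safer. Keybindings are not covered here.

```typescript
const OLD_PREFIX = "cursorWhisper.";
const NEW_PREFIX = "promptimize.";

// Return a copy of the settings with every cursorWhisper.* key renamed to promptimize.*.
// If both the old and the new key exist, the existing promptimize.* value wins.
function migrateSettings(settings: Record<string, unknown>): Record<string, unknown> {
  const migrated: Record<string, unknown> = {};
  // First copy every key that does not use the old prefix.
  for (const [key, value] of Object.entries(settings)) {
    if (!key.startsWith(OLD_PREFIX)) {
      migrated[key] = value;
    }
  }
  // Then add renamed keys, without overwriting values already set under the new prefix.
  for (const [key, value] of Object.entries(settings)) {
    if (key.startsWith(OLD_PREFIX)) {
      const newKey = NEW_PREFIX + key.slice(OLD_PREFIX.length);
      if (!(newKey in migrated)) {
        migrated[newKey] = value;
      }
    }
  }
  return migrated;
}
```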

⚙️ Configuration

First-Time Setup

After installation, run Promptimize: Setup Wizard (opens automatically on first launch)
Enter your OpenAI API key — required for Whisper transcription
Choose whether to enable prompt optimization and select a provider
Provide provider credentials when prompted (Anthropic, Google, Azure, etc.)
Test your configuration with Promptimize: Test Configuration

Note: Whisper transcription always uses OpenAI. Prompt optimization is optional and can use a different provider with its own API key.

Manual Configuration

Open Settings (Cmd+, / Ctrl+,) and search for "Promptimize", or add entries directly to settings.json. Note that audioQuality, maxRecordingDuration, and showNotifications are accepted but not yet applied:

```json
{
  "promptimize.transcriptionLanguage": "en",
  "promptimize.enablePromptTransformation": true,
  "promptimize.transformationProvider": "openai",
  "promptimize.transformationModel": "gpt-4o",
  "promptimize.audioQuality": "high",
  "promptimize.maxRecordingDuration": 120,
  "promptimize.showNotifications": true
}
```

Transcription (Required — OpenAI Whisper)

Setting	Description
OpenAI API key	Required for voice-to-text. Configure it via the Setup Wizard or the Promptimize: Configure OpenAI API Key (Whisper) command
`transcriptionLanguage`	Language for transcription (`en`, `es`, `auto`, etc.)

Cost: about $0.006 per minute of audio (OpenAI's Whisper pricing; check OpenAI's pricing page for current rates)
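At roughly $0.006 per minute, transcription cost scales linearly with recording length. The sketch below hard-codes that approximate rate as an assumption and bills by exact duration; OpenAI's actual rounding and current pricing may differ.

```typescript
// Approximate Whisper price quoted in this README (assumption: check OpenAI's current pricing).
const WHISPER_USD_PER_MINUTE = 0.006;

// Estimate the transcription cost in US dollars for a recording of the given length.
function estimateWhisperCostUsd(durationSeconds: number): number {
  if (durationSeconds < 0) {
    throw new RangeError("Duration cannot be negative");
  }
  return (durationSeconds / 60) * WHISPER_USD_PER_MINUTE;
}

// A 2-minute recording costs about $0.012.
console.log(estimateWhisperCostUsd(120).toFixed(3)); // prints "0.012"
```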

Prompt Optimization (Optional)

Prompt optimization converts transcribed speech into structured prompts. Choose a provider and supply credentials when required.

Setting	Description
`enablePromptTransformation`	Enable/disable optimization
`transformationProvider`	`openai`, `anthropic`, `google`, `azure`, `ollama`, `opencode`, `openrouter`, `cursor`
`transformationModel`	OpenAI model (when provider is `openai`)
`anthropicModel`	Claude model (when provider is `anthropic`)
`googleModel`	Gemini model (when provider is `google`)
`azureEndpoint` / `azureDeployment`	Azure OpenAI resource settings
`ollamaBaseUrl` / `ollamaModel`	Local Ollama server settings
`openCodeBaseUrl` / `openCodeModel`	Local OpenCode proxy settings
`openRouterModel`	OpenRouter model (when provider is `openrouter`)
`cursorModel`	Cursor model (when provider is `cursor`)

Use Promptimize: Configure Prompt Optimization Provider to set up interactively. See docs/configuration/ for provider setup.
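Each provider reads its model from a different setting. The lookup below restates the table above as a typed map; `ProviderId`, `MODEL_SETTING_BY_PROVIDER`, `parseProvider`, and `modelSettingFor` are hypothetical names for illustration. Azure maps to `azureDeployment` because Azure OpenAI addresses models by deployment name.

```typescript
// The eight provider IDs accepted by promptimize.transformationProvider.
type ProviderId =
  | "openai" | "anthropic" | "google" | "azure"
  | "ollama" | "opencode" | "openrouter" | "cursor";

// Which promptimize.* setting holds the model for each provider (restating the table above).
const MODEL_SETTING_BY_PROVIDER: Record<ProviderId, string> = {
  openai: "promptimize.transformationModel",
  anthropic: "promptimize.anthropicModel",
  google: "promptimize.googleModel",
  azure: "promptimize.azureDeployment", // Azure OpenAI selects a model via its deployment
  ollama: "promptimize.ollamaModel",
  opencode: "promptimize.openCodeModel",
  openrouter: "promptimize.openRouterModel",
  cursor: "promptimize.cursorModel",
};

// Narrow an arbitrary settings string to a known provider, or return undefined.
// hasOwnProperty avoids matching inherited keys such as "toString".
function parseProvider(value: string): ProviderId | undefined {
  return Object.prototype.hasOwnProperty.call(MODEL_SETTING_BY_PROVIDER, value)
    ? (value as ProviderId)
    : undefined;
}

function modelSettingFor(provider: ProviderId): string {
  return MODEL_SETTING_BY_PROVIDER[provider];
}
```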

Configuration Options

Setting	Type	Default	Description
`transcriptionLanguage`	string	`"auto"`	Language for transcription (`en`, `es`, `fr`, `de`, `auto`)
`enablePromptTransformation`	boolean	`true`	Transform transcription into optimized prompts
`transformationProvider`	string	`"openai"`	LLM provider for transformation (`openai`, `anthropic`, `google`, `azure`, `ollama`, `opencode`, `openrouter`, `cursor`)
`transformationModel`	string	`"gpt-4o"`	OpenAI model for transformation
`transcriptionHint`	string	`""`	Optional Whisper vocabulary hint (Settings only)
`audioQuality`	string	`"high"`	Planned — not yet applied (always 16 kHz mono)
`maxRecordingDuration`	number	`120`	Planned — not yet applied
`showNotifications`	boolean	`true`	Planned — not yet applied
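The defaults in this table can be captured as a typed object with user overrides merged on top. `PromptimizeSettings`, `DEFAULT_SETTINGS`, and `resolveSettings` are hypothetical names; inside the extension, values would come from VS Code's configuration API rather than a plain object. The planned settings are included because they are defined, even though they are not applied yet.

```typescript
interface PromptimizeSettings {
  transcriptionLanguage: string; // "auto" lets Whisper detect the language
  enablePromptTransformation: boolean;
  transformationProvider: string;
  transformationModel: string;
  transcriptionHint: string;
  audioQuality: string; // planned, not yet applied
  maxRecordingDuration: number; // planned, not yet applied
  showNotifications: boolean; // planned, not yet applied
}

// Defaults copied from the Configuration Options table.
const DEFAULT_SETTINGS: Readonly<PromptimizeSettings> = {
  transcriptionLanguage: "auto",
  enablePromptTransformation: true,
  transformationProvider: "openai",
  transformationModel: "gpt-4o",
  transcriptionHint: "",
  audioQuality: "high",
  maxRecordingDuration: 120,
  showNotifications: true,
};

// Merge user overrides on top of the defaults. Keys that are absent or explicitly
// undefined keep their default value.
function resolveSettings(overrides: Partial<PromptimizeSettings>): PromptimizeSettings {
  const resolved: PromptimizeSettings = { ...DEFAULT_SETTINGS };
  for (const key of Object.keys(overrides) as (keyof PromptimizeSettings)[]) {
    const value = overrides[key];
    if (value !== undefined) {
      (resolved as Record<keyof PromptimizeSettings, unknown>)[key] = value;
    }
  }
  return resolved;
}
```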

🧪 Development & Testing

Prerequisites

Node.js 22+ installed (via nvm; see .nvmrc)
VSCode or Cursor IDE
OpenAI API key

Setup Development Environment

```bash
# Clone the repository
git clone https://github.com/vypdev/promptimize
cd promptimize

# Install dependencies (requires Node 22 — run `nvm use` first)
pnpm install

# Compile TypeScript
pnpm run compile
```

Debug the Extension

Open the project in VSCode/Cursor
Press F5 to start debugging
A new "Extension Development Host" window will open
The extension will be loaded in this window

Configure API Key

In the Extension Development Host window:
- Open Command Palette (Cmd/Ctrl+Shift+P)
- Type: "Promptimize: Configure OpenAI API Key (Whisper)"
- Paste your OpenAI API key (starts with sk-...)
- The key is securely stored in your system's Keychain/Credential Manager

Test the Extension

Start Recording:
- Press Cmd/Ctrl+Alt+V (or click "Transcribe" in the status bar)
- Recording starts immediately in the background
Record Audio:
- Speak clearly into your microphone
- Ensure VSCode or Cursor has microphone access in System Settings (macOS) or Privacy settings (Windows)
Stop Recording:
- Click the "Recording..." status bar item, or run Promptimize: Stop Transcribe Recording
Wait for Processing:
- Audio is transcribed (typically ~5-10 seconds)
- In Promptimize mode (Cmd/Ctrl+Alt+P), the text is then optimized by your configured provider
- The result is inserted into the focused chat input or editor, with the clipboard as a fallback
Check Status:
- Status bar shows current state
- Notifications show progress and errors

Build Status

```bash
# Compile TypeScript
pnpm run compile

# Run linter
pnpm run lint

# Run tests
pnpm test

# Package extension (includes all platform native binaries)
pnpm run package

# Verify VSIX contains all platform binaries
pnpm run package:verify
```

Packaging for Distribution

To create a VSIX that works across all platforms (macOS, Linux, Windows):

```bash
pnpm run package
```

This will:

Install all platform-specific native binaries (darwin-arm64, darwin-x64, linux-x64-gnu, win32-x64-msvc)
Bundle them into the VSIX (~2.5MB total)
Create promptimize-X.X.X.vsix

To verify all binaries are included:

```bash
pnpm run package:verify
```

Expected output:

```text
audio-capture-darwin-arm64
audio-capture-darwin-x64
audio-capture-linux-x64-gnu
audio-capture-win32-x64-msvc
```
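The verification step amounts to confirming that one native binary package per target platform is present. The function below is a hypothetical sketch of that idea over a list of file paths taken from the VSIX (a ZIP archive); it is not the project's actual `package:verify` script and matches only on the package names listed above.

```typescript
// Platform packages every cross-platform VSIX must contain (from the expected output above).
const REQUIRED_BINARIES = [
  "audio-capture-darwin-arm64",
  "audio-capture-darwin-x64",
  "audio-capture-linux-x64-gnu",
  "audio-capture-win32-x64-msvc",
] as const;

// Return the required platform packages that do not appear as a path segment in any entry.
// Matching whole segments avoids "darwin-x64" accidentally matching a longer name.
function findMissingBinaries(vsixEntries: readonly string[]): string[] {
  return REQUIRED_BINARIES.filter(
    (name) => !vsixEntries.some((entry) => entry.split("/").includes(name)),
  );
}
```

An empty result means all four platforms are bundled; anything else names the platforms whose users would fail to record.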

Current Build: ✅ SUCCESS (577 KB JavaScript bundle)

🚀 Usage

Recording Modes

Promptimize has two modes — see Recording Modes for full details.

Mode	Shortcut	Output
Transcribe	`Cmd/Ctrl+Alt+V`	Raw Whisper transcription
Promptimize	`Cmd/Ctrl+Alt+P`	Optimized structured prompt

Quick Start

Open your editor or Cursor chat
Press Cmd+Alt+V (Transcribe) or Cmd+Alt+P (Promptimize)
Speak naturally about your requirements
Click the "Recording..." status bar item to stop
Transcribed or optimized text appears automatically

Status Bar

Three items appear in the status bar (right side):

Item	Idle	Recording
Transcribe	$(mic) Transcribe	$(record) Recording... (click to stop)
Promptimize	$(sparkle) Promptimize	$(record) Recording... (click to stop)
Settings	$(gear) Settings	Available during recording

During processing, progress appears in notifications (Transcribing..., Optimizing..., Inserting...).

Example Workflow

Spoken Input:

"I need to refactor the authentication service to support JWT tokens instead of sessions. We should maintain backward compatibility with existing session-based auth for 6 months. Also need unit tests for the new JWT validation logic and integration tests for the auth flow."

Optimized Output:

```markdown
## Refactor Authentication Service to JWT

### Context

- Current implementation: session-based authentication
- Target implementation: JWT tokens

### Objectives

1. Implement JWT token generation and validation
2. Maintain backward compatibility with session-based auth
3. Provide 6-month deprecation period for sessions

### Technical Requirements

- JWT library integration
- Token validation middleware
- Session-to-JWT migration path

### Testing Requirements

- Unit tests for JWT validation logic
- Integration tests for complete auth flow
- Backward compatibility tests for sessions

### Timeline

- 6-month deprecation period for session-based auth
```

🎨 User Experience

Visual States

The status bar reflects recorder states; fine-grained progress (Transcribing, Optimizing) appears in notifications.

State	Status Bar	Description
Idle	$(mic) Transcribe / $(sparkle) Promptimize	Ready to record
Recording	$(record) Recording...	Actively recording (click to stop)
Processing	$(sync~spin) Processing...	Preparing audio after stop
Error	Error styling	Recording or processing failed (details appear in a notification)

See UX States for the full state reference.
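The recorder states above form a small state machine. The transition table below is a hypothetical sketch inferred from this section and the shortcut table (Escape cancels while recording); the event names are illustrative, and the real extension may allow other transitions.

```typescript
type RecorderState = "idle" | "recording" | "processing" | "error";
type RecorderEvent = "start" | "stop" | "cancel" | "done" | "fail" | "reset";

// Allowed transitions; any (state, event) pair not listed leaves the state unchanged.
const TRANSITIONS: Record<RecorderState, Partial<Record<RecorderEvent, RecorderState>>> = {
  idle: { start: "recording" },
  recording: { stop: "processing", cancel: "idle", fail: "error" },
  processing: { done: "idle", fail: "error" },
  error: { reset: "idle", start: "recording" },
};

function nextState(state: RecorderState, event: RecorderEvent): RecorderState {
  return TRANSITIONS[state][event] ?? state;
}
```

Ignoring unlisted events (rather than throwing) means a stray click, such as "stop" while idle, cannot put the status bar into an inconsistent state.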

Keyboard Shortcuts

Shortcut	Action
`Cmd+Alt+V` / `Ctrl+Alt+V`	Start Transcribe recording
`Cmd+Alt+P` / `Ctrl+Alt+P`	Start Promptimize recording
`Escape`	Cancel recording (while recording)

Shortcuts only start recording. To stop, click the status bar item or run the matching Stop command; press Escape to cancel. See Keyboard Shortcuts.

Commands (Command Palette)

Command	Purpose
`Promptimize: Start Transcribe Recording`	Start raw transcription
`Promptimize: Stop Transcribe Recording`	Stop and process Transcribe
`Promptimize: Start Promptimize Recording`	Start optimized prompt
`Promptimize: Stop Promptimize Recording`	Stop and process Promptimize
`Promptimize: Cancel Recording`	Discard recording
`Promptimize: Open Configuration`	Configuration webview
`Promptimize: Configure OpenAI API Key (Whisper)`	Set Whisper API key
`Promptimize: Configure Prompt Optimization Provider`	Provider setup wizard
`Promptimize: Configure OpenAI Optimization Model`	Pick GPT model (OpenAI only)
`Promptimize: Test Configuration`	Test setup; opens results webview
`Promptimize: Setup Wizard`	Opens configuration panel

Deprecated: (Deprecated) Start Recording and (Deprecated) Stop Recording — use mode-specific commands instead.

🔒 Security & Privacy

Data Handling

Audio is temporary - Recordings are discarded immediately after transcription
No persistent storage - Recordings are never kept after processing
API keys are encrypted - Stored in VSCode SecretStorage
No telemetry - Zero analytics or usage tracking
HTTPS only - All API calls are encrypted

API Key Security

Your OpenAI API key is:

Stored in VSCode's secure credential storage (SecretStorage)
Never exposed in logs or error messages
Never sent anywhere except OpenAI's official API
Accessible only by this extension

Microphone Permissions

Microphone access is granted by the operating system to the host application (VSCode or Cursor):

macOS: System Settings → Privacy & Security → Microphone
Windows: Settings → Privacy → Microphone
Linux: System-dependent, usually automatic

🏗️ Development

Prerequisites

Node.js 22+ (via nvm; see .nvmrc)
pnpm
VSCode 1.120+ for testing

Setup

```bash
# Clone the repository
git clone https://github.com/vypdev/promptimize.git
cd promptimize

# Install dependencies (requires Node 22 — run `nvm use` first)
pnpm install

# Build the extension
pnpm run compile

# Run tests
pnpm test

# Watch mode for development
pnpm run watch
```

Project Structure

```text
promptimize/
├── src/
│   ├── application/     # Use cases and ports
│   ├── domain/          # Business entities
│   ├── infrastructure/  # External integrations
│   ├── presentation/    # UI and commands
│   ├── shared/          # Utilities and constants
│   └── extension.ts     # Entry point
├── docs/                # Comprehensive documentation
├── test/                # Unit and integration tests
└── package.json
```

See docs/architecture/ for detailed structure documentation.

Running Locally

Open the project in VSCode
Press F5 to launch Extension Development Host
The extension will be active in the new window
Test recording with Cmd+Alt+V

🧪 Testing

Automated tests cover use cases, transformers, and UI components — see docs/testing/strategy.md.

Run Tests

```bash
source scripts/ensure-node.sh && pnpm test
```

Test Strategy

Unit tests: Use cases and adapters with mocked ports (priority)
Manual smoke tests: Real recording → transcription → insertion before release

See docs/testing/strategy.md for critical test priorities and manual checklist.

📈 Roadmap

v0.1.0 (Current)

✅ Dual recording modes (Transcribe + Promptimize)
✅ Whisper transcription
✅ Prompt transformation (8 providers)
✅ Configuration webview
✅ Chat / editor / clipboard insertion
✅ API key configuration

v0.2.0 (Next)

🔄 Apply planned settings (audioQuality, maxRecordingDuration, showNotifications)
🔄 Transformation preview before insert
🔄 Transcription language in configuration webview

v0.3.0

🔄 Context-aware insertion improvements
🔄 Push-to-talk mode

v0.4.0

🔄 Real-time streaming transcription
🔄 Recording history
🔄 Edit before insert

v0.5.0

🔄 Custom vocabulary UI
🔄 Technical term correction

v1.0.0 (Stable)

🔄 Full production release
🔄 Performance optimization
🔄 Extensive testing

See PROGRESS.md for current project status.

🤝 Contributing

We welcome contributions! See docs/standards/coding-conventions.md for coding standards and development workflow.

Development Philosophy

Clean Architecture - Maintain clear layer separation
Type Safety - Strong TypeScript typing everywhere
Testability - Write testable, pure functions
Documentation - Document decisions and complex logic
User Experience - Prioritize UX over technical complexity

📝 Philosophy & Design Principles

Core Principles

Compatibility First - Real-world compatibility over theoretical solutions
User Experience - Minimal friction, maximum productivity
Maintainability - Clean code over clever hacks
Scalability - Built to grow and evolve
Privacy - Speech goes only to the providers you configure and is never logged or tracked

Why Clean Architecture?

Testability: Business logic independent of frameworks
Flexibility: Easy to swap implementations (e.g., different STT providers)
Maintainability: Clear responsibilities and boundaries
Scalability: Add features without breaking existing code

Why Dependency Injection?

Testability: Easy to mock dependencies
Flexibility: Configure different implementations
Maintainability: Clear dependency graph

🐛 Troubleshooting

See the full Troubleshooting Guide with decision trees.

Microphone not working

macOS:

Go to System Settings → Privacy & Security → Microphone
Ensure VSCode/Cursor is enabled

Windows:

Go to Settings → Privacy → Microphone
Ensure VSCode/Cursor has permission

Linux:

Permissions are usually automatic
Check pavucontrol if using PulseAudio

Transcription fails

Verify your OpenAI API key is valid
Check you have credits in your OpenAI account
Ensure audio duration is between 0.1s and 5 minutes
Check file size doesn't exceed 25MB
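Both limits can be checked before uploading. As a worked example, assuming uncompressed 16-bit PCM WAV at the 16 kHz mono format noted under Configuration Options, audio takes 16,000 × 2 = 32,000 bytes per second, so a 5-minute (300 s) recording is about 9.6 MB and stays well under 25 MB. The validator below is a hypothetical sketch; the 16-bit WAV encoding and the reading of "25MB" as 25,000,000 bytes are assumptions.

```typescript
// Limits from the troubleshooting list above. 25 MB is read conservatively as
// 25,000,000 bytes (assumption).
const MIN_DURATION_SECONDS = 0.1;
const MAX_DURATION_SECONDS = 5 * 60;
const MAX_UPLOAD_BYTES = 25 * 1000 * 1000;

// Assumed encoding: 16 kHz, mono, 16-bit PCM WAV with a standard 44-byte header.
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2;
const WAV_HEADER_BYTES = 44;

function estimateWavBytes(durationSeconds: number): number {
  return WAV_HEADER_BYTES + Math.round(durationSeconds * SAMPLE_RATE_HZ) * BYTES_PER_SAMPLE;
}

// Return a human-readable problem, or undefined if the recording is within limits.
function validateRecording(durationSeconds: number, sizeBytes: number): string | undefined {
  if (durationSeconds < MIN_DURATION_SECONDS) {
    return `Recording too short (${durationSeconds}s < ${MIN_DURATION_SECONDS}s)`;
  }
  if (durationSeconds > MAX_DURATION_SECONDS) {
    return `Recording too long (${durationSeconds}s > ${MAX_DURATION_SECONDS}s)`;
  }
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    return `File too large (${sizeBytes} bytes > ${MAX_UPLOAD_BYTES} bytes)`;
  }
  return undefined;
}
```

Under these assumptions the duration limit is always hit long before the size limit, so "file too large" would point to a different encoding rather than a long recording.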

Text not inserting

Ensure an editor or chat input has focus
Check the status bar and notifications for errors
If neither chat nor editor accepted the text, it falls back to the clipboard; paste it manually

Cursor Agents Window issues

Promptimize works best in:

Classic Mode (cursor --classic)
Editor Window

Debug output and privacy

Transcriptions and optimized prompts are never written to logs. For troubleshooting, use the status bar, progress notifications, and error dialogs. If you enable the Promptimize output channel, it shows only operational messages (timestamps, durations, error types), never user speech content.
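One way to make the "no speech content in logs" rule hard to break is to give the logger a type that has no field for text at all. `OperationalLogEntry` and `formatLogLine` are hypothetical names; the sketch only illustrates logging timestamps, durations, and error types.

```typescript
// Only operational metadata is representable: there is deliberately no field
// for transcribed or optimized text.
interface OperationalLogEntry {
  timestamp: Date;
  operation: "transcribe" | "optimize" | "insert";
  durationMs: number;
  errorType?: string; // e.g. an error class name, never a message that could echo user text
}

function formatLogLine(entry: OperationalLogEntry): string {
  const status = entry.errorType === undefined ? "ok" : `error=${entry.errorType}`;
  return `[${entry.timestamp.toISOString()}] ${entry.operation} ${entry.durationMs}ms ${status}`;
}
```

Since the compiler rejects any attempt to pass a transcript to `formatLogLine`, the privacy rule is enforced at build time rather than by code review alone.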

MIT License - see LICENSE file for details.

🙏 Acknowledgments

OpenAI - Whisper and GPT-4 APIs
VSCode Team - Excellent extension API and documentation
Cursor Team - Innovation in AI-powered development

📬 Contact & Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: support@promptimize.dev

🔗 Links

Made with ❤️ for developers who think faster than they type

vyp.dev