Whisper Assistant: Your Voice-Driven Coding Companion
Whisper Assistant is an extension for Visual Studio Code that transcribes your spoken words into text within the VSCode editor. This hands-free approach to coding allows you to focus on your ideas instead of your typing.
Whisper Assistant can also be integrated with other powerful AI tools, such as GPT-4 or the Cursor.so application, to create a dynamic, AI-driven development environment.
Powered by OpenAI Whisper
Whisper Assistant runs OpenAI's Whisper model locally, providing voice transcription at no cost.
By default, Whisper Assistant uses the base model, which balances accuracy and performance. You can select a different model in the extension settings, but make sure to download your chosen model before using Whisper Assistant; otherwise the extension will fail with errors. The base model is recommended and set as the default.
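If you want to grab a model ahead of time, one option (assuming you installed Whisper via the `openai-whisper` Python package) is a one-liner; `load_model` downloads the weights to `~/.cache/whisper` if they are not already present:

```bash
# Pre-download the "small" model so the extension doesn't error on first use.
python -c "import whisper; whisper.load_model('small')"
```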
For more details about Whisper, visit the Whisper GitHub page: https://github.com/openai/whisper
Getting Started: Installation Instructions
To install and set up Whisper Assistant, follow these steps:
- Install SoX to enable easy microphone recording through the command line.
- Install Whisper locally. Whisper requires Python and ffmpeg as prerequisites.
- Install the Whisper Assistant extension into Visual Studio Code or the Cursor.so application.
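As a sketch, a macOS setup with Homebrew might look like the following; package managers and package names vary by platform, so adapt as needed:

```bash
# SoX for command-line microphone recording.
brew install sox

# ffmpeg, which Whisper uses for audio decoding.
brew install ffmpeg

# Whisper itself, via pip.
pip install -U openai-whisper
```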
How to Use Whisper Assistant
- Initialization: Upon loading Visual Studio Code, the extension verifies the correct installation of SoX and Whisper. If any issues are detected, an error message is displayed. These dependencies must be installed to use the extension. Once initialization is complete, a quote icon appears in the bottom right of the status bar.
- Starting the Recording: Activate the extension by clicking the quote icon or using the shortcut `Command+M` (Mac) or `Control+M` (Windows). You can record for as long as you like, but remember: the longer the recording, the longer the transcription process. The recording time is displayed in the status bar.
- Stopping the Recording: Stop the recording using the same shortcut (`Command+M` or `Control+M`). The extension icon in the status bar changes to a loading icon, and a progress message indicates that the transcription is underway.
- Transcription: Once the transcription is complete, the text will be saved to the clipboard. This allows you to use the transcription in any program, not just within Visual Studio Code. If an editor is active, the transcription will be pasted there automatically.
Tip: A good microphone will improve transcription accuracy, although it is not a requirement.
Tip: For an optimal experience, consider using the Cursor.so application to directly call the GPT-4 API for code instructions. This allows you to use your voice to instruct GPT-4 to refactor your code, write unit tests, and implement various improvements.
Using Whisper Assistant with Cursor.so
To enhance your development experience with Cursor.so and Whisper Assistant, follow these simple steps:
- Start the recording: Press `Command+M` (Mac) or `Control+M` (Windows).
- Speak your instructions clearly.
- Stop the recording: Press `Command+M` (Mac) or `Control+M` (Windows). Note: This initiates the transcription process.
- Open the Cursor dialog: Press `Command+K` or `Command+L`. Important: Do this before the transcription completes.
- The transcribed text will automatically populate the Cursor dialog. There you can edit the text or add files/docs, then press `Enter` to execute the GPT query.
By integrating Cursor.so with Whisper Assistant, you can provide extensive instructions without the need for typing, significantly enhancing your development workflow.
Disclaimer
Please note that this extension has been primarily tested on macOS. While efforts have been made to ensure compatibility, its functionality on other platforms such as Windows or Linux cannot be fully guaranteed. Pull requests that address issues on these platforms are welcome and appreciated.
Development Setup
- Copy the settings template:
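The template's location depends on the repository layout; the command below is only illustrative, with hypothetical paths you should replace with the real ones from the repo:

```bash
# Hypothetical paths; check the repository for the actual template file.
cp .vscode/settings.template.json .vscode/settings.json
```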
Local Development with Faster Whisper
This extension supports using a local Faster Whisper model through Docker. This provides faster transcription and doesn't require an API key.
Setting up the Local Server
- Install Docker on your system
- Create a new file named `Dockerfile` with the content from the repository (a minimal sketch is shown after these steps)
- Build and run the container:
```bash
docker build -t whisper-assistant-server .
docker run -d -p 4444:4444 --name whisper-assistant whisper-assistant-server
```
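For orientation, a minimal `Dockerfile` along these lines should work. The base image, system dependencies, pre-download step, and port are taken from the snippets later in this README; the pip packages, the `server.py` name, and the CMD line are assumptions, so defer to the actual file in the repository:

```dockerfile
FROM python:3.10.13-slim

# System dependencies for audio handling.
RUN apt-get update --fix-missing && apt-get install -y \
    git \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# Faster Whisper plus a minimal web stack for the API server (assumed packages).
RUN pip install --no-cache-dir faster-whisper fastapi uvicorn python-multipart

# Pre-download the model during build so the first request is fast.
RUN python -c "from faster_whisper import WhisperModel; WhisperModel('base', device='cpu', compute_type='int8')"

# server.py stands in for the embedded Python server from the repository.
COPY server.py .

EXPOSE 4444
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "4444"]
```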
Using the Local Server
- Open VSCode settings (File > Preferences > Settings)
- Search for "Whisper Assistant"
- Set "Api Provider" to "localhost"
- Set "Api Key" to any non-empty string (e.g., "local")
- The extension will now use your local Faster Whisper server
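Equivalently, these can go in your `settings.json`. The exact keys are an assumption based on the extension's name, so confirm them in the settings UI:

```jsonc
// Hypothetical keys; confirm the real names in the Whisper Assistant settings UI.
{
  "whisper-assistant.apiProvider": "localhost",
  "whisper-assistant.apiKey": "local"
}
```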
Available Models
The local server uses the "base" model by default. To modify the model, update the model initialization in the Dockerfile:
```
# Pre-download the model during build
RUN python -c "from faster_whisper import WhisperModel; WhisperModel('medium', device='cpu', compute_type='int8')"

# ... and update the model initialization in the embedded Python code:
whisper_model = WhisperModel("medium", device="cpu", compute_type="int8")
```
Available models:
- tiny (fastest, least accurate)
- base (default, good balance)
- small
- medium
- large-v2
- large-v3 (slowest, most accurate)
Note: Larger models require more memory but provide better accuracy.
Using GPU Acceleration
If you have a CUDA-capable GPU, modify the Dockerfile:
```
FROM python:3.10.13-slim

# Install CUDA dependencies
RUN apt-get update --fix-missing && apt-get install -y \
    git \
    ffmpeg \
    cuda-toolkit-11-8 \
    && rm -rf /var/lib/apt/lists/*

# ... rest of Dockerfile remains the same, but update the model initialization:
RUN python -c "from faster_whisper import WhisperModel; WhisperModel('base', device='cuda', compute_type='float16')"

# And update in the embedded Python code:
whisper_model = WhisperModel("base", device="cuda", compute_type="float16")
```
Run the container with GPU support:
```bash
docker run -d -p 4444:4444 --gpus all --name whisper-assistant whisper-assistant-server
```
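To confirm the container can actually see the GPU, one quick check (assuming the NVIDIA Container Toolkit is installed on the host) is running `nvidia-smi` inside it:

```bash
# Should list your GPU; an error usually means the NVIDIA Container Toolkit
# is missing or misconfigured on the host.
docker exec whisper-assistant nvidia-smi
```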
Troubleshooting
Check if the server is running:
```bash
curl http://localhost:4444/v1/health
```
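If the health check passes but transcription still fails, you can exercise the transcription endpoint directly. The path below assumes the server mirrors OpenAI's `/v1/audio/transcriptions` API; adjust it to whatever the embedded server actually exposes:

```bash
# Send a short WAV file for transcription; "local" matches the placeholder
# API key configured in the extension settings.
curl http://localhost:4444/v1/audio/transcriptions \
  -H "Authorization: Bearer local" \
  -F "file=@test.wav" \
  -F "model=whisper-1"
```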
View server logs:
```bash
docker logs whisper-assistant
```
If you encounter memory issues, switch to a smaller model or increase the memory available to Docker. Also note that the first build may be slow because the model is downloaded during the Docker build.
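For example, Docker's standard `--memory` flag sets an explicit limit for the container; `4g` here is an arbitrary illustrative value:

```bash
# Restart the container with an explicit memory limit.
docker rm -f whisper-assistant
docker run -d -p 4444:4444 --memory=4g --name whisper-assistant whisper-assistant-server
```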