Mantra
Code with your thoughts, not your keyboard. Extremely accurate, absurdly fast.
Mantra listens to your voice and instantly edits code, runs IDE commands, executes terminal commands, and interacts with AI agents — all hands-free. Works in VS Code and Cursor.
Free to use — no API keys or setup required.
Discord: https://discord.gg/fmWCScWuUn
Please read the Privacy and Data Handling section before use.
Use a good-quality, well-positioned desktop microphone for best results.
How It Works
- Speech-to-text: Audio is streamed for real-time transcription with built-in turn detection. The most frequent identifiers from your open file are sent as keyterms to bias recognition toward your code's vocabulary. Live partial transcripts appear in the status bar as you speak.
- Context-aware routing: When the IDE is the active window, the transcript goes through the full pipeline: commands, text operations, and LLM routing. When another app is frontmost, see System Commands and Unfocused Commands below.
- LLM routing (IDE focused): The transcript, editor context, and terminal history are sent to a fast LLM, which classifies the instruction and returns one of four types:
  - command — runs an IDE command (130+ supported)
  - modification — applies a small, targeted edit to the selected text (changes are highlighted in green/red). Only available when text is selected in the editor, either manually or via a voice "select" command. Without a selection, code-edit requests are routed to the agent.
  - terminal — translates natural language to a shell command and executes it
  - agent — forwards the transcript to the selected AI agent (Claude Code or Cursor). This is the default for any non-trivial task. When no agent is selected, requests are answered by a built-in Q&A agent shown in the Output panel.
- Pre-LLM shortcuts: Common phrases like "undo", "save", "scroll down", "enter", "delete", "focus editor", "ask Claude ...", keyboard shortcuts, and symbol navigation ("go to the handleCommand function") are handled instantly without waiting for the LLM.
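As a rough illustration of the keyterm step in the speech-to-text bullet above, the sketch below counts identifier-like tokens in a file and keeps the most frequent ones. The function name `top_keyterms`, the regular expression, the minimum length, and the default limit are all assumptions made for this example; Mantra's actual extraction logic is not documented here.

```python
import keyword
import re
from collections import Counter

# Assumption: an "identifier" is any token that looks like a Python-style name.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def top_keyterms(source: str, limit: int = 5, min_length: int = 3) -> list[str]:
    """Return the most frequent identifier-like tokens in `source`."""
    tokens = [
        token
        for token in IDENTIFIER.findall(source)
        if len(token) >= min_length and not keyword.iskeyword(token)
    ]
    # most_common() sorts by count; equal counts keep first-seen order.
    return [name for name, _ in Counter(tokens).most_common(limit)]

code = "def reset_board(board):\n    board.clear()\n    return board\n"
print(top_keyterms(code, limit=2))  # ['board', 'reset_board']
```

A list like this could then be passed to a speech recognizer as vocabulary hints so that spoken names such as "reset board" resolve to identifiers that exist in the file.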
Setup
1) Install Mantra
Install from the VS Code Marketplace or the Open VSX Registry. Also works in Cursor — install via Extensions: Install from VSIX or search "Mantra" in the Cursor extension marketplace.
2) Start!
Run "Mantra: Start Recording" from the Command Palette, press Ctrl+Shift+1, or click Hands-Free Mode in the Mantra sidebar panel.
No API keys needed — everything works out of the box.
Cursor users: The agent backend is automatically set to Cursor so voice commands route to the Cursor composer panel by default.
Recording Modes
Hands-Free Mode
Press Ctrl+Shift+1 or click Hands-Free Mode in the sidebar. Mantra listens continuously, detects speech turns automatically, and processes each one. Live partial transcripts appear in the bottom-left status bar as you speak. Auto-stops after 5 minutes of silence.
Push to Talk
Hold the Push to Talk button in the sidebar. Mantra records while you hold, then transcribes and processes when you release. Also shows live partial transcripts in the status bar.
What You Can Say
Code editing (requires text selection)
Code edits only happen when you have text selected in the editor — either selected manually or via a voice "select" command. Select the lines you want to change, then speak:
- "change this to a while loop"
- "rename this variable to count"
- "add a docstring to this function"
- "remove the comments"
- "for i in range len nums print nums i" (raw code dictation)
Without a selection, code-edit requests are routed to the agent (if active) or answered as a question.
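The selection rule described above can be summarized as a small decision function. This is only a sketch of the documented behavior; `route_code_edit` and its return labels are hypothetical names, not part of Mantra's API.

```python
def route_code_edit(has_selection: bool, agent_active: bool) -> str:
    """Pick a destination for a code-edit request."""
    if has_selection:
        return "modification"  # targeted edit applied to the selected text
    if agent_active:
        return "agent"         # forwarded to the selected AI agent
    return "question"          # answered by the built-in Q&A agent

print(route_code_edit(has_selection=False, agent_active=True))  # agent
```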
Voice selection
Say "select" to select code by description, then follow up with an edit:
- "select this function" → selects the enclosing function
- "select the inner for loop" → selects just the nested loop, not the outer one
- "select the if statement" → selects the full if/elif/else chain
- "highlight the try block" → selects the try/except block
- "select lines 10 to 20" → selects exact line range
Once selected, your next voice command can edit it: "select this function" → "add a docstring" performs the edit on the selected code.
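For the exact-range case ("select lines 10 to 20"), a regular expression is enough to recover the line numbers. The sketch below assumes the transcript already contains digits rather than number words; `parse_line_range` is a hypothetical helper, and the description-based selections above would still need the selection model.

```python
import re
from typing import Optional

# Matches "line 20" or "lines 10 to 20" anywhere in the phrase.
LINE_RANGE = re.compile(r"\blines?\s+(\d+)(?:\s+to\s+(\d+))?")

def parse_line_range(phrase: str) -> Optional[tuple[int, int]]:
    """Return an inclusive (start, end) line range, or None if none is spoken."""
    match = LINE_RANGE.search(phrase.lower())
    if match is None:
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    # Normalize reversed ranges such as "lines 20 to 10".
    return (min(start, end), max(start, end))

print(parse_line_range("select lines 10 to 20"))  # (10, 20)
```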
Agent tasks
When an agent is active, complex or unselected code tasks go to the agent automatically:
- "create a terminal-based tic tac toe game"
- "add a helper function to validate user input"
- "add authentication"
- "refactor this to use async/await"
IDE commands
- "undo", "redo", "save", "format document"
- "close this file", "open utils dot java"
- "select lines 4 to 19", "go to line 20"
- "delete", "delete line", "delete this line"
- "scroll down", "scroll up 5 lines", "page up"
- "toggle sidebar", "zen mode", "zoom in"
- "focus editor", "focus terminal", "focus explorer"
- "next tab", "previous tab", "first tab", "tab three"
Symbol navigation
Navigate directly to any function, class, method, or symbol in the current file:
- By name: "go to the handleCommand function", "go to function processActivityFile", "jump to class GameBoard"
- By description (semantic): "go to the function that resets the board", "go to the error handler", "jump to the input validation function"
Name-based matching uses fast token fuzzy matching (no LLM call). If the name doesn't match closely enough, or if you describe what a symbol does instead of saying its name, Mantra falls back to an LLM-powered semantic match.
Relative navigation also works via the LLM: "go to the next else if", "jump to the catch block" — the LLM finds the target line in the visible code and navigates there.
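To make the name-based path concrete, here is one way token fuzzy matching could work: split both the spoken phrase and each symbol name into lowercase tokens, score the overlap, and return nothing when the best score is too low so the caller can fall back to a semantic match. The Jaccard score, the 0.6 threshold, and the function names are assumptions for this sketch, not Mantra's documented algorithm.

```python
import re
from typing import Optional

def split_tokens(name: str) -> set[str]:
    """Split camelCase, snake_case, or spoken words into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t}

def match_symbol(spoken: str, symbols: list[str], threshold: float = 0.6) -> Optional[str]:
    """Return the best-matching symbol, or None to signal a semantic fallback."""
    spoken_tokens = split_tokens(spoken)
    best_name, best_score = None, 0.0
    for symbol in symbols:
        symbol_tokens = split_tokens(symbol)
        union = spoken_tokens | symbol_tokens
        # Jaccard similarity: shared tokens divided by all distinct tokens.
        score = len(spoken_tokens & symbol_tokens) / len(union) if union else 0.0
        if score > best_score:
            best_name, best_score = symbol, score
    return best_name if best_score >= threshold else None

symbols = ["handleCommand", "processActivityFile", "resetBoard"]
print(match_symbol("handle command", symbols))                      # handleCommand
print(match_symbol("the function that resets the board", symbols))  # None
```

The second call returns `None` because a description shares few tokens with any symbol name, which is the situation where an LLM-based match would take over.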
Opening files
- "open script dot py" → opens script.py
- "open main" → fuzzy-matches main.py, main.ts, etc.
- "open auth dot controller dot ts" → opens auth.controller.ts
File names are fuzzy-matched against the workspace. You can say the extension ("dot py") or omit it. When the IDE is focused, "open X" tries to match a workspace file first; only if no file matches does it try to open a macOS app.
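The sketch below shows one plausible way to resolve a spoken file name: convert "dot" to a period, try an exact match, then a match on the name without its extension, and finally a similarity-based fallback using Python's standard `difflib`. The function names, the matching order, and the 0.6 cutoff are assumptions for illustration; the matcher Mantra actually uses may differ.

```python
import difflib
import os
from typing import Optional

def spoken_to_filename(phrase: str) -> str:
    """Turn 'auth dot controller dot ts' into 'auth.controller.ts'."""
    words = phrase.lower().split()
    return "".join("." if word == "dot" else word for word in words)

def find_file(phrase: str, workspace_files: list[str]) -> Optional[str]:
    """Resolve a spoken file name against a list of workspace file names."""
    target = spoken_to_filename(phrase)
    # 1. Exact match on the full name (the extension was spoken).
    for name in workspace_files:
        if name.lower() == target:
            return name
    # 2. Match on the stem (the extension was omitted).
    for name in workspace_files:
        if os.path.splitext(name)[0].lower() == target:
            return name
    # 3. Fall back to the most similar name, if any is close enough.
    close = difflib.get_close_matches(target, workspace_files, n=1, cutoff=0.6)
    return close[0] if close else None

files = ["main.py", "script.py", "auth.controller.ts"]
print(find_file("auth dot controller dot ts", files))  # auth.controller.ts
print(find_file("main", files))                        # main.py
```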
Terminal commands
- "run this file" → python3 main.py
- "create a virtual environment" → python3 -m venv venv
- "install the requests library" → pip3 install requests
- "check git status" → git status
Terminal commands are executed by default. If you want to just type without executing, say something like "create a virtual environment but don't run it".
Keyboard shortcuts (macOS, any app)
Any modifier+key combo spoken naturally is executed via the system:
- "command B" → Cmd+B
- "control shift P" → Ctrl+Shift+P
- "command shift F" → Cmd+Shift+F
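A spoken shortcut of this kind can be parsed by treating every word except the last as a modifier. The sketch below is a minimal, assumed implementation: the `MODIFIERS` table (including "option", which the examples above do not show) and the single-character key restriction are choices made for this example only.

```python
from typing import Optional

# Assumed mapping from spoken modifier names to display names.
MODIFIERS = {"command": "Cmd", "control": "Ctrl", "shift": "Shift", "option": "Option"}

def parse_shortcut(phrase: str) -> Optional[str]:
    """Parse 'command shift F' into 'Cmd+Shift+F'; return None if not a shortcut."""
    words = phrase.lower().split()
    if len(words) < 2:
        return None
    *modifiers, key = words
    if not all(word in MODIFIERS for word in modifiers):
        return None
    # This sketch only accepts a single letter or digit as the final key.
    if key in MODIFIERS or len(key) != 1:
        return None
    return "+".join([MODIFIERS[word] for word in modifiers] + [key.upper()])

print(parse_shortcut("control shift P"))  # Ctrl+Shift+P
```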
Stop / Resume
Say "pause", "stop", or "stop listening" to stop. Say "resume" or use Ctrl+Shift+1 to start again. Hands-free mode also auto-stops after 5 minutes of no speech.
Agent Integration
Mantra supports multiple AI agent backends:
- Cursor — sends prompts to the Cursor composer panel. Auto-selected when running in Cursor IDE. Only available in Cursor (grayed out in VS Code).
- Claude Code (Terminal) — sends prompts to the Claude Code CLI running in a terminal. Requires the claude CLI in your PATH.
- Claude Code (Extension) — sends prompts to the Claude Code VS Code extension sidebar panel. No CLI needed — just install the extension.
- Q&A — built-in Q&A agent shown in the Output panel. No external agent needed.
Select your preferred mode from the Agent dropdown in the Mantra sidebar. When running in Cursor, the agent defaults to Cursor automatically.
When an agent is active, it becomes the default destination for any non-trivial request and for all code-edit requests when no text is selected. You don't need to say "ask Claude" — just speak naturally and tasks are automatically routed to the agent.
Sending prompts to the agent
Say "ask Claude to refactor this function" or "ask agent how to fix that", or simply state the task ("add an AI opponent", "improve the performance", "add authentication"). Either way, the request is routed to the agent automatically.
When "Send Context to Agent" is enabled, the current editor state (filename, cursor position, selected text), activity log, terminal history, and workspace file listing are written to a context file. The first message in each session instructs the agent to read this file before responding; follow-up messages send just the raw transcript.
Common phrases like "ask Claude ...", "ask agent ...", "ask LLM ...", "ask AI ..." are intercepted before the LLM for instant routing. These also work when the IDE is not focused.
How agent modes work
- Terminal mode types prompts into the Claude CLI terminal without pressing Enter. You review and say "enter" to send.
- Extension mode pastes prompts into the Claude Code extension sidebar without submitting. You review and say "enter" to send.
- Cursor mode pastes prompts into the Cursor composer panel without submitting. You review and say "enter" to send.
All modes prepend context on the first message and support follow-up messages in the same conversation (no new tabs or panels created on follow-ups).
While the agent is running
- Commands still work normally — "save file", "undo", "focus terminal" all execute as usual
- Questions and conversation go to the agent — "how do I fix this error?" types into the agent
- "enter" — submits the current input
- "up" / "down" — arrow keys for navigating selection menus (Terminal mode)
- "yes" / "ok" / "go ahead" — confirms the current selection
- "focus editor" / "go back" — switches back to the editor
- "focus agent" / "open claude" — switches to (or opens) the agent panel
Agent voice commands
- "new conversation" / "clear conversation"
- "resume conversation" (Terminal mode)
- "accept changes" / "reject changes" (for proposed diffs)
- "stop" / "interrupt" / "cancel" (sends Ctrl+C, Terminal mode)
System Commands (Any App)
These commands work regardless of which app is in the foreground. They are processed before any IDE or LLM logic.
Mouse
| Say | Action |
| --- | --- |
| "click" | Left click at current mouse position |
| "double click" | Double click at current mouse position |
| "right click" | Right click at current mouse position |
| "move mouse up/down/left/right [N]" | Move mouse N pixels (default 50) |
Open & switch apps
| Say | Action |
| --- | --- |
| "open Safari", "open Chrome", "open Slack" | Open or focus the named app |
| "open VS Code", "open Cursor", "open IDE" | Open IDE |
| "switch to Safari", "switch to Terminal" | Bring the named app to front |
Polite phrasing works too — "could you please open Safari" is handled correctly.
Browser navigation
| Say | Action |
| --- | --- |
| "back" / "go back" | Cmd+[ (browser back) |
| "forward" / "go forward" | Cmd+] (browser forward) |
| "refresh" / "reload" | Cmd+R |
| "hard refresh" | Cmd+Shift+R |
| "new tab" | Cmd+T |
| "close tab" | Cmd+W |
| "reopen tab" | Cmd+Shift+T |
| "address bar" / "url bar" | Cmd+L |
| "bookmark" | Cmd+D |
Key presses
| Say | Action |
| --- | --- |
| "press enter", "press escape", "press tab" | Sends that key |
| "press up", "press down", "press left", "press right" | Arrow keys |
| "press page up", "press page down", "press home", "press end" | Navigation keys |
Type text
| Say | Action |
| --- | --- |
| "type hello world" | Types the text via clipboard paste |
Window management
| Say | Action |
| --- | --- |
| "minimize" | Cmd+M |
| "close window" | Cmd+W |
| "full screen" | Cmd+Ctrl+F |
| "next window" / "previous window" | Cmd+` / Cmd+Shift+` |
| "hide" / "hide app" | Cmd+H |
| "mission control" | Ctrl+Up |
System
| Say | Action |
| --- | --- |
| "spotlight" / "search computer" | Cmd+Space |
| "screenshot" | Cmd+Shift+3 (full screen) |
| "screenshot selection" | Cmd+Shift+4 (area) |
| "lock screen" | Cmd+Ctrl+Q |
Unfocused Commands (When Another App Is Active)
When the IDE is not the frontmost window, these commands route keystrokes to whatever app you're using (Safari, Terminal.app, Finder, etc.).
Arrow keys & repetition
| Say | Action |
| --- | --- |
| "up", "down", "left", "right" | Arrow key |
| "up 5 times", "down three times" | Repeat arrow key N times |
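The repetition phrases in the table above mix digits ("5") and number words ("three"). A parser for them might look like the following sketch; the pattern, the `NUMBER_WORDS` table (limited to one through ten), and the function name are assumptions for this example.

```python
import re
from typing import Optional

NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

# An arrow direction, optionally followed by "<count> time(s)".
ARROW = re.compile(r"^(up|down|left|right)(?:\s+(\w+)\s+times?)?$")

def parse_arrow(phrase: str) -> Optional[tuple[str, int]]:
    """Return (direction, repeat_count), or None if the phrase is not an arrow command."""
    match = ARROW.match(phrase.strip().lower())
    if match is None:
        return None
    direction, count = match.group(1), match.group(2)
    if count is None:
        return (direction, 1)
    if count.isdigit():
        return (direction, int(count))
    if count in NUMBER_WORDS:
        return (direction, NUMBER_WORDS[count])
    return None  # unrecognized count word

print(parse_arrow("down three times"))  # ('down', 3)
```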
Scrolling
| Say | Action |
| --- | --- |
| "scroll up" / "scroll down" | Smooth scroll (15 arrow presses) |
| "scroll up a lot" / "scroll down a lot" | Big scroll (2 page jumps) |
| "page up" / "page down" | Single page jump |
| "scroll to top" / "scroll to bottom" | Cmd+Home / Cmd+End |
Basic keys
| Say | Action |
| --- | --- |
| "enter" / "return" / "submit" | Enter key |
| "escape" / "cancel" / "dismiss" | Escape key |
| "tab" | Tab key |
| "space" | Space key |
| "delete" / "backspace" | Backspace key |
Tab switching
| Say | Action |
| --- | --- |
| "next tab" / "previous tab" | Ctrl+Tab / Ctrl+Shift+Tab |
| "first tab", "second tab", ..., "ninth tab" | Cmd+1 through Cmd+9 |
| "last tab" | Cmd+9 |
Text editing
| Say | Action |
| --- | --- |
| "undo" / "redo" | Cmd+Z / Cmd+Shift+Z |
| "copy" / "paste" / "cut" | Cmd+C / Cmd+V / Cmd+X |
| "select all" | Cmd+A |
| "find" / "search" | Cmd+F |
| "save" | Cmd+S |
| "close" / "quit" | Cmd+W / Cmd+Q |
| "zoom in" / "zoom out" / "reset zoom" | Cmd+= / Cmd+- / Cmd+0 |
Browser developer tools
| Say | Action |
| --- | --- |
| "dev tools" / "inspect element" | Cmd+Option+I |
| "console" / "open console" | Cmd+Option+J |
| "view source" | Cmd+Option+U |
Terminal.app / iTerm shortcuts
| Say | Action |
| --- | --- |
| "clear" / "clear terminal" | Cmd+K |
| "interrupt" / "control c" / "kill process" | Ctrl+C |
| "exit terminal" / "control d" | Ctrl+D |
| "suspend" / "control z" | Ctrl+Z |
| "reverse search" / "control r" | Ctrl+R |
| "beginning of line" / "control a" | Ctrl+A |
| "end of line" / "control e" | Ctrl+E |
| "clear line" / "control u" | Ctrl+U |
| "delete word" / "control w" | Ctrl+W |
Finder shortcuts
| Say | Action |
| --- | --- |
| "show hidden files" | Cmd+Shift+. |
| "go to folder" | Cmd+Shift+G |
| "new folder" | Cmd+Shift+N |
| "get info" / "file info" | Cmd+I |
Sidebar
Mantra adds a panel to the activity bar. The sidebar provides:
- Hands-Free Mode / Stop — toggle button to start or stop continuous listening
- Push to Talk — hold the button to record, release to transcribe and process
- Stop / Stop & Transcribe (Ctrl+Shift+2 / Ctrl+Shift+3) — while recording, stop (discard audio) or stop and transcribe what's been said so far
- Test Microphone — verify your mic is working with a live volume meter
- Activity Log — scrollable history of every transcript, command, code edit, terminal action, and agent interaction. Code modifications include:
  - Show diff — toggle to view exactly what changed (green/red highlighting)
  - Open in tab — open the diff in a full editor tab
  - Undo / Redo — revert or re-apply a specific change
- Focus — quick buttons to switch between Editor, Terminal, Agent, Explorer, Search, and Source Control
- Settings
- Microphone — pick your input device
- Agent — choose Cursor, Claude Code (Terminal), Claude Code (Extension), or Q&A. Cursor is auto-selected and only available in Cursor IDE.
- Commands-Only Mode — bypass the LLM entirely (ON/OFF toggle)
- Send Context to Agent — include editor state, activity log, terminal history, and workspace files in agent prompts (ON by default)
- All Settings / Keyboard Shortcuts
- Router Prompt / Selection Model Prompt — view and edit the LLM system prompts directly in the sidebar
Commands-Only Mode
Toggle via the sidebar or Command Palette. When enabled:
- No LLM calls — speech is transcribed but only matched against pre-mapped commands and text operations
- What works: all 130+ IDE commands, text operations, keyboard shortcuts, system commands, focus/navigation, and pause/resume
- What doesn't work: code edits, questions, terminal command generation, and agent forwarding
Useful for low-latency command execution: speech-to-text is the only remote service involved, and no LLM is called.
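Conceptually, Commands-Only Mode reduces routing to a table lookup. The sketch below assumes a small phrase-to-command table; the VS Code command IDs shown are real built-in commands, but which phrases Mantra maps to which IDs, and the `match_command` function itself, are assumptions for illustration.

```python
from typing import Optional

# Assumed phrase table; Mantra's real table covers 130+ commands.
COMMANDS = {
    "save": "workbench.action.files.save",
    "undo": "undo",
    "format document": "editor.action.formatDocument",
    "toggle sidebar": "workbench.action.toggleSidebarVisibility",
}

def match_command(transcript: str, commands_only: bool) -> tuple[str, Optional[str]]:
    """Route a transcript to a pre-mapped command, the LLM, or nowhere."""
    phrase = transcript.lower().strip(" .!?")
    if phrase in COMMANDS:
        return ("command", COMMANDS[phrase])  # handled without any LLM call
    if commands_only:
        return ("ignored", None)              # no LLM fallback in this mode
    return ("llm", phrase)                    # normal mode: let the LLM classify it

print(match_command("Save.", commands_only=True))
print(match_command("add authentication", commands_only=True))
```

With the mode off, the second transcript would instead be returned as `("llm", "add authentication")` and continue through LLM routing.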
Agent Context
When "Send Context to Agent" is enabled (the default), Mantra writes a context file before each agent prompt containing:
- Editor state — current filename, language, cursor position, selected text (if any)
- Activity log — timestamped history of commands, edits, and transcripts
- Terminal history — recent shell commands and their output
- Workspace files — listing of files and folders in the project
The first message in each session includes a single-line instruction telling the agent to read the context file. Follow-up messages send just the raw transcript. The context file is updated before every message, after every log entry, and after every terminal command.
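The first-message rule can be sketched as a small session object that remembers whether the instruction has been sent. The class name, the wording of the instruction, and the context file path `.mantra/context.md` are all hypothetical; the source does not state where the file lives or what the instruction says.

```python
class AgentSession:
    """Tracks whether the context instruction has already been sent."""

    def __init__(self, context_path: str, send_context: bool = True) -> None:
        self.context_path = context_path
        self.send_context = send_context
        self.is_first_message = True

    def build_prompt(self, transcript: str) -> str:
        """Prefix only the first message with a one-line pointer to the context file."""
        if self.send_context and self.is_first_message:
            prompt = f"Read {self.context_path} before responding. {transcript}"
        else:
            prompt = transcript  # follow-ups carry only the raw transcript
        self.is_first_message = False
        return prompt

session = AgentSession(".mantra/context.md")  # hypothetical path
print(session.build_prompt("add authentication"))
print(session.build_prompt("now add tests"))
```

Because the file is rewritten before every message, a single pointer on the first message is enough for the agent to find current state on later turns.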
Selection model: When the transcript contains a selection keyword ("select", "highlight", "lines X to Y"), Mantra runs a separate lightweight LLM call to determine the exact lines to select — "select the inner for loop", "select this function", "highlight the try block". This lets you select code by natural language description with precision, including nested constructs.
Commands & Keyboard Shortcuts
| Action | Shortcut |
| --- | --- |
| Start Recording | Ctrl+Shift+1 |
| Stop Listening | Ctrl+Shift+2 |
| Stop & Transcribe | Ctrl+Shift+3 |
| Open Settings | Ctrl+Shift+4 |
| Select Microphone | Ctrl+Shift+5 |
| Push to Talk | Sidebar button (hold) |
All shortcuts can be customized in File > Preferences > Keyboard Shortcuts (search "mantra"), or via the Keyboard Shortcuts button in the sidebar.
Settings
Open Settings > Extensions > Mantra to adjust:
- Agent Backend — Cursor, Claude Code (Terminal), Claude Code (Extension), or Q&A. Defaults to Cursor in Cursor IDE, Q&A elsewhere.
- Reasoning Effort — Low (default), Medium, or High
- Router Prompt — Customize the LLM system prompt (also editable in the sidebar)
- Selection Model Prompt — Customize the prompt used for voice-based code selection (also editable in the sidebar)
- Memory Manager Prompt — Customize the prompt used for the memory manager
- Commands Only — Bypass the LLM entirely
- Send Context to Agent — Include editor state, activity log, terminal history, and workspace files in agent prompts (default: on)
- Microphone Input — Set via Command Palette > "Mantra: Select Microphone". Advanced users can paste raw FFmpeg input args.
Privacy and Data Handling
- Secret scrubbing: Before any data leaves your machine, Mantra scrubs secrets from terminal history, file content, and context files. This includes API keys, tokens, passwords, PEM blocks, JWTs, connection strings with credentials, CLI flag arguments (--pat, --token, --password), export assignments with secret-like names, output of dangerous commands (env, printenv, cat .env), and high-entropy strings that look like tokens. Scrubbing happens at every external boundary — the context file, the LLM prompt, and agent dispatches.
- Sensitive file detection: If the active editor contains a sensitive file (.env, .pem, credentials, etc.) or inline secrets are detected, Mantra warns you before sending anything to the LLM.
- What goes to the STT service: The most frequent identifiers from your open file are sent as keyterms to bias recognition toward your code's vocabulary. Audio is streamed live (not stored as a file). Mantra's account with the STT service has opted out of data sharing.
- What goes to the LLM service: The current file's contents (scrubbed), file name, cursor context, and terminal history (scrubbed). The LLM service does not retain model inputs or outputs.
- Everything else: Other than the API usage described above, Mantra runs entirely locally and does not collect, save, or share any of your data.
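To give a sense of what pattern-based scrubbing involves, the sketch below redacts four of the categories listed above: PEM blocks, JWT-like tokens, secret CLI flags, and export assignments with secret-like names. The regular expressions, the `[REDACTED]` placeholder, and the `scrub` function are assumptions written for this example; they are not Mantra's actual rules and do not cover every category (for instance, high-entropy detection is omitted).

```python
import re

REDACTED = "[REDACTED]"

# (pattern, replacement) pairs, applied in order. All patterns are illustrative.
PATTERNS = [
    # PEM blocks such as private keys, including their contents.
    (re.compile(r"-----BEGIN [A-Z ]+-----.*?-----END [A-Z ]+-----", re.DOTALL), REDACTED),
    # JWT-like tokens: three base64url segments, the first starting with "eyJ".
    (re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"), REDACTED),
    # CLI flags such as "--token VALUE" or "--password=VALUE".
    (re.compile(r"(--(?:pat|token|password)[ =])\S+"), r"\1" + REDACTED),
    # export assignments whose variable name looks secret.
    (re.compile(r"(export\s+\w*(?:KEY|TOKEN|SECRET|PASSWORD)\w*=)\S+", re.IGNORECASE),
     r"\1" + REDACTED),
]

def scrub(text: str) -> str:
    """Replace secret-looking substrings with a placeholder."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(scrub("curl --token abc123 https://example.com"))
print(scrub("export API_KEY=sk-12345"))
```

Keeping the flag or variable name while redacting only the value preserves enough context for an LLM to understand the command without seeing the secret.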
Troubleshooting
- No mic on macOS — Allow the IDE (VS Code or Cursor) under System Settings > Privacy & Security > Microphone.
- Mouse click not working — Allow the IDE under System Settings > Privacy & Security > Accessibility.
- "Command not found: claude" — Only needed for Claude Code Terminal mode. Add the CLI to your PATH: echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc && source ~/.zshrc. Alternatively, use Extension mode or Cursor mode.
- Cursor agent grayed out — The Cursor agent option is only available when running inside Cursor IDE. It appears as "Cursor (Not available)" in VS Code.
- Ghost transcriptions ("two", "four") — These are filtered automatically. If ambient noise is high, adjust your microphone position.
- File not found — Include punctuation words when speaking filenames: "open auth dot controller dot ts". You can also say the name without an extension and Mantra will fuzzy-match it.
- Logs — Check View > Output > Mantra for detailed logs. The sidebar Activity Log also shows a history of all transcripts and actions.
Supported IDE Commands
Over 130 pre-mapped commands:
save, save all, new file, close file, close other files, close all files, reopen closed editor, undo, redo, cut, copy, paste, select all, toggle line comment, toggle block comment, format document, format selection, rename symbol, quick fix, organize imports, expand selection, shrink selection, select next occurrence, duplicate line down, duplicate line up, move line up, move line down, add cursor above, add cursor below, fold all, unfold all, toggle word wrap, find, replace, find in files, replace in files, next tab, previous tab, tab one through tab nine, page up, page down, go to definition, peek definition, go to references, go to implementation, jump to bracket, focus editor, focus sidebar, focus panel, toggle output, toggle sidebar, toggle panel, toggle zen mode, split editor, toggle minimap, zoom in, zoom out, reset zoom, toggle terminal, focus terminal, new terminal, next terminal, previous terminal, focus agent, new conversation, accept changes, reject changes, focus explorer, focus search, focus source control, focus debug, focus extensions, show command palette, quick open, toggle breakpoint, start debugging, stop debugging, continue debugging, step over, step into, step out, stage file, stage all, unstage file, commit, push, pull, checkout branch, show diff, stash, pop stash, toggle fullscreen, show problems, show notifications, clear notifications, reveal in finder, copy file path, copy relative path, markdown preview, run task, run build task, run test task, clear terminal, terminal scroll up, terminal scroll down.
Additional text operations (no LLM needed): go to line N, go to symbol by name, select/copy/cut/delete line N, select/copy/cut/delete lines A to B, scroll up/down, page up/down, new line above/below, indent, outdent, delete, paste, kill process, run last command.
WSL2 (Windows Subsystem for Linux) — Quick Setup
Mantra works in a Remote - WSL window. Use WSLg (audio bridge) and a Pulse-enabled FFmpeg.
Recommended (WSLg enabled)
- Windows PowerShell (Admin):
    wsl --update
    wsl --shutdown
- In WSL Ubuntu:
    sudo apt update && sudo apt install -y ffmpeg pulseaudio-utils
    export MANTRA_FFMPEG_PATH=/usr/bin/ffmpeg
    code .
- In VS Code: Command Palette > "Mantra: Select Microphone" > choose a device.
- Start recording.
Alternative (no WSLg)
Troubleshooting (WSL)
- "PulseAudio: Connection refused" — WSLg not active: run wsl --update, then wsl --shutdown.
- "Unknown input format 'pulse'" — Wrong FFmpeg: install Ubuntu ffmpeg and set MANTRA_FFMPEG_PATH=/usr/bin/ffmpeg.
- No mics in picker — Enable WSLg and relaunch VS Code from the WSL shell (code .).
Persist FFmpeg path
echo 'export MANTRA_FFMPEG_PATH=/usr/bin/ffmpeg' >> ~/.bashrc