Markdown TTS

A VS Code text-to-speech extension that reads markdown, code, and docs aloud — or uses AI to explain and narrate them so you can listen instead of read. Speak into Copilot Chat with OS-native dictation, and hear your git changes summarized by ear. Core reading works offline with no API keys on Windows (SAPI) and macOS (say), plus optional Microsoft Edge neural voices. AI narration is bring-your-own-key — use OpenAI, Anthropic, or a free Groq key.

Requirements

All platforms

VS Code 1.80.0 or later
Internet connection: needed only for the Edge Neural TTS engine and the AI narration commands

Windows (Voice Input)

Windows 10/11 with Voice Typing enabled
Enable: Settings → Privacy & Security → Speech → Online speech recognition → On
Test manually: press Win+H anywhere — the Voice Typing toolbar should appear

macOS (Voice Input)

macOS with Dictation enabled
Enable: System Settings → Keyboard → Dictation → On
First run: macOS will prompt for Accessibility permission for VS Code — grant it

Linux

Edge Neural TTS engine works (requires internet)
Local TTS and Voice Input are not yet supported

Install

Search "Markdown TTS" in the VS Code Extensions panel, or:

code --install-extension AbhishekShr.markdown-tts

Quick Start

Open any markdown file
Click the ▶ Read File button in the editor title bar — the file is read aloud (pause/resume and stop buttons appear there while reading)
Or open the Command Palette (Ctrl+Shift+P, or Cmd+Shift+P on macOS) and run any "Markdown TTS" command
For voice input: run Markdown TTS: Voice Input to Chat → speak into Copilot Chat

That's it. For Edge neural voices, change markdownTts.engine to "edge" in Settings. Prefer keyboard shortcuts? See Running Commands to bind your own.

Features

Text-to-Speech

Read File Aloud — speak the entire file, or from cursor position if placed mid-file
Read from Cursor — explicitly read from cursor to end of file
Read Selection Aloud — speak only highlighted text
Read Clipboard Aloud — speak copied text (great for chat responses, browser content, etc.)
Heading Navigation — skip to next/previous heading while reading
Pause / Resume — temporarily pause and continue speech
Stop Reading — stop the current speech immediately
Edge Neural TTS — optional high-quality voices via Microsoft Edge's online service. Long files stream chunk-by-chunk so audio starts in seconds, not after a 2-minute wait
Live chunk progress — status bar shows Reading 1.2x 3/10 while streaming, plus a Buffering N/M… spinner when waiting on the next chunk
Click-to-change reading speed — clickable ↑ / ↓ status bar buttons adjust speed live, with the new rate applied to upcoming chunks
Export to Audio File — save any markdown file or selection as MP3 using Edge TTS
List Edge Voices — browse 400+ voices in a searchable picker
Reading time estimate — shows word count and estimated duration before reading long files
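
The reading time estimate comes down to simple arithmetic: count the words, then divide by a speaking rate. The sketch below is illustrative only. The function names and the 150 words-per-minute default are assumptions, not values taken from the extension.

```javascript
// Hypothetical helper: estimate how long a text takes to read aloud.
// The 150 wpm default is an assumed typical speaking rate.
function estimateReadingTime(text, wordsPerMinute = 150) {
  // Split on whitespace; filter(Boolean) drops the empty string
  // that split() yields for blank input.
  const wordCount = text.trim().split(/\s+/).filter(Boolean).length;
  const seconds = Math.round((wordCount / wordsPerMinute) * 60);
  return { wordCount, seconds };
}

// Format a duration in seconds as "M min S s" (or just "S s" under a minute).
function formatDuration(seconds) {
  const minutes = Math.floor(seconds / 60);
  const rest = seconds % 60;
  return minutes > 0 ? `${minutes} min ${rest} s` : `${rest} s`;
}
```

For example, a 1,500-word document at the assumed 150 wpm gives 600 seconds, which `formatDuration` renders as "10 min 0 s".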

AI Narration (Bring Your Own Key)

Explain This File Aloud (✨ button in the editor title bar) — instead of reading the file verbatim, an LLM explains what it does in plain English, then narrates the explanation. Great for unfamiliar code or dense docs.
Explain Selection Aloud — right-click a highlighted block to explain just that part (falls back to the whole file if nothing is selected).
Narrate Git Changes Aloud (git-compare button in the Source Control title bar) — runs git diff locally and speaks an AI summary of your uncommitted changes, last commit, or branch vs main. Perfect for a pre-commit review by ear.
Bring your own key — works with OpenAI, Anthropic (Claude), or Groq (which has a free tier — no credit card). Your key is stored in VS Code's encrypted SecretStorage, never in settings. Set it with Markdown TTS: Set AI API Key.
Read-along output — the generated explanation is also written to the Markdown TTS – AI output panel so you can follow the text while it speaks.
✨ Sparkle button in the editor title bar for one-click file explanation.

Voice Input (Speech-to-Text)

Voice Input to Chat — triggers OS-native dictation directly into Copilot Chat
Run Markdown TTS: Voice Input to Chat → Copilot Chat opens → dictation starts → speak → text streams into chat
Windows: launches Win+H Voice Typing (press Esc to stop)
macOS: triggers Edit → Start Dictation (press Fn to stop)
Pre-warmed PowerShell process for near-instant launch on Windows

Smart Markdown Processing

Strips YAML frontmatter, HTML tags, code blocks, footnotes
Converts headings to "Heading level N: ..."
Reads image alt text as "Image: alt text"
Tables read as comma-separated values (wide tables truncated to 5 columns)
Removes bold/italic/link syntax, keeps the text
Pronunciation dictionary — custom replacements for tech terms TTS engines mispronounce
Works in markdown preview mode and with unsaved/untitled files
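
A regex-based cleanup pass along the lines described above might look like the following. This is a simplified sketch, not the extension's actual code: the function name, the order of the rules, and the exact patterns are assumptions, and it leaves out tables, footnotes, and underscore-style italics.

```javascript
// Illustrative only: a simplified regex-based Markdown-to-speech cleanup.
// Three backticks, built at runtime so this example can sit inside a fence.
const FENCE = "`".repeat(3);

function markdownToSpeechText(markdown, replacements = {}) {
  let text = markdown;
  // Drop YAML frontmatter at the very start of the file.
  text = text.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n/, "");
  // Drop fenced code blocks.
  text = text.replace(new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g"), "");
  // Images become "Image: alt text" (must run before the link rule).
  text = text.replace(/!\[([^\]]*)\]\([^)]*\)/g, "Image: $1");
  // Links keep only their visible text.
  text = text.replace(/\[([^\]]+)\]\([^)]*\)/g, "$1");
  // Drop HTML tags.
  text = text.replace(/<[^>]+>/g, "");
  // Headings become "Heading level N: ...".
  text = text.replace(/^(#{1,6})\s+(.*)$/gm,
    (_, hashes, title) => `Heading level ${hashes.length}: ${title}`);
  // Remove bold markers (** or __), then single-asterisk italics.
  text = text.replace(/(\*\*|__)(.+?)\1/g, "$2").replace(/\*(.+?)\*/g, "$1");
  // Apply the pronunciation dictionary to whole words only.
  for (const [from, to] of Object.entries(replacements)) {
    const escaped = from.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    text = text.replace(new RegExp(`\\b${escaped}\\b`, "g"), () => to);
  }
  // Collapse the blank lines left behind by removed blocks.
  return text.replace(/\n{3,}/g, "\n\n").trim();
}
```

Because each rule is a plain regex, edge cases such as nested emphasis or a literal `<` in prose can be mishandled, which is the trade-off of not using a full parser.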

Running Commands

Every command is available from several places — pick whichever is fastest:

Command Palette — Ctrl+Shift+P (Cmd+Shift+P on macOS) → type "Markdown TTS" to see them all.
Editor title bar buttons — ▶ Read File, ✨ Explain File, 📋 Read Clipboard, plus pause/resume/stop and heading navigation while reading.
Right-click menu — Read Selection, Explain Selection, Read/Explain File, Export to Audio, and more.
Source Control title bar — the git-compare button runs Narrate Git Changes Aloud.
Status bar — while reading, click the ↑ / ↓ buttons to change speed live, or the reading indicator to pause/resume.

Want keyboard shortcuts?

This extension ships no default keyboard shortcuts on purpose: the natural choice, Ctrl+Alt+<letter>, collides with AltGr on many non-US keyboard layouts (Indian, German, French, Spanish, Nordic, and others), where the OS types an accented character instead of running the command. Rather than hijack keys that break for a large share of users, the extension leaves the choice to you, so you bind only the shortcuts you want:

Open Keyboard Shortcuts: Ctrl+K Ctrl+S (Cmd+K Cmd+S on macOS).
Search for "Markdown TTS".
Click the + next to a command and press your preferred key combo.

For layout-safe shortcuts, Ctrl+K chords (e.g. Ctrl+K R) or plain Ctrl+Shift+<letter> combos avoid the AltGr problem.

Settings

{
  "markdownTts.engine": "sapi",
  "markdownTts.rate": 2,
  "markdownTts.voice": "",
  "markdownTts.edgeVoice": "en-US-AriaNeural",
  "markdownTts.replacements": null,
  "markdownTts.ai.provider": "openai",
  "markdownTts.ai.model": ""
}

engine — "sapi" (local system voice, offline, default) or "edge" (Microsoft Edge neural voices, online)
rate — Speech rate: -10 to 10. Default 2. Maps to SAPI rate on Windows, WPM on macOS, percentage on Edge.
voice — Voice name. Windows: SAPI voice (e.g. "Microsoft David Desktop"). macOS: say voice (e.g. "Samantha"). Leave blank for system default.
edgeVoice — Edge voice name. Default "en-US-AriaNeural". Run Markdown TTS: List Edge Voices to browse.
replacements — Custom pronunciation dictionary. Object mapping text → replacement (e.g. {"npm": "en pee em"}). Applied before speaking.
ai.provider — AI narration provider: "openai", "anthropic", or "groq". Default "openai".
ai.model — Override the model. Leave blank for a fast, low-cost default (gpt-4o-mini, claude-haiku-4-5, or llama-3.1-8b-instant for Groq).
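
To make the rate setting concrete, here is one way a single -10 to 10 value could be translated into each engine's native unit. The step sizes and the 175 wpm macOS baseline are assumptions chosen for illustration, and `mapRate` is a hypothetical name. The 10% Edge step is only a guess that happens to match the "1.2x" status bar text at the default rate of 2.

```javascript
// Hypothetical mapping from the -10..10 setting to engine-specific units.
// Baseline and step sizes are assumptions, not the extension's real values.
function mapRate(rate) {
  const clamped = Math.max(-10, Math.min(10, Math.round(rate)));
  return {
    // System.Speech's SpeechSynthesizer.Rate already uses a -10..10 scale.
    sapi: clamped,
    // The macOS `say -r` flag takes words per minute.
    // Assumed: 175 wpm baseline, 15 wpm per step.
    sayWpm: 175 + clamped * 15,
    // Edge voices take a signed percentage such as "+20%".
    // Assumed: 10% per step.
    edge: `${clamped >= 0 ? "+" : ""}${clamped * 10}%`,
  };
}
```

Under these assumptions the default rate of 2 maps to SAPI rate 2, 205 wpm for `say`, and "+20%" for Edge.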

AI Narration Setup (Optional)

The Explain and Narrate commands need an API key — your own, so there's no subscription. The fastest free option is Groq:

Get a free key at console.groq.com/keys (no credit card).
In Settings, set markdownTts.ai.provider to "groq".
Run Markdown TTS: Set AI API Key (Command Palette) and paste the key. It's stored in VS Code SecretStorage — never in your settings file.
Open any file and click the ✨ button in the editor title bar (or run Markdown TTS: Explain This File Aloud from the Command Palette).

Prefer OpenAI or Anthropic? Set the provider accordingly and paste that key instead. Only the Explain/Narrate commands ever contact the AI provider — all other reading stays local/offline.
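
The interplay between markdownTts.ai.provider and markdownTts.ai.model can be pictured as a small lookup: a blank model falls back to the provider's default listed under Settings. The sketch below is a guess at that logic. `DEFAULT_MODELS` and `resolveModel` are hypothetical names, not the extension's API.

```javascript
// Default models as listed under Settings. The names here are hypothetical.
const DEFAULT_MODELS = {
  openai: "gpt-4o-mini",
  anthropic: "claude-haiku-4-5",
  groq: "llama-3.1-8b-instant",
};

// Return the model to use: the override if set, else the provider default.
function resolveModel(provider = "openai", modelOverride = "") {
  if (!Object.prototype.hasOwnProperty.call(DEFAULT_MODELS, provider)) {
    throw new Error(`Unknown AI provider: ${provider}`);
  }
  return modelOverride.trim() || DEFAULT_MODELS[provider];
}
```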

Platform Support

Feature	Windows	macOS	Linux
Local TTS engine	SAPI via PowerShell	`say` command	—
Edge Neural TTS	✅	✅	✅
Pause / Resume	✅	✅	✅
Voice Input	Win+H Voice Typing	Edit → Start Dictation	—
Heading Navigation	✅	✅	✅
Reading Time Estimate	✅	✅	✅
Audio Export (MP3)	✅ (Edge)	✅ (Edge)	✅ (Edge)

Architecture

extension.js — single-file entry point (~1100 lines)
Windows SAPI: spawns PowerShell with System.Speech.Synthesis.SpeechSynthesizer
macOS say: spawns say command, pause/resume via Unix signals
Edge engine: splits text into ~1.5 KB chunks; prefetches 2 chunks ahead via parallel WebSockets; plays chunks sequentially through platform-native player (afplay on macOS, WPF MediaPlayer on Windows). 30 s per-chunk timeout with one retry.
Voice Input: pre-warmed PowerShell sends Win+H via keybd_event (Windows) or AppleScript triggers Dictation menu (macOS)
Pause/resume/stop: control file on Windows, signals (SIGTSTP/SIGCONT) on macOS
Title bar buttons and status bar indicator for visual control
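
The Edge pipeline described above (chunking, prefetching two chunks ahead, sequential playback, a per-chunk timeout with one retry) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in extension.js. All function names are hypothetical, and `synthesize` and `play` are caller-supplied stand-ins for the WebSocket request and the platform-native player.

```javascript
// Split text into chunks of roughly maxBytes (UTF-8), breaking between sentences.
// A single sentence longer than maxBytes is kept whole in this sketch.
function splitIntoChunks(text, maxBytes = 1500) {
  const sentences = text.split(/(?<=[.!?])\s+/).filter(Boolean);
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (current && Buffer.byteLength(candidate, "utf8") > maxBytes) {
      chunks.push(current);
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Reject if the promise does not settle within ms milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("Timed out")), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Try once, then retry a single time before giving up.
async function synthesizeWithRetry(synthesize, chunk, timeoutMs) {
  try {
    return await withTimeout(synthesize(chunk), timeoutMs);
  } catch {
    return withTimeout(synthesize(chunk), timeoutMs);
  }
}

// Play chunks in order while up to `prefetch` later chunks are already in flight.
async function readAloud(text, synthesize, play, options = {}) {
  const { maxBytes = 1500, prefetch = 2, timeoutMs = 30000 } = options;
  const chunks = splitIntoChunks(text, maxBytes);
  const pending = new Map(); // chunk index -> promise of synthesized audio
  const request = (i) => {
    if (i < chunks.length && !pending.has(i)) {
      const p = synthesizeWithRetry(synthesize, chunks[i], timeoutMs);
      p.catch(() => {}); // errors surface when the chunk is awaited below
      pending.set(i, p);
    }
  };
  for (let i = 0; i < chunks.length; i++) {
    for (let j = i; j <= i + prefetch; j++) request(j);
    const audio = await pending.get(i);
    pending.delete(i);
    await play(audio);
  }
}
```

Requesting chunks `i + 1` and `i + 2` before awaiting chunk `i` is what lets playback start as soon as the first chunk arrives while later chunks download in the background.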

Troubleshooting

Problem	Solution
Voice Input: nothing happens	Enable OS dictation (see Requirements above)
Edge TTS: "Timed out"	A single chunk failed twice; check your internet connection. Long files are split into ~1.5 KB chunks, each with its own 30 s timeout, so a long read is not subject to a single overall time limit.
Edge TTS: "Cannot find module"	Reinstall VSIX — dependencies may not have been bundled
No sound on Windows	Check default audio output device; try SAPI engine
No sound on macOS	Try `say "hello"` in Terminal; check volume

Limitations

Linux: only Edge engine supported (no local TTS, no voice input)
Edge engine requires internet — text is sent to Microsoft servers
Voice Input requires OS-level dictation to be enabled
Markdown stripping is regex-based, not a full parser
Mid-stream speed changes take effect at the next chunk boundary (Edge engine). The currently playing chunk continues at its original rate, and the new rate applies when the next chunk starts. With ~1.5 KB chunks (about 60–90 s of audio each), the change can take up to about 90 s to become audible. The SAPI and macOS engines apply the new rate only on the next read.

Markdown TTS: Read Aloud & AI Narration

Abhishek Shr