Cloud TTS for VSCode

Right-click selected text (in an editor or in the integrated terminal) → "Read Aloud" → audio plays on macOS, Windows, and Linux.

Starts free with your System Voice (no API key, offline). For human-sounding speech, plug in Gemini, OpenAI, or ElevenLabs — pick whichever you like the sound of best.

Voices

System Voice (default) — free, no key, offline. Uses your OS built-in synthesizer: macOS say, Windows SAPI, Linux espeak-ng/espeak/spd-say. Robotic, but costs nothing and works everywhere.

The cloud providers below produce human-sounding speech:

Gemini — natural prosody, supports natural-language style prompts ("Say sarcastically:") and audio tags ([whispers], [shouting])
- gemini-2.5-flash-preview-tts (default) — faster, cheaper. ~$10/1M chars
- gemini-2.5-pro-preview-tts — better prosody. ~$80/1M chars (≈8× flash)
- gemini-3.1-flash-tts-preview — newest (Apr 2026), best controllability and expressivity, 100+ languages. ~$12/1M chars
OpenAI — fast, cheap, decent quality
- gpt-4o-mini-tts (default) — newer, supports instructions. ~$0.015/min audio
- tts-1 — cheap and fast, no instructions. $15/1M chars
- tts-1-hd — higher-quality variant of tts-1. $30/1M chars
ElevenLabs — top-tier voice cloning quality (most expensive)
- eleven_multilingual_v2 (default) — highest quality multilingual. ~$120/1M chars (API)
- eleven_turbo_v2_5 — lower latency, slightly less expressive. ~$60/1M chars (API)
- eleven_flash_v2_5 — lowest latency, cheapest. ~$60/1M chars (API)

Prices are approximate ballparks for rough comparison — see each provider's pricing page for exact rates and any free-tier quotas.
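
To turn the ballpark rates above into a per-selection figure, a minimal sketch follows. The helper name `estimateCostUsd` is hypothetical and not part of the extension; the rates are copied from the list above and are only as accurate as that list. gpt-4o-mini-tts is omitted because it is quoted per minute of audio rather than per character.

```typescript
// Ballpark rates from the list above, in USD per 1M characters.
// gpt-4o-mini-tts is quoted per minute of audio, so it is left out.
const USD_PER_MILLION_CHARS: Record<string, number> = {
  "gemini-2.5-flash-preview-tts": 10,
  "gemini-2.5-pro-preview-tts": 80,
  "gemini-3.1-flash-tts-preview": 12,
  "tts-1": 15,
  "tts-1-hd": 30,
  "eleven_multilingual_v2": 120,
  "eleven_turbo_v2_5": 60,
  "eleven_flash_v2_5": 60,
};

// Hypothetical helper: estimated cost in USD for synthesizing `charCount` characters.
function estimateCostUsd(model: string, charCount: number): number {
  const rate = USD_PER_MILLION_CHARS[model];
  if (rate === undefined) {
    throw new Error(`No per-character rate listed for ${model}`);
  }
  return (charCount / 1e6) * rate;
}
```

At these rates, a 2,000-character selection comes to roughly $0.02 with gemini-2.5-flash-preview-tts and roughly $0.24 with eleven_multilingual_v2.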

Install

cd ~/Documents/vsc-extensions/cloud-tts
npx --yes @vscode/vsce package --allow-missing-repository
code           --install-extension cloud-tts-*.vsix   # stable VSCode
code-insiders  --install-extension cloud-tts-*.vsix   # Insiders (if you use it)

Re-run these commands after each edit to update the installed extension, then reload open windows with Cmd+Shift+P (Ctrl+Shift+P on Windows/Linux) → Developer: Reload Window.

Platform notes

macOS / Windows — System Voice and audio playback work out of the box (say / built-in SAPI; afplay / MediaPlayer).
Linux — System Voice needs espeak-ng (or espeak/spd-say); cloud-audio playback needs ffplay (ffmpeg) or mpg123. Install via your package manager, e.g. sudo apt install espeak-ng ffmpeg.
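
The Linux fallback order described above can be sketched as a "first installed candidate wins" lookup. This is an illustration under assumptions, not the extension's actual code: `pickFirstInstalled` is a hypothetical helper, and the `isInstalled` check is injected (a real implementation might probe the PATH).

```typescript
// Candidate binaries in the preference order given in the Platform notes.
const LINUX_SYNTHESIZERS = ["espeak-ng", "espeak", "spd-say"];
const LINUX_PLAYERS = ["ffplay", "mpg123"];

// Hypothetical helper: returns the first candidate accepted by `isInstalled`,
// or undefined when none of them is present.
function pickFirstInstalled(
  candidates: string[],
  isInstalled: (binary: string) => boolean,
): string | undefined {
  for (const candidate of candidates) {
    if (isInstalled(candidate)) {
      return candidate;
    }
  }
  return undefined;
}
```

An `undefined` result is the point at which a user would need to install one of the listed packages.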

First-time setup

Nothing to configure — System Voice is the default and needs no key. To upgrade to a human-sounding cloud voice:

1. Set an API key: Cmd+Shift+P → Cloud TTS: Set API Key… → pick provider → paste key. The input field is masked, and the key is stored in VSCode's encrypted SecretStorage; it is never written to settings.json and never synced. (System Voice is keyless, so it isn't listed here.)
2. Pick a provider: Cmd+Shift+P → Cloud TTS: Switch Provider, or set cloudTts.provider in Settings.
3. (Optional) Tune voice/model: Cmd+Shift+P → Cloud TTS: Open Settings.
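
For readers curious how key storage of this kind usually looks, here is a minimal sketch. The `SecretStore` interface mirrors the `get` / `store` / `delete` methods of VS Code's `SecretStorage` (available as `context.secrets`), so the real object could be passed in directly. The secret naming scheme and the helper names are assumptions; the extension's actual names are not documented here.

```typescript
// Minimal subset of VS Code's SecretStorage used by this sketch.
interface SecretStore {
  get(key: string): PromiseLike<string | undefined>;
  store(key: string, value: string): PromiseLike<void>;
  delete(key: string): PromiseLike<void>;
}

type CloudProvider = "gemini" | "openai" | "elevenlabs";

// Hypothetical naming scheme for the stored secrets.
function secretName(provider: CloudProvider): string {
  return `cloudTts.apiKey.${provider}`;
}

async function saveApiKey(
  secrets: SecretStore,
  provider: CloudProvider,
  key: string,
): Promise<void> {
  const trimmed = key.trim();
  if (trimmed.length === 0) {
    throw new Error("API key must not be empty");
  }
  await secrets.store(secretName(provider), trimmed);
}

async function loadApiKey(
  secrets: SecretStore,
  provider: CloudProvider,
): Promise<string | undefined> {
  return await secrets.get(secretName(provider));
}

// Removes the stored key of every cloud provider.
async function clearAllApiKeys(secrets: SecretStore): Promise<void> {
  const providers: CloudProvider[] = ["gemini", "openai", "elevenlabs"];
  for (const provider of providers) {
    await secrets.delete(secretName(provider));
  }
}
```

In an extension, the masked prompt would typically come from `vscode.window.showInputBox({ password: true })`, with the result handed to a function like `saveApiKey`.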

Provider	Get a key at
Gemini	https://aistudio.google.com/apikey
OpenAI	https://platform.openai.com/api-keys
ElevenLabs	https://elevenlabs.io/app/settings/api-keys

Usage

No default keybindings: every action runs via the right-click menu (for selections) or the Command Palette (Cmd+Shift+P, or Ctrl+Shift+P on Windows/Linux). To bind your own shortcuts, open Code → Settings → Keyboard Shortcuts on macOS, or File → Preferences → Keyboard Shortcuts on Windows/Linux.

Action	How
Read selection (editor)	Right-click → Read Aloud
Read selection (terminal)	Right-click → Read Aloud
Stop playback	`Cloud TTS: Stop Playback` (Command Palette)
Quick switch provider	`Cloud TTS: Switch Provider`
Set / change API key	`Cloud TTS: Set API Key…` (masked input)
Delete all stored keys	`Cloud TTS: Clear All API Keys`
Open settings	`Cloud TTS: Open Settings`
Cancel during synthesis	Cancel button on the progress notification

The terminal selection is read through a brief clipboard round-trip, because VS Code offers no official API for it: the extension copies the selection to the clipboard, reads it, and restores the previous clipboard content immediately afterwards.
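
A clipboard round-trip of this kind can be sketched as follows. This is an illustration under assumptions rather than the extension's source: `ClipboardLike` mirrors the `readText` / `writeText` methods of `vscode.env.clipboard`, and `copySelection` stands in for running the built-in `workbench.action.terminal.copySelection` command. The function name is hypothetical.

```typescript
// Minimal subset of vscode.env.clipboard used by this sketch.
interface ClipboardLike {
  readText(): PromiseLike<string>;
  writeText(value: string): PromiseLike<void>;
}

// Hypothetical helper: reads the terminal selection via the clipboard and
// puts the previous clipboard text back afterwards.
async function readTerminalSelection(
  clipboard: ClipboardLike,
  copySelection: () => PromiseLike<unknown>,
): Promise<string> {
  const previous = await clipboard.readText(); // remember the user's clipboard text
  try {
    await copySelection();                     // terminal selection -> clipboard
    return await clipboard.readText();         // clipboard -> caller
  } finally {
    await clipboard.writeText(previous);       // restore, even if a step above failed
  }
}
```

The `finally` block is the design point: the clipboard is restored whether or not the copy step succeeds. Note that this sketch only preserves text content, since the clipboard interface it models is text-only.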

Settings reference

Top-level

Setting	Default	Notes
`cloudTts.provider`	`gemini`	Active provider

API keys are not in Settings — use Cloud TTS: Set API Key… instead (encrypted).

Gemini

Setting	Default	Notes
`cloudTts.gemini.model`	`gemini-2.5-flash-preview-tts`	Or `gemini-2.5-pro-preview-tts` for better prosody, `gemini-3.1-flash-tts-preview` for the newest (Apr 2026)
`cloudTts.gemini.voice`	`Kore`	30 voices: Zephyr, Puck, Charon, Kore, Fenrir, Leda, Orus, Aoede, …
`cloudTts.gemini.stylePrompt`	(empty)	Prepended to text. e.g. `"Say in a calm, neutral tone:"`
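
To illustrate what "prepended to text" means in practice, here is a hedged sketch of how the three Gemini settings could map onto a request. The body follows the public `generateContent` REST shape for speech output as I understand it; check it against the current Gemini API documentation before relying on it. The function name and the single-space separator between prompt and text are assumptions.

```typescript
interface GeminiTtsSettings {
  model: string;       // cloudTts.gemini.model
  voice: string;       // cloudTts.gemini.voice
  stylePrompt: string; // cloudTts.gemini.stylePrompt
}

// Hypothetical helper: builds the URL and JSON body for one synthesis call.
function buildGeminiRequest(settings: GeminiTtsSettings, selection: string) {
  const prompt = settings.stylePrompt.trim();
  // Assumption: the style prompt and the selection are joined by one space.
  const text = prompt.length > 0 ? `${prompt} ${selection}` : selection;
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${settings.model}:generateContent`,
    body: {
      contents: [{ parts: [{ text }] }],
      generationConfig: {
        responseModalities: ["AUDIO"],
        speechConfig: {
          voiceConfig: { prebuiltVoiceConfig: { voiceName: settings.voice } },
        },
      },
    },
  };
}
```

Audio tags such as [whispers] need no special handling in a sketch like this; they travel inside the text itself.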

OpenAI

Setting	Default	Notes
`cloudTts.openai.model`	`gpt-4o-mini-tts`	Or `tts-1`, `tts-1-hd`
`cloudTts.openai.voice`	`nova`	`alloy`, `ash`, `ballad`, `cedar`, `coral`, `echo`, `fable`, `marin`, `onyx`, `nova`, `sage`, `shimmer`, `verse`
`cloudTts.openai.instructions`	(empty)	gpt-4o-mini-tts only. e.g. `"Speak cheerfully."`

ash / ballad / cedar / coral / marin / sage / verse only work with gpt-4o-mini-tts.
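
The model and voice constraints above can be expressed as a small validation step when building the request. The sketch below targets the body of OpenAI's `POST /v1/audio/speech` endpoint (`model`, `voice`, `input`, `instructions`); the helper name and the choice to throw on an incompatible voice are assumptions, not the extension's documented behavior.

```typescript
// Voices that, per the note above, only work with gpt-4o-mini-tts.
const MINI_TTS_ONLY_VOICES = ["ash", "ballad", "cedar", "coral", "marin", "sage", "verse"];

interface OpenAiTtsSettings {
  model: string;        // cloudTts.openai.model
  voice: string;        // cloudTts.openai.voice
  instructions: string; // cloudTts.openai.instructions
}

interface OpenAiSpeechBody {
  model: string;
  voice: string;
  input: string;
  instructions?: string;
}

// Hypothetical helper: validates the voice/model pair and builds the JSON body.
function buildOpenAiSpeechBody(settings: OpenAiTtsSettings, selection: string): OpenAiSpeechBody {
  const isMini = settings.model === "gpt-4o-mini-tts";
  if (!isMini && MINI_TTS_ONLY_VOICES.indexOf(settings.voice) !== -1) {
    throw new Error(`Voice "${settings.voice}" requires gpt-4o-mini-tts`);
  }
  const body: OpenAiSpeechBody = {
    model: settings.model,
    voice: settings.voice,
    input: selection,
  };
  const instructions = settings.instructions.trim();
  // Instructions are only sent to the model that supports them.
  if (isMini && instructions.length > 0) {
    body.instructions = instructions;
  }
  return body;
}
```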

ElevenLabs

Setting	Default	Notes
`cloudTts.elevenlabs.model`	`eleven_multilingual_v2`	Or `eleven_turbo_v2_5`, `eleven_flash_v2_5`
`cloudTts.elevenlabs.voiceId`	`21m00Tcm4TlvDq8ikWAM` (Rachel)	Browse https://elevenlabs.io/app/voice-library for IDs
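
As a rough illustration of where the two ElevenLabs settings end up, the sketch below builds a request for the `POST /v1/text-to-speech/{voice_id}` endpoint, with the voice ID in the URL path and the model in the `model_id` body field. The helper name is hypothetical, and the request shape should be checked against the current ElevenLabs API reference.

```typescript
interface ElevenLabsSettings {
  model: string;   // cloudTts.elevenlabs.model
  voiceId: string; // cloudTts.elevenlabs.voiceId
}

// Hypothetical helper: builds URL, headers, and JSON body for one synthesis call.
function buildElevenLabsRequest(settings: ElevenLabsSettings, apiKey: string, selection: string) {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(settings.voiceId)}`,
    headers: {
      "xi-api-key": apiKey,
      "Content-Type": "application/json",
    },
    body: { text: selection, model_id: settings.model },
  };
}
```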

Limitations

Linux needs a TTS engine / audio player installed (see Platform notes); macOS and Windows work out of the box.
No streaming for cloud providers: the extension waits for the full clip before playback, so long selections take a few seconds before audio starts. (System Voice starts speaking immediately.)
Terminal selection requires a clipboard round-trip (clipboard is restored after).

Releasing (maintainer)

.github/workflows/publish.yml auto-publishes to both the VS Code Marketplace and the Open VSX Registry whenever package.json#version changes on main. To cut a release:

1. Bump version in package.json (update README if needed).
2. Commit + push to main (or merge a PR).
3. Watch the run at Actions → Publish to Marketplace & Open VSX.

The two registries publish independently — if one fails, the other still goes out.

Pushes that don't touch version no-op silently — safe to edit README, code, etc. on main without triggering a publish.
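
The two behaviors described above (publish only on a version change, and let each registry succeed or fail on its own) can be modeled in a few lines. This is a conceptual sketch only: the real logic lives in the YAML workflow, whose contents are not shown here, and both function names are hypothetical.

```typescript
// Hypothetical gate: true only when the version field differs between the
// previous and the current package.json contents.
function versionChanged(previousPackageJson: string, currentPackageJson: string): boolean {
  const previous = JSON.parse(previousPackageJson).version;
  const current = JSON.parse(currentPackageJson).version;
  return previous !== current;
}

interface PublishResult {
  registry: string;
  ok: boolean;
  error?: string;
}

// Runs every publish step in isolation so one failure does not block the others.
function publishIndependently(steps: Record<string, () => void>): PublishResult[] {
  const results: PublishResult[] = [];
  for (const registry of Object.keys(steps)) {
    try {
      steps[registry]();
      results.push({ registry, ok: true });
    } catch (e) {
      results.push({ registry, ok: false, error: String(e) });
    }
  }
  return results;
}
```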

One-time setup: `VSCE_PAT` secret

The workflow needs an Azure DevOps PAT stored as a GitHub repo secret.

1. Create the PAT at https://dev.azure.com → user icon → Personal access tokens → New Token.
   - Organization: All accessible organizations
   - Scopes: Custom defined → check Marketplace: Manage
2. Add it to GitHub: Repo Settings → Secrets and variables → Actions → New repository secret.
   - Name: VSCE_PAT
   - Value: the token from step 1
3. (Optional) verify locally: VSCE_PAT=<token> npx @vscode/vsce verify-pat geryit.

One-time setup: `OVSX_PAT` secret

The workflow also needs an Open VSX access token.

1. Sign the Open VSX Publisher Agreement (one-time), then create a token at https://open-vsx.org/user-settings/tokens.
2. Add it to GitHub: Repo Settings → Secrets and variables → Actions → New repository secret.
   - Name: OVSX_PAT
   - Value: the token from step 1
3. (Optional) publish locally: OVSX_PAT=<token> npx ovsx publish.

Verified-publisher warning: Open VSX shows a "not a verified publisher of the namespace" notice until the geryit namespace is owned. Request ownership via an issue at EclipseFdn/open-vsx.org (one-time, after signing the agreement).

Cloud TTS (Gemini / OpenAI / ElevenLabs)

geryit