Panda — Voice Companion

Venkatesh Annabathina

Voice Companion VS Code Extension
Installation
Launch VS Code Quick Open (Ctrl+P), paste the extension's install command, and press Enter.

Project Panda

Animal Kingdom Series — Vol. 1

A 3D AI girl lives in your VS Code sidebar. She talks. She reacts. She remembers you.



"Because coding alone at 2am shouldn't feel lonely."


What is this?

Project Panda is the first entry in the Animal Kingdom VS Code extension series.

It puts Yuriko — a sarcastic, emotionally reactive 3D AI companion — right inside your VS Code sidebar. Hold a button, speak to her, and she speaks back. Her face reacts in real time. She remembers things you tell her across sessions. She gets annoyed. She gets happy. She judges your code (lovingly).

This is not a chatbot widget. It is a full voice pipeline with a living 3D VRM avatar whose expressions, lip sync, blink, and gaze are all driven in real time.


The Pipeline

Your Voice  (mic button held)
     |
  SoX binary  →  16kHz mono WAV on disk  (node-record-lpcm16)
     |
  Whisper STT  →  transcript text  (Groq: whisper-large-v3-turbo)
     |
  Compressed memory injected into system prompt
     |
  LLM  →  streamed reply + [emotion:X] tag  (Groq: configurable model)
     |
  Emotion tag parsed  →  avatar expression driven live
     |
  Background memory compression  →  key:value tokens saved to disk
     |
  Orpheus TTS  →  WAV audio buffer  (Groq: canopylabs/orpheus-v1-english)
     |
  Web Audio API  →  decoded + played back in webview
     |
  RMS amplitude  →  live lip sync on avatar

Text input bypasses STT and feeds directly into the LLM step.


Features

  • Push-to-talk mic input — hold the mic button, release to process
  • Text input — type instead of speaking anytime
  • Configurable LLM — Llama 3.3 70B (default), Llama 3.1 8B, or Mixtral 8x7B
  • Orpheus TTS — expressive, natural-sounding voice (5 voices selectable)
  • 3D VRM avatar — full Three.js scene inside the sidebar canvas
  • 13-emotion system — LLM tags its own reply, avatar reacts immediately
  • Lip sync — RMS amplitude from Web Audio drives mouth phonemes in real time
  • Auto-blink — randomised blink timing for a natural feel
  • Gaze system — eye target shifts per conversation state
  • Micro-expressions — brief high-intensity flickers layered on top of base expressions
  • Idle body motion — subtle breathing and head sway after the intro animation finishes
  • Compressed persistent memory — facts extracted and stored as key:value tokens across sessions, injected into every system prompt
  • Conversation log — rolling 60-entry localStorage log, surfaced in Settings
  • Secure API key storage — VS Code SecretStorage, never in settings or plaintext
  • Theme-aware UI — CSS uses --vscode-* variables throughout, works in any theme
  • Onboarding flow — animated splash → tagline → companion selection on first launch
  • Settings panel — full-height slide-in overlay with 6 accordion sections
  • Settings sync — voice, model, companion name changes are pushed to the Extension Host live
  • Asset caching — VRM/VRMA models downloaded from GitHub Releases on first launch and cached permanently; zero re-download on subsequent launches

Meet Yuriko

Yuriko is the personality layer. She is:

  • Sarcastic but caring
  • Expressive — her avatar reacts emotionally to what she says
  • Opinionated — max 2 sentences, no markdown, no fluff, plain spoken words only
  • Reactive — she uses [playful] and [whisper] inline for delivery variation
  • Remembers you — compressed facts from past conversations are silently injected into her context

Every reply ends with one emotion tag (e.g. [emotion:joy]). The tag is stripped before TTS so she sounds natural, but her avatar reacts to it immediately.
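The parse-and-strip step can be sketched as follows. This is a minimal illustration, not the actual code in groqClient.ts — the function and regex names are assumptions:

```typescript
// Hypothetical sketch of emotion-tag extraction. The real implementation
// lives in src/groqClient.ts; names here are illustrative.
const EMOTION_TAG = /\[emotion:([a-z]+)\]\s*$/i;

function parseEmotion(reply: string): { text: string; emotion: string | null } {
  const match = reply.match(EMOTION_TAG);
  if (!match) return { text: reply.trim(), emotion: null };
  return {
    text: reply.replace(EMOTION_TAG, "").trim(), // stripped before TTS
    emotion: match[1].toLowerCase(),             // forwarded to the avatar
  };
}
```

Anchoring the regex to the end of the reply means a stray mid-sentence bracket never gets stripped.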


Requirements

1. SoX — Audio Capture Engine

node-record-lpcm16 shells out to the sox / rec binary for mic recording.

| Platform | Install |
| --- | --- |
| macOS | brew install sox |
| Linux | sudo apt install sox |
| Windows | sox.sourceforge.net |

On macOS the extension auto-injects /opt/homebrew/bin and /usr/local/bin into PATH so VS Code can find the binary even when launched from the app icon.

2. Groq API Key

Free at console.groq.com. Paste it on first launch — stored in VS Code SecretStorage and never written to disk or settings.

3. Accept Orpheus TTS Terms

One-time step required before TTS works: Accept Orpheus Terms


Setup

# Install dependencies
npm install

# Build the VRM scene bundle (Three.js + @pixiv/three-vrm → single IIFE)
npm run bundle

# Compile TypeScript
npm run compile

# Or watch mode during development
npm run watch

Press F5 in VS Code to launch the Extension Development Host.

Important: any time you edit media/vrm-scene-src.js, you must re-run npm run bundle — the webview loads media/vrm-bundle.js, not the source file directly. If window.YurikoVRM is undefined at runtime, the bundle is stale.
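The bundle step is an esbuild IIFE build. A plausible build script is shown below — the repo's actual npm run bundle flags may differ, so treat the options as assumptions:

```typescript
// Plausible esbuild config for the VRM bundle — a sketch, not the repo's
// exact build script.
import { build } from "esbuild";

build({
  entryPoints: ["media/vrm-scene-src.js"],
  bundle: true,               // inline Three.js + @pixiv/three-vrm
  format: "iife",             // single self-executing file for the webview
  globalName: "YurikoVRM",    // exposed as window.YurikoVRM
  outfile: "media/vrm-bundle.js",
  minify: true,
}).catch(() => process.exit(1));
```

The globalName option is what makes window.YurikoVRM exist at runtime — which is why a stale bundle shows up as that global being undefined.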


Build & Package

# Bundle VRM scene only
npm run bundle

# TypeScript only
npm run compile

# Full production build (bundle + compile)
npm run vscode:prepublish

# Package as .vsix for distribution
npm run package

Architecture

Extension Host (Node.js)              Webview (HTML/JS sandbox)
──────────────────────────            ──────────────────────────
src/extension.ts                      webview/index.html
src/panel.ts          ←─ postMsg ─→   media/main.js
src/groqClient.ts                      media/vrm-bundle.js   ← esbuild IIFE
src/audioCapture.ts                    media/style.css
src/secretManager.ts
src/memoryManager.ts

All mic I/O runs in the Extension Host (Node.js). getUserMedia and Web Speech API do not work inside VS Code webviews. The webview handles rendering, UI state, Web Audio playback, and the Three.js VRM scene only.

Communication is entirely via postMessage — the Extension Host and Webview are isolated and can only exchange serialisable JSON messages.
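The message shapes can be modelled as a discriminated union on the webview side. The sketch below uses message names from the postMessage Protocol table in this README, but the payload types and handler are illustrative assumptions:

```typescript
// Hypothetical typing for a few Host → Webview messages; payload field
// types are assumptions, not the extension's actual definitions.
type HostMessage =
  | { type: "SHOW_SCREEN"; screen: "API_KEY" | "LOADING" | "VOICE_UI" }
  | { type: "SET_STATE"; state: "idle" | "listening" | "processing" | "speaking" | "error" }
  | { type: "YURIKO_SAID"; text: string; emotion: string }
  | { type: "PLAY_AUDIO"; audioBase64: string; mimeType: string };

function handleMessage(msg: HostMessage): string {
  // An exhaustive switch: TypeScript flags any unhandled message type.
  switch (msg.type) {
    case "SHOW_SCREEN": return `navigate:${msg.screen}`;
    case "SET_STATE":   return `state:${msg.state}`;
    case "YURIKO_SAID": return `say:${msg.emotion}`;
    case "PLAY_AUDIO":  return `play:${msg.mimeType}`;
  }
}
```

Because only serialisable JSON crosses the bridge, a plain discriminated union captures the whole protocol.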


Asset Loading

VRM models and VRMA animations are not bundled in the extension. On first launch they are downloaded from GitHub Releases into context.globalStorageUri (VS Code's per-extension persistent storage directory) and cached permanently. Subsequent launches serve the cached files as local vscode-resource:// URIs — zero network traffic after the first run.

The Extension Host handles all downloads using Node's https module with redirect-following. The webview never fetches from external URLs — avoiding CORS restrictions entirely.

Assets source: https://github.com/venkateshannabathina/project-panda/releases/download/v0/


UI Flow

First Launch (onboarding)

Splash screen  →  (2.2s auto-advance)
     ↓
"made for developers" tagline  →  (2.2s auto-advance)
     ↓
Companion selection  →  (user picks a card)
     ↓
Main shell built  →  WEBVIEW_READY + syncSettings() sent  →  checkInitialKey()
     ↓
  [no key]  →  API key overlay shown
  [key exists]  →  LOADING overlay  →  Groq init + memory loaded  →  VOICE_UI

prefs.firstTimeDone is written to localStorage when the user picks a companion. On subsequent launches, buildShell() is called directly, skipping onboarding entirely.

Main Shell Layout

┌─────────────────────────────┐
│  VRM viewport (flex:1)      │  ← Three.js canvas fills this
│                             │
│  [settings ⚙]  top-right   │  ← 32px circular button
│                             │
│  [toast overlays]           │  ← user/yuriko speech bubbles
├─────────────────────────────┤
│  input-pill                 │  ← [🎤] [text input] [↑]
└─────────────────────────────┘

Overlays (API key card, loading spinner) sit above the viewport in the same stacking context. The shell DOM is built once and never torn down — overlays are toggled with display:none/flex.

Settings Panel

Right-side full-height slide-in panel. Six accordion sections:

| Section | Controls |
| --- | --- |
| Companion | Rename companion, personality dropdown (Friendly / Professional / Casual / Sarcastic), change companion button |
| Memory | Enable/disable toggle, last 8 conversation lines preview, clear button (wipes both localStorage log and compressed memory file) |
| Voice | Enable/disable toggle, speed slider (0.5×–2×), voice dropdown (Diana, Tara, Leah, Jess, Zac) |
| Appearance | Theme chips (VS Code / Light / Dark), character size chips (S / M / L), background color swatches + custom color picker |
| API / Account | API key input + save, model dropdown (Llama 3.3 70B / Llama 3.1 8B / Mixtral 8x7B), clear key button |
| About | Version, Orpheus TTS terms link, Groq console link |

All preferences persist to localStorage under panda_* keys and are read back on every launch. Settings that affect the Extension Host (voice name, model, companion name) are synced via UPDATE_SETTINGS postMessage on load and whenever they change.


Memory System

Panda has two complementary memory layers:

Layer 1 — Conversation Log (localStorage)

A rolling JSON array stored in panda_memory. Each entry is { role, text, t }.

  • Max 60 entries — oldest dropped when limit is reached
  • Both USER_SAID and YURIKO_SAID messages trigger memAdd()
  • Settings → Memory shows the last 8 exchanges as a live preview
  • Only written when prefs.enableMemory is true

Layer 2 — Compressed Persistent Memory (disk)

After every conversation turn, a background LLM call (llama-3.1-8b-instant) extracts important facts and merges them into a compressed token string stored in yuriko_memory.json inside globalStorageUri.

Format: name:venky|wake:930|school:daily|home:5pm|music:rap

  • Pipe-separated key:value pairs, max 120 characters
  • New facts are merged in; existing keys are updated not duplicated
  • Loaded on every init and injected into Yuriko's system prompt so she knows who you are before you say a word
  • She uses memory naturally — never recites it verbatim
  • Cleared when the user clicks "clear memory" in Settings (wipes both layers)
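The merge semantics (update existing keys, append new ones, hard length cap) can be sketched as below. The helper is illustrative — the real merging is split between the background LLM call and src/memoryManager.ts:

```typescript
// Sketch of merging new facts into the compressed memory string.
// Illustrative only; the extension's actual merge logic may differ.
const MAX_LEN = 120;

function mergeMemory(existing: string, updates: Record<string, string>): string {
  const facts = new Map<string, string>();
  for (const pair of existing.split("|").filter(Boolean)) {
    const [k, v] = pair.split(":");
    if (k && v) facts.set(k, v);
  }
  // Existing keys are overwritten, never duplicated.
  for (const [k, v] of Object.entries(updates)) facts.set(k, v);
  let out = "";
  for (const [k, v] of facts) {
    const token = (out ? "|" : "") + `${k}:${v}`;
    if (out.length + token.length > MAX_LEN) break; // enforce the 120-char cap
    out += token;
  }
  return out;
}
```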

Source Files

| File | What it does |
| --- | --- |
| src/extension.ts | Entry point — registers PandaPanel as a sidebar WebviewViewProvider and the panda.start command |
| src/panel.ts | Main orchestrator — routes all postMessages, manages STT → LLM → TTS pipeline, owns isBusy flag, downloads and caches VRM assets |
| src/groqClient.ts | All Groq API calls: Whisper transcription, LLM streaming, Orpheus TTS synthesis, emotion tag parsing, memory compression |
| src/audioCapture.ts | Mic recording via node-record-lpcm16 → temp WAV file in os.tmpdir() |
| src/secretManager.ts | Thin wrapper around vscode.SecretStorage for the Groq API key |
| src/memoryManager.ts | Reads/writes yuriko_memory.json in globalStorageUri — persistent compressed memory across sessions |
| media/main.js | Webview JS — onboarding flow, shell DOM, settings panel, preferences, conversation log, VRM init, audio playback + RMS lip sync |
| media/vrm-scene-src.js | Three.js + @pixiv/three-vrm scene source — VRM loading, 5-layer expression engine, micro-expressions, blink, gaze, idle body motion, VRMA animation |
| media/vrm-bundle.js | esbuild IIFE output of vrm-scene-src.js — what the webview actually loads. Exposes window.YurikoVRM |
| media/style.css | All webview styles — CSS custom properties, theme overrides, onboarding animations, companion cards, settings accordion |
| webview/index.html | HTML shell — CSP with nonce injection, loads vrm-bundle.js then main.js |

postMessage Protocol

| Direction | Message type | Payload | What it does |
| --- | --- | --- | --- |
| Webview → Host | WEBVIEW_READY | — | Shell is built and ready; triggers checkInitialKey() |
| Webview → Host | SAVE_API_KEY | { key } | Save API key to SecretStorage and reconnect |
| Webview → Host | CLEAR_API_KEY | — | Wipe key from SecretStorage, null client, show API_KEY screen |
| Webview → Host | REQUEST_VRM | { companion } | Download (if needed) and serve VRM + VRMA URIs for the companion |
| Webview → Host | START_LISTENING | — | Begin mic recording |
| Webview → Host | STOP_LISTENING | — | Stop recording, kick off STT → LLM → TTS |
| Webview → Host | SEND_TEXT | { text } | Send typed text directly to LLM |
| Webview → Host | TTS_DONE | — | Audio playback finished, release isBusy |
| Webview → Host | UPDATE_SETTINGS | { voiceName, model, companionName } | Push current preferences to Extension Host — sent on load and on every relevant settings change |
| Webview → Host | CLEAR_MEMORY | — | Wipe compressed memory file and reset in-memory state |
| Host → Webview | SHOW_SCREEN | { screen } | Navigate to API_KEY, LOADING, or VOICE_UI |
| Host → Webview | SHOW_ERROR | { message } | Show error toast |
| Host → Webview | LOAD_VRM | { vrmUri, vrmaUri, animations } | Local webview-safe URIs for VRM model, intro animation, and all named animations |
| Host → Webview | SET_STATE | { state } | Drive UI + avatar state: idle, listening, processing, speaking, error |
| Host → Webview | USER_SAID | { text } | Show user's transcript as toast + write to conversation log |
| Host → Webview | LLM_WORD_CHUNK | { word } | Individual streamed word (reserved for future streaming UI) |
| Host → Webview | LLM_DONE | — | Full LLM response is complete |
| Host → Webview | YURIKO_SAID | { text, emotion } | Show Yuriko's reply as toast, write to log, drive avatar emotion |
| Host → Webview | PLAY_AUDIO | { audioBase64, mimeType } | Base64 WAV to decode and play; respects voiceEnabled and voiceSpeed prefs |
| Host → Webview | ERROR | { message } | Inline error shown as system toast |

Preferences System

All user preferences live in localStorage under panda_* keys. The prefs object in media/main.js provides typed getters/setters that write through immediately.

| Key | Default | What it controls |
| --- | --- | --- |
| panda_ftd | '0' | First-time done flag (skips onboarding after first companion pick) |
| panda_companion | 'yuriko' | Active companion id |
| panda_cname | 'Yuriko' | Display name — synced to Extension Host via UPDATE_SETTINGS |
| panda_personality | 'friendly' | Personality tone (UI only — future LLM prompt wiring) |
| panda_mem_on | '1' | Memory enabled toggle |
| panda_voice_on | '1' | TTS playback toggle |
| panda_vspeed | '1.0' | Playback rate for Web Audio (0.5–2) |
| panda_vname | 'diana' | Orpheus voice name — synced to Extension Host via UPDATE_SETTINGS |
| panda_theme | 'vscode' | Theme: vscode, light, or dark |
| panda_csize | 'medium' | Character size: small, medium, or large |
| panda_bg | '' | Custom viewport background color |
| panda_model | 'llama-3.3-70b-versatile' | LLM model — synced to Extension Host via UPDATE_SETTINGS |
| panda_memory | '[]' | Rolling 60-entry conversation log (JSON array) |
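A write-through getter/setter pair in the spirit of the prefs object can be sketched as follows. The storage interface is injected here so the sketch runs outside a browser — the real code in media/main.js talks to localStorage directly:

```typescript
// Illustrative write-through preference helper; not the extension's actual
// prefs object. Storage is abstracted so this runs outside a webview.
type Store = { get(k: string): string | null; set(k: string, v: string): void };

function makePref(store: Store, key: string, fallback: string) {
  return {
    get: () => store.get(key) ?? fallback, // fall back to the default
    set: (v: string) => store.set(key, v), // write through immediately
  };
}
```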

Emotion System

How it works

  1. The LLM system prompt instructs Yuriko to end every reply with exactly one [emotion:X] tag.
  2. groqClient.ts parses the tag out of the full streamed response with a regex.
  3. The clean text (tag stripped) goes to TTS. The emotion name goes to the webview as part of YURIKO_SAID.
  4. If the LLM omits the tag, main.js runs analyzeSentiment() — a keyword regex fallback — over the reply text.
  5. The webview calls window.YurikoVRM.setSentiment(emotionName) which blends the avatar's expressions toward that emotion's profile.

Available emotions

| Tag | Face it drives | Typical trigger |
| --- | --- | --- |
| joy | Big open smile (happy 0.9) | laughing, loving something |
| excited | Wide eyes + huge smile | wow, can't believe it |
| fun | Smirk / soft smile (relaxed 0.92) | goofing, jokes |
| smirk | Sly self-satisfied look | stating the obvious, smug |
| suspicious | Narrowed brow (angry 0.62) | judging, not buying it |
| teasing | Smirk + hint of surprise | playful jab, banter |
| confident | Composed smirk | assertive, matter-of-fact |
| angry | Furrowed brow (angry 0.9) | frustrated, mad |
| sad | Down-turned mouth (sad 0.82) | genuine sadness |
| apologetic | Sad + touch of surprise | sorry, can't do it |
| empathetic | Soft sadness + warmth | understanding pain |
| calm | Relaxed, composed | informational, explaining |
| question | Wide eyes, slightly happy | curious, wondering |

VRM Expression Engine (5 Layers)

All expression blending happens in media/vrm-scene-src.js at ~60fps.

Layer 1 — State Profile (ambient baseline)

A moderate baseline per conversation phase (idle, listening, processing, speaking, error). Each state has base slider values, oscillation frequencies, amplitude, and blend speed.

Layer 2 — Emotion Profile (dominant)

When setSentiment() is called, _emotionBlend ramps from 0 → 1 over ~300ms (rate: 3.5/s). The result is a lerp from the state profile toward the emotion profile. Fades back at 1.5/s when cleared.
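The frame-rate-independent ramp described above can be sketched as a per-frame step using the stated rates (3.5/s attack, 1.5/s release). A sketch only — function names are not from the extension:

```typescript
// Sketch of the blend ramp: 3.5/s toward 1 while an emotion is active,
// 1.5/s back toward 0 when cleared. dt is the frame delta in seconds.
function stepBlend(blend: number, active: boolean, dt: number): number {
  const rate = active ? 3.5 : -1.5;
  return Math.min(1, Math.max(0, blend + rate * dt));
}

// Per-slider lerp from the state profile toward the emotion profile.
function lerp(a: number, b: number, t: number): number {
  return a + (b - a) * t;
}
```

At 3.5/s the blend reaches 1 in roughly 1/3.5 ≈ 0.29 s, matching the ~300 ms ramp described above.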

Layer 3 — Organic Oscillation

A two-frequency sine oscillation multiplied over both profiles so the face breathes and feels alive rather than locked.

Layer 4 — Micro-Expressions

Brief high-intensity flickers (22ms–600ms) layered additively, capped at 1.0. Each micro has states, optional emotions gate, gap range, and fadeIn / fadeOut / dur timings.

Layer 5 — Lip Sync Mouth Isolation

While phonemes (aa, ee, ih, oh, ou) are active, mouth-affecting shapes are faded to 50% so the emotion still shows in the eyes and brows but phonemes own the jaw.
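The RMS amplitude driving the lip sync is the standard root-mean-square over a frame of audio samples. A sketch — in the extension this runs over Web Audio samples each frame, and the gain/gate constants here are assumptions:

```typescript
// Standard RMS over one frame of PCM samples (range roughly -1..1).
function rms(samples: Float32Array): number {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Map amplitude to a mouth-open weight. The gate and gain values are
// illustrative assumptions, not the extension's tuned constants.
function mouthOpen(amplitude: number, gate = 0.01): number {
  return amplitude < gate ? 0 : Math.min(1, amplitude * 8);
}
```

The noise gate keeps the mouth shut during silence instead of fluttering on background hiss.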

Blink

Randomised two-phase blink (close / open). Next blink fires 2.5–7.5s after the last. Speed randomised per blink (50–90ms per phase).
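The timing maths reduces to two uniform ranges. A sketch with an injectable random source so it is testable — the helper names are illustrative:

```typescript
// Sketch of the blink timing: next blink 2.5–7.5 s after the last,
// 50–90 ms per close/open phase. rand is injectable for testing.
function nextBlinkDelay(rand: () => number = Math.random): number {
  return 2500 + rand() * 5000; // ms until the next blink
}

function blinkPhaseDuration(rand: () => number = Math.random): number {
  return 50 + rand() * 40;     // ms per close/open phase
}
```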

Gaze

Per-state eye target positions smoothed with lerp at 2.5/s. In idle and speaking states the target drifts on a slow sine to simulate natural eye movement.

Idle Body Motion

After the intro VRMA animation finishes, procedural motion drives head, neck, chest, and spine bones with sine waves at different frequencies for a breathing/swaying feel.


Models

| Task | Model | Notes |
| --- | --- | --- |
| Speech-to-Text | whisper-large-v3-turbo | English only |
| Language Model | Configurable (default: llama-3.3-70b-versatile) | Streamed, max 150 tokens, 2-sentence replies enforced |
| Text-to-Speech | canopylabs/orpheus-v1-english | Voice configurable (default: diana), WAV output |
| Memory Compression | llama-3.1-8b-instant | Background call after each turn, max 80 tokens |

Selectable LLM models in Settings → API/Account:

| Option | Model ID |
| --- | --- |
| Llama 3.3 70B (default) | llama-3.3-70b-versatile |
| Llama 3.1 8B (fast) | llama-3.1-8b-instant |
| Mixtral 8x7B | mixtral-8x7b-32768 |

Security

  • API key stored via vscode.SecretStorage under panda.groqKey. Never written to disk, settings, or environment variables.
  • CSP set on the webview HTML via webview.cspSource and a per-session nonce. Scripts only execute with the correct nonce.
  • localResourceRoots explicitly allows only media/, webview/, and globalStorageUri (asset cache) — the webview cannot access anything else on disk.
  • No external fetches from webview — all asset downloads happen in the Extension Host (Node.js), served to the webview as local vscode-resource:// URIs.
  • Extension Host isolation — all Groq API calls and mic access happen in Node.js, fully isolated from the webview sandbox.
  • API key validation — webview validates keys start with gsk_ before sending; a bad key clears itself from SecretStorage on failed init.

File Structure

project-panda/
├── src/
│   ├── extension.ts          # VS Code entry point
│   ├── panel.ts              # Main orchestrator + message router + asset downloader
│   ├── groqClient.ts         # Groq API: STT + LLM + TTS + emotion parsing + memory compression
│   ├── audioCapture.ts       # Mic recording via SoX → temp WAV
│   ├── secretManager.ts      # VS Code SecretStorage wrapper
│   ├── memoryManager.ts      # Persistent compressed memory (globalStorageUri)
│   └── node-record-lpcm16.d.ts
├── media/
│   ├── vrm-scene-src.js      # Three.js + VRM scene (source — edit this)
│   ├── vrm-bundle.js         # esbuild IIFE output (rebuild after edits to src above)
│   ├── main.js               # Webview UI: onboarding, shell, settings, audio, memory
│   ├── style.css             # VS Code theme-aware styles
│   └── panda-icon.svg        # Activity bar icon
├── webview/
│   └── index.html            # HTML shell with CSP nonce injection
├── out/                      # tsc output (gitignored)
├── package.json
├── tsconfig.json
└── LICENSE

VRM/VRMA assets are not in this repo. They are downloaded on first launch from: https://github.com/venkateshannabathina/project-panda/releases/tag/v0


Known Gotchas

  • First launch downloads assets. On first run the extension downloads all VRM/VRMA files from GitHub Releases (~50MB total). This takes a few seconds depending on connection speed. Subsequent launches are instant — files are cached in globalStorageUri.
  • Rebuild the bundle after editing the VRM scene. The webview loads media/vrm-bundle.js (esbuild output). Editing media/vrm-scene-src.js has no effect until you run npm run bundle. If window.YurikoVRM is undefined at runtime, the bundle is stale.
  • SoX must be on PATH. If VS Code is launched from the app icon on macOS, it may not inherit your shell PATH. The extension injects /opt/homebrew/bin and /usr/local/bin automatically, but if SoX is installed elsewhere mic input will silently fail.
  • Orpheus terms must be accepted once. If TTS returns a 400 with a terms/consent message, the extension surfaces: "Accept Orpheus terms at console.groq.com first."
  • Rate limits. Groq free tier has rate limits. If a 429 is hit mid-pipeline the error shows as "Rate limit hit, please wait a moment."
  • Memory compression fires a background LLM call after every turn (using llama-3.1-8b-instant). This counts against your Groq rate limits but is non-blocking — it never delays the conversation.
  • retainContextWhenHidden: true is set on the webview — the VRM scene and Web Audio context persist when the sidebar is hidden, avoiding a re-init cycle each time the panel is toggled.
  • WEBVIEW_READY timing. checkInitialKey() is only triggered by the WEBVIEW_READY message (sent from buildShell() once the DOM is ready), not from resolveWebviewView(). This prevents a race where the host checks SecretStorage before the webview JS has run.
  • Screen queue. If the host sends a SHOW_SCREEN message while onboarding is still running (before buildShell() completes), it is stored in queuedScreen and applied the moment the shell is ready.

Animal Kingdom Series

| # | Project | Status |
| --- | --- | --- |
| Vol. 1 | Panda — Voice AI Companion | Active |
| Vol. 2 | Coming soon... | Locked |
| Vol. 3 | Coming soon... | Locked |

Built by Venkatesh Annabathina

Part of the Animal Kingdom VS Code Extension Series
