Owlie Studio

AI-powered video editor for VS Code — you direct, it edits, and nothing is ever lost.

Owlie Studio flips video editing on its head: instead of doing the work by hand, you tell an AI assistant what you want — trim this, add captions, reorder these scenes — and it makes the edits for you. You stay in control: review each change and keep it, undo it, or try a different version to explore an idea. The AI does the hands-on work; you decide what makes the final cut. It runs on your own AI subscription, so you bring the assistant you already use.

Watch on YouTube

New to Owlie Studio? Watch the walkthroughs, tips, and release demos on our channel — click the logo to open it on YouTube.

▶️ Watch Owlie Studio on YouTube »

Key Features

Plain-text timeline as the single source of truth

Your project is an OpenTimelineIO (OTIO) JSON file on disk — the SMPTE/Pixar open exchange format — not a vendor-locked binary (.prproj, .fcpbundle, .drp). It can be diffed, merged, branched, and searched with ordinary git and your editor. Unknown fields (agent-authored provenance, third-party effects, future schema extensions) are preserved byte-exact across edits, so the editor UI and your coding agent can both write the same document without clobbering each other. Optional JSONC comments are supported with a byte-identical strict-JSON round-trip for interop.

Agent-first, bring-your-own-model

Drive edits with any MCP-aware coding agent — Claude Code, Cursor, Copilot agent mode, Aider, Continue, Windsurf, Codex — operating on the timeline via filesystem read-and-edit primitives. Because prompts and completions run under your own subscription and never traverse our servers, you get privacy by construction and same-day frontier-model upgrades with no extension release required.

Deterministic MCP tool surface (no LLM arithmetic)

Frame-accurate work is handled by deterministic tools, not token generation — eliminating the latency, off-by-one errors, and non-reproducibility of asking an LLM to do timecode math. The server exposes compile_ffmpeg_argv, validate_otio, detect_filler_words, generate_srt, and media_metadata, plus client-side prompt templates (color_grade_intent, suggest_cuts). The LLM handles intent and style; the tools handle the bit-precise math.

Local AI inference at electricity-only cost

Speech-to-text runs locally via a cached Whisper.cpp model, so per-edit inference is host electricity rather than per-token cloud charges. Per-word transcript timestamps anchor frame-precise cuts ("cut every 'um'", "split when the speaker says X") with the second-to-frame conversion done deterministically on the server. The pipeline is model-agnostic and extends to local image generation, segmentation, frame interpolation, lip-sync, and vision models.

Incremental render compilation

Timeline rendering works like a modern build system: each clip is a translation unit whose rendered segment is cached on disk by a content hash of its inputs (source, range, effects, transition-neighbor context, encoder settings). An edit re-renders only the clips that actually changed, and the preview is assembled by stream-copy concatenation — bounding render cost to the changed portion of the timeline instead of its total length, for sub-second iteration.

Fast media handling

A composite content-addressed cache key — SHA-256(size ‖ head_1MB ‖ tail_1MB ‖ mtime) — produces a stable key for any media file in under 50 ms regardless of size, keeping import gestures interactive on multi-gigabyte files. Waveform peaks for any video container are produced by streaming FFmpeg's WAV output directly into BBC audiowaveform with no intermediate files. Encoding uses GPU acceleration (h264_nvenc) with an automatic, session-latched software fallback (libx264) — no configuration required.

Git-native supervisor loop

On render completion the document is committed and tagged. You retain terminal authority: revert to the last render checkpoint with a non-destructive, path-scoped restore that doesn't touch your other work; branch any cut into an alternate and open a pull request via the host's REST API. Agent commits carry Co-Authored-By: provenance, giving every edit an auditable chain-of-custody in plain git log / git blame / git show.

Three complementary UI surfaces

Native file-explorer access — the .otio file opens as plain text in the standard JSON editor; diff, search, fold, and jump-to-symbol come for free.
Interactive Pixi.js timeline panel — GPU-accelerated visualization with per-frame playhead, drag-to-reorder, trim-by-handle, split-at-playhead, in/out marking, and an embedded preview bound to the incremental renderer.
Eight-section sidebar — Timelines, Media Library, Tools (YouTube download, audio extraction, transcription, TTS, import), Jobs (live background-job status), MCP Tools, History, Account, and Help — all auto-refreshed by the workspace file watcher.

Robust cross-platform bootstrap

External tools installed via system package managers are discovered even when PATH updates haven't propagated into the running IDE: a multi-plan spawn fallback (PATH binary → python -m → py -m) with a post-spawn liveness probe, plus Windows winget package-directory probing with PATH injection into spawned children — no IDE restart after a fresh install.

Build

npm run build      # tsc -b + esbuild bundle
npm run package    # vsce package → .vsix

The VSIX is self-contained: @owlie-studio/core is inlined via esbuild, so install-time npm install is not required.

Patent pending — USPTO provisional application No. 64/069,062 (filed 2026-05-19).

Owlie Video Studio

Owlie