Owlie Studio
AI-powered video editor for VS Code: you direct, it edits, and nothing is ever lost.

Owlie Studio flips video editing on its head. Instead of doing the work by hand, you tell an AI assistant what you want (trim this, add captions, reorder these scenes) and it makes the edits for you. You stay in control: review each change, then keep it, undo it, or try a different version to explore an idea. The AI does the hands-on work; you decide what makes the final cut. It runs on your own AI subscription, so you bring the assistant you already use.

Watch on YouTube
New to Owlie Studio? Watch the walkthroughs, tips, and release demos on our channel.
▶️ Watch Owlie Studio on YouTube »

Key Features

Plain-text timeline as the single source of truth
Your project is an OpenTimelineIO (OTIO) JSON file on disk, in the open interchange format originally developed by Pixar and now hosted by the Academy Software Foundation, rather than a vendor-locked binary.

Agent-first, bring-your-own-model
Drive edits with any MCP-aware coding agent (Claude Code, Cursor, Copilot agent mode, Aider, Continue, Windsurf, Codex), which operates on the timeline through ordinary filesystem read-and-edit primitives. Because prompts and completions run under your own subscription and never pass through our servers, you get privacy by construction, plus frontier-model upgrades on the day they ship, with no extension release required.

Deterministic MCP tool surface (no LLM arithmetic)
Frame-accurate work is handled by deterministic tools rather than token generation, which eliminates the latency, off-by-one errors, and non-reproducibility of asking an LLM to do timecode math. The server exposes …

Local AI inference at electricity-only cost
Speech-to-text runs locally on a cached Whisper.cpp model, so the marginal cost of each edit is host electricity rather than per-token cloud charges. Per-word transcript timestamps anchor frame-precise cuts ("cut every 'um'", "split when the speaker says X"), and the seconds-to-frames conversion is done deterministically on the server. The pipeline is model-agnostic and extends to local image generation, segmentation, frame interpolation, lip-sync, and vision models.

Incremental render compilation
Timeline rendering works like a modern build system: each clip is a translation unit whose rendered segment is cached on disk under a content hash of its inputs (source, range, effects, transition-neighbor context, encoder settings). An edit re-renders only the clips whose inputs actually changed, and the preview is assembled by stream-copy concatenation without re-encoding, so render cost is bounded by the changed portion of the timeline rather than its total length, enabling sub-second iteration.

Fast media handling
A composite content-addressed cache key …

Git-native supervisor loop
On render completion, the document is committed and tagged. You retain final authority: revert to the last render checkpoint with a non-destructive, path-scoped restore that leaves your other work untouched, or branch any cut into an alternate and open a pull request via the Git host's REST API. Agent commits carry …

Three complementary UI surfaces
…

Robust cross-platform bootstrap
External tools installed via system package managers are discovered even when …
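
The incremental render compilation feature rests on one idea: a clip's cached segment can be reused exactly when the hash of its inputs is unchanged. The Python sketch that follows illustrates that idea under stated assumptions and is not Owlie Studio's implementation. `ClipInputs`, `cache_key`, and `clips_to_render` are hypothetical names; the fields simply mirror the inputs listed above (source, range, effects, transition-neighbor context, encoder settings); and SHA-256 over a canonical JSON encoding is one reasonable choice of content hash, not a documented one.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipInputs:
    """Hypothetical bundle of everything that affects one clip's rendered segment."""
    source: str                   # path or content ID of the source media
    start_frame: int              # first frame of the used source range
    end_frame: int                # one past the last frame of the range
    effects: tuple = ()           # effect descriptors, in application order
    neighbor_context: tuple = ()  # assumed: IDs of adjacent clips sharing a transition
    encoder: tuple = ()           # encoder settings as (name, value) pairs


def cache_key(clip: ClipInputs) -> str:
    """SHA-256 over a canonical JSON encoding: identical inputs give an identical key."""
    canonical = json.dumps(
        [
            clip.source,
            clip.start_frame,
            clip.end_frame,
            list(clip.effects),           # order matters: effects apply in sequence
            list(clip.neighbor_context),
            sorted(clip.encoder),         # the order settings are listed in should not matter
        ],
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def clips_to_render(timeline: list[ClipInputs], cached_keys: set[str]) -> list[int]:
    """Indices of clips with no cached segment; only these need re-rendering."""
    return [i for i, clip in enumerate(timeline) if cache_key(clip) not in cached_keys]


if __name__ == "__main__":
    timeline = [
        ClipInputs("intro.mp4", 0, 120),
        ClipInputs("talk.mp4", 300, 900, effects=("captions",)),
    ]
    # Pretend every clip was rendered once and its key recorded.
    cached = {cache_key(c) for c in timeline}

    # Trim the second clip by 50 frames; the first clip is untouched.
    edited = [timeline[0], ClipInputs("talk.mp4", 300, 850, effects=("captions",))]
    print(clips_to_render(edited, cached))  # [1]
```

Only the trimmed clip misses the cache, so it alone would be re-rendered before the segments are concatenated. If the neighbor context records the clips joined by a transition, as assumed here, editing that transition also changes the keys on both sides of it, while unrelated clips keep their cached segments.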

Build
The VSIX is self-contained: …

Patent pending: USPTO provisional application No. 64/069,062 (filed 2026-05-19).
