OIOXO — Private On-Device Coding AI & Token Saver
Your code is your most valuable asset. OIOXO is built on one principle: it stays on your machine.
OIOXO brings coding intelligence to VS Code without the cloud in the middle — a model that downloads once, encrypted, and runs on this device, grounded in your actual codebase. And for the AI tools you already pay for, OIOXO becomes a token saver: it cuts what they read and what they write, so your subscriptions and API keys go several times further.
Private, local coding — the default
- The model runs here. OIOXO's own models (0.4–4.7 GB) are matched to your machine's real capabilities — memory, free RAM, GPU — download once encrypted, unlock per session, and run locally. Your code, your questions, and the answers never leave the device.
- Grounded in your project, verifiably. Every answer is built on the OIOXO context engine — the minimal relevant slice of your codebase, found by lexical search and the import graph. Replies show exactly which files they were grounded in.
- Made for working code. Streamed answers with proper code blocks — Copy, Insert at cursor, New file — and right-click Explain / Improve / Add selection from any editor.
Ctrl+Alt+O opens the panel.
- Your standards travel with the repo.
.oioxo/rules.md and .oioxo/memory.md are honored in every conversation.
The token saver — for everything else you use
Most teams already pay for Copilot, Claude Code, Cursor, or raw API keys. OIOXO makes all of them cheaper, in three independent ways:
- Minimal context for your agents. Copilot, Claude Code, Cursor, Windsurf, Gemini CLI and Codex are wired automatically the moment you open a folder. Instead of reading whole files, they receive the relevant slice — measured on a real production codebase: 90–92% fewer context tokens per question.
- Leaner requests and replies on your own keys. When you route OIOXO through a key you own, prompts are compressed before they leave (duplicate and stale context removed) and replies are kept lean — output tokens, the expensive ones, drop sharply. Three professional levels: Disabled · Balanced · Aggressive.
- Local-first answers. Questions can be answered by the on-device model first, escalating to your paid model only when genuinely needed — every non-escalated question is an external call that never happened.
None of this ever applies to OIOXO's own models. They always run at full quality. The status bar keeps honest score: OIOXO · 41k saved.
When you want a frontier model
Bring your own key — OpenAI, Anthropic, Google, Groq, Mistral, xAI, DeepSeek, OpenRouter, Together, Fireworks, Cerebras, a local Ollama, or any OpenAI-compatible endpoint. Keys are held in VS Code secret storage and travel only to the provider you chose; OIOXO never sees them and never proxies your traffic.
Private by architecture, not by promise
- The index, the grounding, and the on-device models run 100% locally.
- Models ship encrypted and unlock per session — at rest, they are inert data.
- The only thing OIOXO's servers ever receive is a number: how many tokens you saved.
Pricing
- Free — the on-device models, the assistant, your own keys, and a generous monthly saved-tokens allowance.
- OIOXO Pro — $3.99/mo — unlimited token-saving across every tool, every project, every machine. → oioxo.com
Requirements
- VS Code 1.85+ · Node.js 18+ (the agent-facing engine runs via
npx oioxo-mcp)
- On-device model: ~1 GB free RAM for the smallest tier; larger tiers are offered when your machine genuinely fits them
From the makers of the OIOXO IDE — the editor that runs its models in your browser.
© OIOXO · oioxo.com · Docs · Terms
| |