OIOXO — Private On-Device Coding AI & Token Saver

Your code is your most valuable asset. OIOXO is built on one principle: it stays on your machine.

OIOXO brings coding intelligence to VS Code without the cloud in the middle — a model that downloads once, encrypted, and runs on this device, grounded in your actual codebase. And for the AI tools you already pay for, OIOXO becomes a token saver: it cuts what they read and what they write, so your subscriptions and API keys go several times further.

Private, local coding — the default

The model runs here. OIOXO's own models (0.4–4.7 GB) are matched to your machine's real capabilities — memory, free RAM, GPU — download once encrypted, unlock per session, and run locally. Your code, your questions, and the answers never leave the device.
Grounded in your project, verifiably. Every answer is built on the OIOXO context engine — the minimal relevant slice of your codebase, found by lexical search and the import graph. Replies show exactly which files they were grounded in.
Made for working code. Streamed answers with proper code blocks — Copy, Insert at cursor, New file — and right-click Explain / Improve / Add selection from any editor. Ctrl+Alt+O opens the panel.
Your standards travel with the repo. .oioxo/rules.md and .oioxo/memory.md are honored in every conversation.

The token saver — for everything else you use

Most teams already pay for Copilot, Claude Code, Cursor, or raw API keys. OIOXO makes all of them cheaper, in three independent ways:

Minimal context for your agents. Copilot, Claude Code, Cursor, Windsurf, Gemini CLI and Codex are wired automatically the moment you open a folder. Instead of reading whole files, they receive the relevant slice — measured on a real production codebase: 90–92% fewer context tokens per question.
Leaner requests and replies on your own keys. When you route OIOXO through a key you own, prompts are compressed before they leave (duplicate and stale context removed) and replies are kept lean — output tokens, the expensive ones, drop sharply. Three professional levels: Disabled · Balanced · Aggressive.
Local-first answers. Questions can be answered by the on-device model first, escalating to your paid model only when genuinely needed — every non-escalated question is an external call that never happened.

None of this ever applies to OIOXO's own models. They always run at full quality. The status bar keeps honest score: OIOXO · 41k saved.

When you want a frontier model

Bring your own key — OpenAI, Anthropic, Google, Groq, Mistral, xAI, DeepSeek, OpenRouter, Together, Fireworks, Cerebras, a local Ollama, or any OpenAI-compatible endpoint. Keys are held in VS Code secret storage and travel only to the provider you chose; OIOXO never sees them and never proxies your traffic.

Private by architecture, not by promise

The index, the grounding, and the on-device models run 100% locally.
Models ship encrypted and unlock per session — at rest, they are inert data.
The only thing OIOXO's servers ever receive is a number: how many tokens you saved.

Pricing

Free — the on-device models, the assistant, your own keys, and a generous monthly saved-tokens allowance.
OIOXO Pro — $3.99/mo — unlimited token-saving across every tool, every project, every machine. → oioxo.com

Requirements

VS Code 1.85+ · Node.js 18+ (the agent-facing engine runs via npx oioxo-mcp)
On-device model: ~1 GB free RAM for the smallest tier; larger tiers are offered when your machine genuinely fits them

From the makers of the OIOXO IDE — the editor that runs its models in your browser.