RB Ollama Agents


Robin Bakshi | 131 installs | Free
Free chat sidebar for VS Code & Antigravity. A pay-less alternative to ~$5K/year Antigravity. Direct APIs for Ollama, DeepSeek, Qwen, Zhipu, Baidu, Moonshot, Hunyuan, Mimo, OpenAI, Claude, Gemini. Plan/Code/Ask agent modes, savings counter, image+PDF+DOCX attach, BYOK.
Installation

Launch VS Code Quick Open (Ctrl+P), paste the following command, and press Enter:

ext install RobinBakshi.ollama-direct-custom-agent

RB Ollama Agents — Visual Studio Code, Antigravity, Cursor, VSCodium, Windsurf, Gitpod

Free, MIT, BYOK chat sidebar for VS Code, Antigravity, Cursor, VSCodium, Windsurf and Gitpod, with direct APIs for Ollama (local + cloud), DeepSeek, Qwen / Alibaba, Zhipu (GLM), Baidu (ERNIE), Moonshot (Kimi), Tencent Hunyuan, Xiaomi Mimo, OpenAI, Anthropic Claude, Google Gemini, OpenRouter and Groq. The pay-less alternative to paid agentic IDEs: Antigravity (~$5K/year), Cursor, Copilot Pro and Codeium.

🐳 Taking the revolution a bit further for our WhalesBrother. Open-weights and open-API frontier models from China and the global open-source community deserve first-class IDE support. This is our small contribution to that wave — free, MIT, public.



Version history

  • Full release notes: CHANGELOG.md
  • Tagged releases and downloadable VSIX files: https://github.com/robinbakshi007/ollama-direct-custom-agent/releases

💖 Sponsor this project

If RB Ollama saves you the ~$5,000/year an agentic IDE subscription would have cost, please consider supporting development:

  • 💖 Become a GitHub Sponsor — monthly or one-time tiers
  • ☕ Tip $1.99 via PayPal — quick one-tap thank-you

Sponsorship funds open-source frontier-model tooling for WhalesBrother, EU, India, Asia-Pacific and USA developer communities — keeping this stack free, MIT, BYOK and zero-telemetry forever.

We are an ISO 27001:2022 certified company. Our security posture, change management and key-handling practices are independently audited — your API keys live only in your OS keychain (VS Code SecretStorage, macOS Keychain / Windows Credential Manager / libsecret).


🏆 Why we are the best

  • Built to last 10+ years. The architecture is OpenAI-compatible at the wire level: every new frontier model that ships a /v1/chat/completions endpoint works on day one (as a custom provider if there is no preset yet), with no extension update required.
  • No middleman, no proxy, no telemetry. Your prompts go straight from your editor to the model provider you choose. Period.
  • MIT licensed, fully open source. Audit, fork, self-host, embed.
  • One install, six editors. VS Code, Antigravity, Cursor, VSCodium, Windsurf, Gitpod — same extension, same UX.
  • Encrypted secrets at rest. API keys + Ollama Cloud session cookies stored via OS-level SecretStorage. Never written to settings.json in plaintext.
  • Frontier coverage. Ollama (local + cloud), DeepSeek, Qwen / Alibaba, Zhipu (GLM), Baidu (ERNIE), Moonshot (Kimi), Tencent Hunyuan, Xiaomi Mimo, OpenAI, Anthropic, Gemini, OpenRouter, Groq.
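Because every provider above speaks the OpenAI-style wire format, one request builder can target any of them. A minimal TypeScript sketch, assuming the standard POST {baseUrl}/chat/completions shape with a Bearer token and a model/messages JSON body; buildChatRequest is a hypothetical helper, not the extension's code:

```typescript
// Hypothetical helper: builds an OpenAI-compatible chat request for any provider.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(
  baseUrl: string,
  apiKey: string | undefined,
  model: string,
  messages: ChatMessage[],
): ChatRequest {
  // Drop trailing slashes so ".../v1/" and ".../v1" produce the same URL.
  const root = baseUrl.replace(/\/+$/, "");
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Hosted providers expect a Bearer token; a local Ollama server needs none.
  if (apiKey) {
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return {
    url: `${root}/chat/completions`,
    method: "POST",
    headers,
    body: JSON.stringify({ model, messages, stream: false }),
  };
}

const req = buildChatRequest("https://api.deepseek.com/v1/", "sk-...", "deepseek-chat", [
  { role: "user", content: "Explain this regex" },
]);
console.log(req.url); // https://api.deepseek.com/v1/chat/completions
```

Pointing the same helper at http://127.0.0.1:11434/v1 with no key targets Ollama's own OpenAI-compatible endpoint.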

🚀 Coming next: bigger than Mythos

We've already shipped the foundation that will easily last the next decade. The next tool we publish will go further than Mythos — and it will be fully open-source so the community can audit it and prevent any malicious harm. Star the repo to get notified.


📊 Full feature table

| Capability | RB Ollama (free) | Antigravity | Cursor | Copilot Pro | Codeium |
|---|---|---|---|---|---|
| Annual cost (typical seat) | $0 | ~$5,000 | ~$240 | ~$240 | ~$180 |
| Direct Ollama local models | ✅ | ❌ | ❌ | ❌ | partial |
| Direct Ollama Cloud (:cloud models) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Direct DeepSeek / Qwen / GLM / Kimi / ERNIE / Hunyuan / Mimo | ✅ | ❌ | partial | ❌ | ❌ |
| Direct OpenAI / Claude / Gemini / Groq / OpenRouter (BYOK) | ✅ | ❌ | partial | ❌ | partial |
| Auto routing (local → cloud) with $-saved counter | ✅ | ❌ | ❌ | ❌ | ❌ |
| Encrypted SecretStorage for API keys + cookies | ✅ | n/a | n/a | n/a | n/a |
| Drag-and-drop images / PDFs / DOCX / TXT / MD | ✅ | partial | partial | partial | ❌ |
| Vision auto-routing (image → vision-capable model) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Agent modes: Chat / Plan / Code / Ask / Architect | ✅ | partial | partial | partial | ❌ |
| Multi-agent roster + drag-sortable priority order | ✅ | ❌ | ❌ | ❌ | ❌ |
| Role-split parallel agents (architect / builder / validator) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Per-task assistant routing (Image / Doc / Code / QA) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Multi-period analytics (today / week / fortnight / month / quarter / YTD / custom) | ✅ | ❌ | ❌ | ❌ | ❌ |
| CSV export with date-range filename suffix | ✅ | ❌ | ❌ | ❌ | ❌ |
| Cloud token guard + account rotation | ✅ | ❌ | ❌ | ❌ | ❌ |
| Context window meter + reserved-for-response | ✅ | partial | partial | ❌ | ❌ |
| Zero telemetry | ✅ | ❌ | ❌ | ❌ | ❌ |
| MIT licensed, fully open source | ✅ | ❌ | ❌ | ❌ | ❌ |
| ISO 27001:2022 certified maintainer | ✅ | n/a | n/a | n/a | n/a |
| One install across VS Code / Antigravity / Cursor / VSCodium / Windsurf / Gitpod | ✅ | ❌ | ❌ | ❌ | partial |

🌍 Communities we serve

We are actively assisting and accepting contributions from:

  • 🐳 WhalesBrother — open-weights / open-API frontier models from China and the global open-source community
  • 🇪🇺 European Union developer collectives (GDPR-respecting AI tooling, on-device first)
  • 🇮🇳 India — IndiaStack, ONDC, BHASHINI integrators
  • 🌏 Asia Pacific — Japan, Korea, Singapore, Australia, NZ, ASEAN
  • 🇺🇸 United States developer communities — independents, startups and education

If your community wants a localised onboarding guide, open an issue or sponsor a workstream.


🔐 Trust & compliance

  • ISO 27001:2022 certified information-security management system covering source-code handling, key custody and release engineering.
  • API keys & Ollama Cloud session cookies are encrypted at rest in the OS keychain via VS Code SecretStorage — never in settings.json.
  • Privacy-first analytics model — by default, no usage telemetry is sent. If you explicitly enable billing/security analytics consent in Plans settings, the client shares only an anonymous random install UUID (no device fingerprint) plus consent/retention metadata; country is derived server-side from request IP and not collected directly on-device.
  • Data minimization & retention — analytics consent is optional and revocable, and retention days are configurable (default 30 days).
  • MIT licensed — full source on GitHub, reproducible build via node esbuild.js --production && npx @vscode/vsce package.

Privacy Policy Notes (GDPR / DPDP / CCPA)

  • Billing/security analytics are opt-in and disabled unless you provide explicit consent.
  • The extension uses a random anonymous install ID only; it does not use hardware IDs, MAC address, serial number, or persistent device fingerprinting.
  • Country is resolved server-side from request IP and should be stored only at country granularity.
  • Retention must be limited to the configured period and deleted after expiry.
  • Promo codes and license tokens are stored in encrypted SecretStorage (OS keychain), never plaintext settings.

Why this exists

The big-name agentic IDEs are charging hundreds-to-thousands of dollars per seat per year for thin wrappers around the same public APIs you can call yourself. Meanwhile:

  • Ollama Cloud has frontier models like gpt-oss:120b-cloud, deepseek-v4-pro:cloud, gemini-3-flash-preview:cloud, kimi-k2.6:cloud, glm-5.1:cloud, gemma4:31b-cloud — none of which the major IDE agents let you wire into their chat sidebar.
  • DeepSeek, Qwen, Zhipu, Moonshot, Hunyuan, Mimo all expose OpenAI-compatible APIs — and they are dramatically cheaper than GPT-4-class models for everyday coding.
  • Antigravity / Cursor / Codex / Copilot Chat keep model access firmly behind their own backends.

This extension is the missing bridge. It is free, MIT-licensed, open source and zero-telemetry, with no proxy of mine in the middle.


What you get

  • ✅ One install across VS Code, Antigravity, Cursor, VSCodium, Windsurf, Gitpod
  • ✅ Sidebar chat with model picker inside the composer (matches native Antigravity / Codex / Gemini Code Assist UX)
  • ✅ ✨ Auto routing — automatically picks the cheapest viable model (local first), with a live $ saved / % saved counter
  • ✅ Inline bottom Cost & Analytics panel — expandable below the composer, with per-model usage %, request/token/task split, and day/week CSV export
  • ✅ Context window meter with Reserved for response display and token guard warnings
  • ✅ Cloud token guard — when cloud requests approach reserve limits, route to local fallback automatically (configurable)
  • ✅ Cloud account rotation — configure multiple Ollama cloud account profiles and auto-switch when weekly usage threshold is reached
  • ✅ Agent modes in the + menu — Chat / Plan / Code / Ask / Architect
  • ✅ Assistant dropdown with grouped options: Digital Assistant (OpenClaw), Agents (Hermes, Claude Code, Codex, Copilot CLI, OpenCode, Droid, Goose, Pi, Pool), Chat & RAG (Onyx), and Automation (n8n)
  • ✅ Assistant Routing panel with dedicated dropdowns for Image / Document / Code / QA and optional auto-launch toggle
  • ✅ Multi-agent setup controls in settings (multiAgentSetupEnabled, multiAgentRoster) to activate/deactivate routing and define the agent roster
  • ✅ Team orchestration controls (assistantTeamOrchestrationEnabled, assistantTeamRoles, assistantTeamRequirePlanApproval) for lead/subagent style workflows
  • ✅ In-chat gear shortcut on extension info rows to open RB Ollama settings directly
  • ✅ + menu: drag-and-drop PNG, JPG, PDF, DOCX, TXT, MD — vision images automatically route to a vision-capable model
  • ✅ Permissions preset (Default / Auto-review / Full access / Custom) shaping the system prompt
  • ✅ One-click "Add provider" — quick-pick presets for DeepSeek, Qwen, Zhipu, Baidu, Moonshot, Hunyuan, Mimo, OpenAI, Claude, Gemini, OpenRouter, Groq. Just paste your API key.
  • ✅ Bring-your-own-key custom providers for any other OpenAI-compatible endpoint
  • ✅ Settings page surfaced under @ext:RobinBakshi.ollama-direct-custom-agent
  • ✅ Zero telemetry, zero proxies, zero accounts of mine
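The cloud token guard and account rotation can be pictured as a small selection function. This is a hypothetical sketch: the CloudAccount shape, the 0.9 threshold and the "local-fallback" label are assumptions, since the README does not document how weekly usage is measured.

```typescript
// Hypothetical sketch of account rotation plus the token guard fallback.
interface CloudAccount {
  profile: string;
  weeklyUsedFraction: number; // share of the weekly allowance already used, 0..1
}

// Rotation: use the first profile that is still under the threshold.
function activeAccount(accounts: CloudAccount[], threshold = 0.9): CloudAccount | undefined {
  return accounts.find((a) => a.weeklyUsedFraction < threshold);
}

// Token guard: when no profile has headroom, route to a local model instead.
function routeTarget(accounts: CloudAccount[], threshold = 0.9): string {
  const account = activeAccount(accounts, threshold);
  return account ? `cloud:${account.profile}` : "local-fallback";
}

console.log(routeTarget([
  { profile: "work", weeklyUsedFraction: 0.95 },
  { profile: "personal", weeklyUsedFraction: 0.2 },
])); // cloud:personal
```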

Digital Assistants quick guide

Use this when you want OpenClaw, Hermes Agent, Claude Code, Codex, Copilot CLI, and others to be routed automatically by task.

  1. Open RB Ollama Settings (gear icon in chat, or command: RB Ollama: Open Settings).
  2. In the Digital Assistants tab, assign an assistant per task type:
     • Image tasks
     • Document tasks
     • Code tasks
     • QA tasks
  3. In the RB Ollama Agents tab, enable:
     • Auto-route by task type
     • Multi-agent orchestration (optional)
  4. In the API Keys tab, add keys only when needed:
     • If you already use Ollama Cloud models, assistant keys are usually not required.
     • For standalone assistants/providers, add their keys there.

API key security

  • Keys are encrypted via VS Code SecretStorage (OS keychain: macOS Keychain, Windows Credential Manager, Linux libsecret).
  • The extension UI shows only masked previews of stored keys.
  • Legacy plaintext keys in settings are auto-migrated to SecretStorage and cleared.
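A masked preview can be as simple as keeping a few characters at each end of the key. Sketch only; maskKey and its 3-plus-4 character format are assumptions, not the extension's actual display rule.

```typescript
// Hypothetical helper: masked preview of a stored API key for display.
function maskKey(key: string): string {
  // Very short values are fully hidden rather than partially revealed.
  if (key.length <= 8) return "•".repeat(key.length);
  return `${key.slice(0, 3)}…${key.slice(-4)}`;
}

console.log(maskKey("sk-1234567890abcdef")); // sk-…cdef
```

In a VS Code extension the raw value would be read from the SecretStorage API (context.secrets.get / context.secrets.store), which is backed by the OS keychain.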

Screenshots

Screenshots are optional and intentionally not hot-linked here unless files exist, to avoid broken rendering in GitHub pages.

Add your files here when ready:

  • docs/screenshots/antigravity.png
  • docs/screenshots/vscode.png

Tip: keep image names exactly as above for consistency across release notes.


Install

Step 1 — Install Ollama (free) and (optionally) Ollama Pro

  1. Install Ollama → https://ollama.com/download
  2. Sign in: ollama signin (free)
  3. (Optional, only for :cloud models) subscribe to Ollama Pro: https://ollama.com/settings/billing
  4. Pull at least one model:
    # Free local models (no Pro needed) — recommended for token savings
    ollama pull qwen3-coder:30b      # 18 GB — great coding
    ollama pull llama3.1:8b          # 4.9 GB — fast general
    ollama pull gemma4:e4b           # 9.6 GB — multimodal/vision
    ollama pull deepseek-coder-v2:16b   # 9 GB — coding (smaller)
    
    # Cloud (Ollama Pro)
    ollama pull gemini-3-flash-preview:cloud
    ollama pull gpt-oss:120b-cloud
    ollama pull deepseek-v4-pro:cloud
    ollama pull gemma4:31b-cloud
    
  5. Verify:
    ollama list
    curl http://127.0.0.1:11434/api/tags
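The verification step can also be scripted. The TypeScript sketch below parses the JSON returned by curl http://127.0.0.1:11434/api/tags; it assumes Ollama's response shape (models[].name), and the helper names and the ":cloud" / "-cloud" suffix check are illustrative, not taken from the extension.

```typescript
// Hypothetical helpers for the output of GET /api/tags.
// Assumes Ollama's response shape: { "models": [ { "name": "...", ... } ] }.
interface TagsResponse {
  models?: Array<{ name: string }>;
}

function installedModelNames(json: string): string[] {
  const parsed = JSON.parse(json) as TagsResponse;
  return (parsed.models ?? []).map((m) => m.name);
}

// Treat tags ending in ":cloud" or "-cloud" as Ollama Cloud models.
function hasLocalModel(names: string[]): boolean {
  return names.some((n) => !/[:-]cloud$/.test(n));
}

const sample = '{"models":[{"name":"qwen3-coder:30b"},{"name":"gpt-oss:120b-cloud"}]}';
console.log(hasLocalModel(installedModelNames(sample))); // true
```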
    

Step 2 — Install the extension

| Editor | One-click | Direct .vsix download | CLI |
|---|---|---|---|
| VS Code | Install from Marketplace | ⬇ Latest .vsix | code --install-extension RobinBakshi.ollama-direct-custom-agent |
| Antigravity | Extensions panel → search RB Ollama Agents | ⬇ Latest .vsix | antigravity --install-extension RobinBakshi.ollama-direct-custom-agent |
| Cursor | Extensions panel → search RB Ollama Agents | ⬇ Latest .vsix | cursor --install-extension RobinBakshi.ollama-direct-custom-agent |
| VSCodium | Install from Open VSX | ⬇ Latest .vsix | codium --install-extension RobinBakshi.ollama-direct-custom-agent |
| Windsurf / Gitpod | Extensions panel → search RB Ollama Agents | ⬇ Latest .vsix | <editor-cli> --install-extension RobinBakshi.ollama-direct-custom-agent |

Direct downloads (all releases): https://github.com/robinbakshi007/ollama-direct-custom-agent/releases

Or install the .vsix directly:

# Download from the latest release page and install that file
# (filename changes every version)
code         --install-extension ./ollama-direct-custom-agent-<version>.vsix
antigravity  --install-extension ./ollama-direct-custom-agent-<version>.vsix
cursor       --install-extension ./ollama-direct-custom-agent-<version>.vsix
codium       --install-extension ./ollama-direct-custom-agent-<version>.vsix

Step 3 — Open it

  1. Restart your editor once after install
  2. Click the speech-bubble icon in the left activity bar → RB Ollama Agents
  3. Pick ✨ Auto in the composer dropdown — done.

✨ Auto routing & savings counter

Selecting ✨ Auto (prefer local → cloud) routes each turn to the cheapest viable model:

  1. If your prompt has images attached, Auto picks the best vision-capable model (cloud preferred — Gemini 3 Flash, Gemma 4, GPT-4o, Claude 3 — falling back to a local vision model).
  2. Otherwise, Auto picks a local model (free, $0/token), preferring coder/instruct variants.
  3. Only if no local model is installed does Auto fall back to a :cloud model, preferring small/cheap ones (*flash*, *mini*, *haiku*).
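The three routing rules read naturally as a short decision function. Hypothetical sketch: pickAutoModel, RouteInput and the hint lists are illustrative names, and isVisionCapable is passed in because the README does not say how vision support is detected.

```typescript
// Hypothetical sketch of the three Auto routing rules.
interface RouteInput {
  hasImages: boolean;
  localModels: string[]; // installed, non-cloud tags
  cloudModels: string[]; // ":cloud" tags and BYOK models
  isVisionCapable: (model: string) => boolean;
}

const CODER_HINTS = ["coder", "instruct"];
const CHEAP_CLOUD_HINTS = ["flash", "mini", "haiku"];

// Match hints against whole name tokens, so "gemini" does not count as "mini".
function firstMatching(models: string[], hints: string[]): string | undefined {
  return models.find((m) => {
    const tokens = m.toLowerCase().split(/[^a-z0-9]+/);
    return hints.some((h) => tokens.indexOf(h) !== -1);
  });
}

function pickAutoModel(input: RouteInput): string | undefined {
  const { hasImages, localModels, cloudModels, isVisionCapable } = input;
  // Rule 1: images go to a vision model, cloud first, then local.
  if (hasImages) {
    return cloudModels.find(isVisionCapable) ?? localModels.find(isVisionCapable);
  }
  // Rule 2: text goes local, preferring coder/instruct variants.
  if (localModels.length > 0) {
    return firstMatching(localModels, CODER_HINTS) ?? localModels[0];
  }
  // Rule 3: nothing local, so prefer a small/cheap cloud model.
  return firstMatching(cloudModels, CHEAP_CLOUD_HINTS) ?? cloudModels[0];
}
```

Token matching matters for the cheap-model hints: a plain substring test would treat any "gemini" model as a "mini" one.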

The header above the chat shows a live tally:

$1.27 saved (78%)        24 local · 7 cloud requests       ⚙  ↺

  • $ saved = (local token count ÷ 1,000,000) × your cloudPricePerMTok setting (default $0.50 per 1M tokens).
  • % saved = local tokens ÷ total tokens.
  • ⚙ opens settings, ↺ resets the counter.
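The savings formula reproduces the example header. Sketch: savings is a hypothetical helper, and 2,540,000 local / 716,000 cloud tokens are made-up inputs chosen to give $1.27 (78%) at the default $0.50 per 1M tokens.

```typescript
// Hypothetical helper for the savings header.
// cloudPricePerMTok is USD per 1,000,000 tokens (setting default: 0.5).
function savings(localTokens: number, cloudTokens: number, cloudPricePerMTok = 0.5) {
  const saved = (localTokens / 1000000) * cloudPricePerMTok;
  const total = localTokens + cloudTokens;
  // Guard against division by zero before any request has been made.
  const percent = total === 0 ? 0 : Math.round((localTokens / total) * 100);
  return { saved: Math.round(saved * 100) / 100, percent };
}

console.log(savings(2540000, 716000)); // { saved: 1.27, percent: 78 }
```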

Tune the strategy under settings → RB Ollama: Auto Prefer:

  • local-first (default): always go local when possible
  • cheapest-cloud: prompts longer than 4K characters jump to a cheap cloud model
  • balanced: prompts longer than 8K characters use cloud
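The Auto Prefer strategies reduce to a character-count check. A sketch under two assumptions: "4K" and "8K" mean 4,000 and 8,000 characters, and preferCloud (a hypothetical name) only decides local versus cloud; choosing which cloud model is a separate step.

```typescript
type AutoPrefer = "local-first" | "cheapest-cloud" | "balanced";

// Hypothetical sketch: should this prompt skip local models?
function preferCloud(strategy: AutoPrefer, promptChars: number, hasLocalModel: boolean): boolean {
  if (!hasLocalModel) return true; // nothing local to route to
  switch (strategy) {
    case "cheapest-cloud":
      return promptChars > 4000;
    case "balanced":
      return promptChars > 8000;
    default:
      return false; // local-first: always local when possible
  }
}
```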

Per-model usage percentages

The extension now tracks request share by model and shows percentages in two places:

  • A compact Usage: bar under the savings header (modelA 42% · modelB 31% ...)
  • Model dropdown labels (use X%)

This helps users manually track model mix while still keeping Auto as default.
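The Usage: bar is each model's share of the request count. Hypothetical sketch; usageBar and its input shape are illustrative, not the extension's code.

```typescript
// Hypothetical sketch: request counts per model -> "Usage:" bar text.
function usageBar(requestCounts: Record<string, number>): string {
  const models = Object.keys(requestCounts);
  const total = models.reduce((sum, m) => sum + requestCounts[m], 0);
  if (total === 0) return "Usage: (no requests yet)";
  const parts = models
    .sort((a, b) => requestCounts[b] - requestCounts[a]) // most-used first
    .map((m) => `${m} ${Math.round((requestCounts[m] / total) * 100)}%`);
  return `Usage: ${parts.join(" · ")}`;
}

console.log(usageBar({ "qwen3-coder:30b": 3, "gpt-oss:120b-cloud": 1 }));
// Usage: qwen3-coder:30b 75% · gpt-oss:120b-cloud 25%
```

Because each share is rounded on its own, the displayed percentages can add up to 99% or 101%.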

Model Analytics panel + exports

Expand the Model Analytics panel below the usage bar to view:

  • model-level usage percentage
  • request count
  • token split
  • task split (chat, image, doc, code, qa)

Export manual reports for finance/compliance/audits:

  • Export Day CSV
  • Export Week CSV
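A day or week export is a list of rows plus a filename that encodes the date range. Sketch only: the columns and the rb-ollama-usage_<from>_to_<to>.csv filename pattern are assumptions, not the extension's actual format.

```typescript
// Hypothetical CSV export with a date-range filename suffix.
interface UsageRow {
  model: string;
  task: "chat" | "image" | "doc" | "code" | "qa";
  requests: number;
  tokens: number;
}

// RFC 4180-style escaping: quote fields containing commas, quotes or newlines.
function csvField(value: string | number): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsv(rows: UsageRow[]): string {
  const header = "model,task,requests,tokens";
  const lines = rows.map((r) => [r.model, r.task, r.requests, r.tokens].map(csvField).join(","));
  return [header, ...lines].join("\r\n");
}

// ISO dates (UTC) keep exported files sorted by range.
function csvFilename(from: Date, to: Date): string {
  const iso = (d: Date) => d.toISOString().slice(0, 10);
  return `rb-ollama-usage_${iso(from)}_to_${iso(to)}.csv`;
}

console.log(csvFilename(new Date("2025-01-06T00:00:00Z"), new Date("2025-01-12T00:00:00Z")));
// rb-ollama-usage_2025-01-06_to_2025-01-12.csv
```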

Coding accuracy percentages

There is no universal real-time "accuracy" feed from model vendors, so this extension supports manual benchmark percentages via settings:

ollamaDirectCustomAgent.modelAccuracyOverrides

Example:

"ollamaDirectCustomAgent.modelAccuracyOverrides": [
  { "id": "deepseek-v4-pro:cloud", "accuracyPercent": 91 },
  { "id": "qwen3-coder:30b", "accuracyPercent": 86 },
  { "id": "custom:openrouter:google/gemini-2.5-flash", "accuracyPercent": 88 }
]

These show up in the picker as acc X%.
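The override list is a simple lookup by model id. A hypothetical pickerLabel helper that combines the acc X% label with the use X% share described earlier; the separator and the range check are assumptions.

```typescript
// Hypothetical sketch: build a picker label from accuracy overrides and usage share.
interface AccuracyOverride {
  id: string;
  accuracyPercent: number;
}

function pickerLabel(modelId: string, overrides: AccuracyOverride[], usagePercent?: number): string {
  const parts = [modelId];
  const match = overrides.find((o) => o.id === modelId);
  // Skip out-of-range values instead of showing a misleading label.
  if (match && match.accuracyPercent >= 0 && match.accuracyPercent <= 100) {
    parts.push(`acc ${match.accuracyPercent}%`);
  }
  if (usagePercent !== undefined) {
    parts.push(`use ${usagePercent}%`);
  }
  return parts.join(" · ");
}

console.log(pickerLabel("qwen3-coder:30b", [{ id: "qwen3-coder:30b", accuracyPercent: 86 }], 42));
// qwen3-coder:30b · acc 86% · use 42%
```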

Task-based model handover

When Auto is selected, you can route specific task types to dedicated models:

  • taskModelImage (image understanding)
  • taskModelDoc (PDF/DOCX/text-heavy extraction)
  • taskModelCode (coding mode)
  • taskModelQa (Q&A / Ask mode, e.g. Jules-style QA model)

Enable/disable with:

ollamaDirectCustomAgent.taskRoutingEnabled

If a task model is set to __auto__, normal Auto routing applies.
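Task handover is a per-task lookup in which __auto__ means "no preference". Sketch, assuming the caller already knows the task type (the README does not say how tasks are detected); taskModel is a hypothetical name, and the field names mirror the settings above.

```typescript
type Task = "chat" | "image" | "doc" | "code" | "qa";

interface TaskRoutingSettings {
  taskRoutingEnabled: boolean;
  taskModelImage: string;
  taskModelDoc: string;
  taskModelCode: string;
  taskModelQa: string;
}

const AUTO = "__auto__";

// Returns a pinned model, or undefined to fall through to normal Auto routing.
function taskModel(task: Task, s: TaskRoutingSettings): string | undefined {
  if (!s.taskRoutingEnabled) return undefined;
  const byTask: Record<Task, string | undefined> = {
    chat: undefined, // plain chat has no dedicated setting
    image: s.taskModelImage,
    doc: s.taskModelDoc,
    code: s.taskModelCode,
    qa: s.taskModelQa,
  };
  const chosen = byTask[task];
  return chosen && chosen !== AUTO ? chosen : undefined;
}
```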

Onyx (Chat & RAG)

Onyx is a self-hostable chat/RAG system that can connect to Ollama and supports custom agents, connectors, deep research, and MCP/OpenAPI actions.

Quick path:

  1. Deploy Onyx via quickstart: https://docs.onyx.app/deployment/getting_started/quickstart
  2. In setup, choose Ollama as provider
  3. Set Ollama URL:
  • local: http://127.0.0.1:11434
  • Docker: http://host.docker.internal:11434

In this extension, choose assistant Onyx (Chat & RAG) and click Launch to open setup/launch guidance.

n8n (Automation with Ollama)

n8n workflows can call Ollama nodes for automations and agents.

Quick path:

  1. Install n8n
  2. In n8n, create Ollama credentials
  3. Set the API URL:
     • local: http://localhost:11434
     • Docker: http://host.docker.internal:11434
  4. Build a workflow with Ollama nodes and select a model (for example qwen3-coder)

Cloud path:

  1. Create API key at https://ollama.com/settings/keys
  2. In n8n, set API URL https://ollama.com and add key
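Both paths call the same native Ollama API; only the base URL and the key differ. A sketch assuming Ollama's POST /api/chat route and an Authorization: Bearer <key> header for https://ollama.com; ollamaChat is a hypothetical helper, and cloud model names on the direct API may not match the local :cloud tags, so check Ollama's cloud docs.

```typescript
// Hypothetical sketch: one native Ollama chat call for local or cloud.
interface OllamaChatCall {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function ollamaChat(apiUrl: string, model: string, prompt: string, apiKey?: string): OllamaChatCall {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`; // cloud only
  return {
    url: `${apiUrl.replace(/\/+$/, "")}/api/chat`,
    headers,
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
      stream: false, // one JSON response instead of streamed chunks
    }),
  };
}

console.log(ollamaChat("https://ollama.com", "gpt-oss:120b", "hi", "key").url); // https://ollama.com/api/chat
```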

In this extension, choose assistant n8n (Automation) and click Launch.

Documentation index sync (llms.txt)

Use one of these to sync the official index for model/tool discovery:

  • + → Plugins → Sync Ollama docs index (llms.txt)
  • command: RB Ollama: Sync Ollama Documentation Index (llms.txt)

Source: https://docs.ollama.com/llms.txt
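llms.txt is a Markdown index, so syncing it mostly means extracting its link list. Sketch assuming the llms.txt convention of list items like "- [Title](url): notes"; parseLlmsTxt is hypothetical and the example.com URLs are placeholders.

```typescript
// Hypothetical sketch: extract "[title](url)" list items from an llms.txt file.
interface DocLink {
  title: string;
  url: string;
}

function parseLlmsTxt(text: string): DocLink[] {
  const links: DocLink[] = [];
  const linkPattern = /^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)/;
  for (const line of text.split(/\r?\n/)) {
    const m = linkPattern.exec(line);
    if (m) links.push({ title: m[1], url: m[2] });
  }
  return links;
}

const doc = "# Example\n> Summary\n\n## Docs\n- [Quickstart](https://docs.example.com/quickstart): Get started";
console.log(parseLlmsTxt(doc)); // [ { title: 'Quickstart', url: 'https://docs.example.com/quickstart' } ]
```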


+ menu — attachments

Click + in the composer or drag & drop files anywhere on the composer:

| File type | Behaviour |
|---|---|
| PNG / JPG / GIF / WebP | Sent as multimodal images. Auto routes to a vision model (gemma4:31b-cloud, gemini-3-flash-preview:cloud, gpt-4o, claude-3-*, etc.) |
| PDF | Text extracted via pdfjs-dist, prepended to your prompt |
| DOCX | Text extracted via mammoth, prepended to your prompt |
| TXT / MD / source files | Read as UTF-8 and prepended |

Coming via MCP (planned): browser actions, Slack/Gmail/Drive/Calendar connectors.
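Attachment handling is a dispatch on file extension. Hypothetical sketch: the text-extension list stands in for "source files", and the [Attached: …] header used when prepending is an assumed format.

```typescript
type AttachmentKind = "image" | "pdf" | "docx" | "text";

const IMAGE_EXT = ["png", "jpg", "jpeg", "gif", "webp"];
const TEXT_EXT = ["txt", "md", "ts", "js", "py", "json"]; // illustrative subset

// Hypothetical sketch: map a dropped file to its handling path.
function classifyAttachment(fileName: string): AttachmentKind | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return undefined;
  const ext = fileName.slice(dot + 1).toLowerCase();
  if (IMAGE_EXT.indexOf(ext) !== -1) return "image"; // multimodal, vision routing
  if (ext === "pdf") return "pdf"; // text extracted (pdfjs-dist)
  if (ext === "docx") return "docx"; // text extracted (mammoth)
  if (TEXT_EXT.indexOf(ext) !== -1) return "text"; // read as UTF-8
  return undefined;
}

// Extracted text goes in front of the user's prompt.
function withAttachmentText(prompt: string, fileName: string, text: string): string {
  return `[Attached: ${fileName}]\n${text}\n\n${prompt}`;
}
```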


Agent modes (in the + menu)

Click + in the composer and pick a mode — it shapes the assistant's system prompt:

| Mode | Behaviour |
|---|---|
| 💬 Chat (default) | Normal conversational coding assistant. |
| 🗂 Plan | Produces a numbered, step-by-step plan. Does not write final code unless asked. |
| 💻 Code | Direct, ready-to-paste code edits with minimal prose. |
| ❓ Ask | Answers in 1–3 sentences, no proposed changes. |
| 🏗 Architect | High-level design, trade-offs, mermaid diagrams. |

The mode shows in the composer placeholder, e.g. Ask anything… [Plan mode].
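Each mode amounts to an extra instruction plus a placeholder suffix. Sketch: the instruction strings paraphrase the table above and are not the extension's actual prompts; leaving the default Chat mode without a suffix is an assumption.

```typescript
type AgentMode = "chat" | "plan" | "code" | "ask" | "architect";

// Hypothetical per-mode instructions, paraphrasing the mode table.
const MODE_INSTRUCTIONS: Record<AgentMode, string> = {
  chat: "You are a conversational coding assistant.",
  plan: "Produce a numbered, step-by-step plan. Do not write final code unless asked.",
  code: "Reply with direct, ready-to-paste code edits and minimal prose.",
  ask: "Answer in 1-3 sentences and do not propose changes.",
  architect: "Focus on high-level design and trade-offs; use mermaid diagrams where helpful.",
};

const MODE_LABELS: Record<AgentMode, string> = {
  chat: "Chat", plan: "Plan", code: "Code", ask: "Ask", architect: "Architect",
};

function modeSystemPrompt(mode: AgentMode, basePrompt: string): string {
  return `${basePrompt}\n\n${MODE_INSTRUCTIONS[mode]}`;
}

function composerPlaceholder(mode: AgentMode): string {
  return mode === "chat" ? "Ask anything…" : `Ask anything… [${MODE_LABELS[mode]} mode]`;
}

console.log(composerPlaceholder("plan")); // Ask anything… [Plan mode]
```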


Permissions preset

Settings → RB Ollama: Permissions:

| Preset | What changes |
|---|---|
| Default | System prompt: only reference files explicitly attached |
| Auto-review | System prompt: present commands clearly and ask for confirmation |
| Full access | System prompt: freely reference workspace context |
| Custom | Use your own multi-line systemPrompt |

🔒 Honest note: VS Code extensions cannot enforce OS-level sandboxing. This setting controls what the model is told it may do, not what your editor will actually let it do. Real shell/browser execution is a future feature via MCP.
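The presets only change text in the system prompt, which the sketch below makes explicit: nothing in it blocks or sandboxes an action. permissionPrompt and the prompt wording are assumptions; falling back to the Default text when a custom prompt is empty is also assumed.

```typescript
type PermissionPreset = "default" | "auto-review" | "full-access" | "custom";

// Hypothetical prompt text per preset (wording assumed, not the extension's).
const PRESET_PROMPTS: Record<Exclude<PermissionPreset, "custom">, string> = {
  default: "Only reference files the user has explicitly attached.",
  "auto-review": "Present any commands clearly and ask for confirmation before they are run.",
  "full-access": "You may freely reference workspace context.",
};

function permissionPrompt(preset: PermissionPreset, customSystemPrompt: string): string {
  if (preset === "custom") {
    // An empty custom prompt falls back to the Default preset text.
    return customSystemPrompt.trim() || PRESET_PROMPTS.default;
  }
  return PRESET_PROMPTS[preset];
}
```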


Bring your own key — DeepSeek, Qwen, Zhipu, Baidu, Moonshot, Hunyuan, Mimo, Claude, GPT, Gemini

The fastest way: open the + menu in the chat composer → 🔑 Add provider… (or Cmd+Shift+P / Ctrl+Shift+P → RB Ollama: Add Provider). Pick a preset from the catalogue and paste your API key.

| Provider | Endpoint preset | Models shipped |
|---|---|---|
| DeepSeek (direct) | https://api.deepseek.com/v1 | deepseek-chat, deepseek-reasoner, deepseek-coder |
| Qwen / Alibaba DashScope | https://dashscope-intl.aliyuncs.com/compatible-mode/v1 | qwen-max, qwen-plus, qwen-flash, qwen-vl-max, qwen-coder-plus, qwen3-coder-plus, qwen3-max |
| Zhipu AI (GLM) | https://open.bigmodel.cn/api/paas/v4 | glm-4.6, glm-4-plus, glm-4-flash, glm-4v-plus |
| Baidu Qianfan (ERNIE) | https://qianfan.baidubce.com/v2 | ernie-4.5-turbo-128k, ernie-4.0-turbo-8k, ernie-speed-128k |
| Moonshot AI (Kimi) | https://api.moonshot.cn/v1 | kimi-k2-0905-preview, moonshot-v1-128k, moonshot-v1-32k |
| Tencent Hunyuan (混元) | https://api.hunyuan.cloud.tencent.com/v1 | hunyuan-turbos-latest, hunyuan-large, hunyuan-vision |
| Xiaomi Mimo V2 Pro | https://api.xiaomi.com/v1 | mimo-v2-pro, mimo-v2 |
| OpenAI | https://api.openai.com/v1 | gpt-4o, gpt-4o-mini, o4-mini |
| Anthropic Claude | https://api.anthropic.com/v1 | claude-3-5-sonnet-latest, claude-3-5-haiku-latest |
| Google Gemini | https://generativelanguage.googleapis.com/v1beta/openai | gemini-2.5-flash, gemini-2.5-pro |
| OpenRouter | https://openrouter.ai/api/v1 | anthropic/claude-3.5-sonnet, google/gemini-2.5-flash, x-ai/grok-2 |
| Groq | https://api.groq.com/openai/v1 | llama-3.3-70b-versatile, qwen-2.5-coder-32b |

You can also add them by hand in settings.json:

"ollamaDirectCustomAgent.customProviders": [
  {
    "id": "deepseek",
    "name": "DeepSeek",
    "baseUrl": "https://api.deepseek.com/v1",
    "apiKey": "sk-...",
    "models": ["deepseek-chat", "deepseek-reasoner"]
  },
  {
    "id": "qwen",
    "name": "Qwen",
    "baseUrl": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "apiKey": "sk-...",
    "models": ["qwen-max", "qwen-plus", "qwen-flash", "qwen-vl-max", "qwen-coder-plus"]
  }
]

Note: a few endpoints (Anthropic native, some Baidu/Hunyuan auth modes) deviate slightly from OpenAI's spec. If you hit a 400/401, try the same model through OpenRouter, which exposes an OpenAI-compatible API.
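Hand-written customProviders entries are easy to get subtly wrong. A hypothetical validateProvider check using the field names from the settings.json example; the https rule and the /chat/completions check are assumptions based on every preset stopping at a /v1-style root.

```typescript
// Hypothetical sanity check for a customProviders entry.
interface CustomProvider {
  id: string;
  name: string;
  baseUrl: string;
  apiKey?: string;
  models: string[];
}

function validateProvider(p: CustomProvider): string[] {
  const problems: string[] = [];
  if (!/^[a-z0-9-]+$/.test(p.id)) problems.push("id should use lowercase letters, digits or '-'");
  // scheme, host, optional port, optional path (no query string expected)
  const m = /^(https?):\/\/([^/:?#]+)(?::\d+)?(\/[^?#]*)?$/.exec(p.baseUrl);
  if (!m) {
    problems.push("baseUrl must be an http(s) URL");
  } else {
    const [, scheme, host, path = ""] = m;
    const isLocal = host === "127.0.0.1" || host === "localhost";
    if (scheme !== "https" && !isLocal) problems.push("remote baseUrl should use https");
    // Presets stop at the API root, so a full /chat/completions path is likely a mistake.
    if (/\/chat\/completions\/?$/.test(path)) problems.push("baseUrl should end at the API root");
  }
  if (p.models.length === 0) problems.push("models list is empty");
  return problems;
}
```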


All settings (@ext:RobinBakshi.ollama-direct-custom-agent; keys below are listed without their ollamaDirectCustomAgent. prefix)

| Setting | Default | Purpose |
|---|---|---|
| endpoint | http://127.0.0.1:11434 | Ollama HTTP base URL |
| defaultModel | __auto__ | Initial selection in the picker |
| defaultAssistant | openclaw | Assistant selected in the assistant dropdown |
| assistantTaskImage | openclaw | Assistant assignment for image tasks |
| assistantTaskDoc | hermes | Assistant assignment for document tasks |
| assistantTaskCode | codex | Assistant assignment for coding tasks |
| assistantTaskQa | claude | Assistant assignment for QA tasks |
| assistantAutoRouting | true | Automatically route assistant selection by task type (image/doc/code/qa) |
| assistantAutoLaunch | false | Auto-launch assigned assistant when task is detected |
| assistantApiKeys | {} | Legacy migration field. Values are moved to encrypted SecretStorage and removed from settings. Use Settings → API Keys. |
| multiAgentSetupEnabled | true | Activates task-based multi-agent setup (Image/Doc/Code/QA) |
| multiAgentRoster | [openclaw, hermes, claude, codex, copilot, opencode, droid, goose, pi, pool, onyx, n8n] | Controls which assistants are available in multi-agent routing |
| permissions | default | default / auto-review / full-access / custom |
| systemPrompt | (empty) | Used when permissions = custom |
| autoPrefer | local-first | Auto routing strategy |
| taskRoutingEnabled | true | Enable task-based model routing for Auto |
| taskModelImage | __auto__ | Preferred model for image tasks |
| taskModelDoc | __auto__ | Preferred model for doc tasks |
| taskModelCode | __auto__ | Preferred model for code tasks |
| taskModelQa | __auto__ | Preferred model for QA tasks |
| modelAccuracyOverrides | [] | Manual accuracy labels shown in picker |
| cloudPricePerMTok | 0.5 | $/1M tokens, used for the savings counter |
| openOnStartup | false | Focus the sidebar at startup |
| composerEnterBehavior | send | send (Enter sends) or newline |
| customProviders | [] | Array of {id,name,baseUrl,apiKey,models[]} |

Cost-savings playbook

  1. Default to Auto. It chooses local whenever possible.
  2. Pull qwen3-coder:30b or llama3.1:8b; they cover ~80% of everyday coding for free.
  3. Reserve :cloud and BYOK models (Claude, GPT-4o, Gemini) for: long-context reasoning, vision, hard refactors.
  4. Watch the $ saved counter go up.

Troubleshooting

| Problem | Fix |
|---|---|
| Cannot reach Ollama at http://127.0.0.1:11434 | Run ollama serve or open the Ollama menubar app. |
| No models found | Run ollama pull llama3.1:8b, then click ↻. |
| Cloud model auth error | Run ollama signin and verify Ollama Pro at https://ollama.com/settings/billing. |
| Extension doesn't appear in Antigravity | Restart Antigravity after install, then look for the speech-bubble icon in the left activity bar. |
| Engine version error on install | This extension targets vscode ^1.95.0. Update Antigravity / Cursor / VSCodium. |
| BYOK provider returns 401 | Wrong API key, or the baseUrl doesn't end at the /v1-style root. |

Roadmap

  • 🛠 MCP client — connect Slack, Gmail, Drive, Calendar, Playwright (browser), shell — using the open Model Context Protocol ecosystem
  • 🛠 Shell tool execution with the permissions preset above gating each call
  • 🛠 Markdown / code-block rendering in the chat log
  • 🛠 Per-workspace model defaults

PRs welcome.


Building from source

git clone https://github.com/robinbakshi007/ollama-direct-custom-agent
cd ollama-direct-custom-agent
npm install
npm run package
npx @vscode/vsce package
# → ollama-direct-custom-agent-0.7.5.vsix

code         --install-extension ./ollama-direct-custom-agent-0.7.5.vsix
antigravity  --install-extension ./ollama-direct-custom-agent-0.7.5.vsix

License

MIT — see LICENSE. Use it, fork it, ship it.
