RB Ollama Agents — Visual Studio Code, Antigravity, Cursor, VSCodium, Windsurf, Gitpod
Free, MIT, BYOK chat sidebar for VS Code, Antigravity, Cursor, VSCodium, Windsurf and Gitpod, with direct APIs for Ollama (local + cloud), DeepSeek, Qwen / Alibaba, Zhipu (GLM), Baidu (ERNIE), Moonshot (Kimi), Tencent Hunyuan, Xiaomi Mimo, OpenAI, Anthropic Claude, Google Gemini, OpenRouter and Groq. The pay-less alternative to Antigravity (~$5K/year), Cursor, Copilot Pro and Codeium.
🐳 Taking the revolution a bit further for our WhalesBrother. Open-weights and open-API frontier models from China and the global open-source community deserve first-class IDE support. This is our small contribution to that wave — free, MIT, public.

If RB Ollama saves you the ~$5,000/year an agentic IDE subscription would have cost, please consider supporting development.
Sponsorship funds open-source frontier-model tooling for WhalesBrother, EU, India, Asia-Pacific and USA developer communities — keeping this stack free, MIT, BYOK and zero-telemetry forever.
We are an ISO 27001:2022 certified company. Our security posture, change management and key-handling practices are independently audited — your API keys live only in your OS keychain (VS Code SecretStorage, macOS Keychain / Windows Credential Manager / libsecret).
🏆 Why we are the best
- Built to last 10+ years. The architecture is OpenAI-compatible at the wire level: every new frontier model that ships an OpenAI-style /v1/chat/completions endpoint works on day one, with no extension update required.
- No middleman, no proxy, no telemetry. Your prompts go straight from your editor to the model provider you choose. Period.
- MIT licensed, fully open source. Audit, fork, self-host, embed.
- One install, six editors. VS Code, Antigravity, Cursor, VSCodium, Windsurf, Gitpod — same extension, same UX.
- Encrypted secrets at rest. API keys + Ollama Cloud session cookies are stored via OS-level SecretStorage and never written to settings.json in plaintext.
- Frontier coverage. Ollama (local + cloud), DeepSeek, Qwen / Alibaba, Zhipu (GLM), Baidu (ERNIE), Moonshot (Kimi), Tencent Hunyuan, Xiaomi Mimo, OpenAI, Anthropic, Gemini, OpenRouter, Groq.
🚀 Coming next: bigger than Mythos
We've already shipped a foundation built to last the next decade. The next tool we publish will go further than Mythos, and it will be fully open source so the community can audit it and guard against malicious use. Star the repo to get notified.
📊 Full feature table
| Capability | RB Ollama (free) | Antigravity | Cursor | Copilot Pro | Codeium |
|---|---|---|---|---|---|
| Annual cost (typical seat) | $0 | ~$5,000 | ~$240 | ~$240 | ~$180 |
| Direct Ollama local models | ✅ | ❌ | ❌ | ❌ | partial |
| Direct Ollama Cloud (:cloud models) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Direct DeepSeek / Qwen / GLM / Kimi / ERNIE / Hunyuan / Mimo | ✅ | ❌ | partial | ❌ | ❌ |
| Direct OpenAI / Claude / Gemini / Groq / OpenRouter (BYOK) | ✅ | ❌ | partial | ❌ | partial |
| Auto routing (local → cloud) with $-saved counter | ✅ | ❌ | ❌ | ❌ | ❌ |
| Encrypted SecretStorage for API keys + cookies | ✅ | n/a | n/a | n/a | n/a |
| Drag-and-drop images / PDFs / DOCX / TXT / MD | ✅ | partial | partial | partial | ❌ |
| Vision auto-routing (image → vision-capable model) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Agent modes: Chat / Plan / Code / Ask / Architect | ✅ | partial | partial | partial | ❌ |
| Multi-agent roster + drag-sortable priority order | ✅ | ❌ | ❌ | ❌ | ❌ |
| Role-split parallel agents (architect / builder / validator) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Per-task assistant routing (Image / Doc / Code / QA) | ✅ | ❌ | ❌ | ❌ | ❌ |
| Multi-period analytics (today / week / fortnight / month / quarter / YTD / custom) | ✅ | ❌ | ❌ | ❌ | ❌ |
| CSV export with date-range filename suffix | ✅ | ❌ | ❌ | ❌ | ❌ |
| Cloud token guard + account rotation | ✅ | ❌ | ❌ | ❌ | ❌ |
| Context window meter + reserved-for-response | ✅ | partial | partial | ❌ | ❌ |
| Zero telemetry | ✅ | ❌ | ❌ | ❌ | ❌ |
| MIT licensed, fully open source | ✅ | ❌ | ❌ | ❌ | ❌ |
| ISO 27001:2022 certified maintainer | ✅ | n/a | n/a | n/a | n/a |
| One install across VS Code / Antigravity / Cursor / VSCodium / Windsurf / Gitpod | ✅ | ❌ | ❌ | ❌ | partial |
🌍 Communities we serve
We are actively assisting and accepting contributions from:
- 🐳 WhalesBrother — open-weights / open-API frontier models from China and the global open-source community
- 🇪🇺 European Union developer collectives (GDPR-respecting AI tooling, on-device first)
- 🇮🇳 India — IndiaStack, ONDC, BHASHINI integrators
- 🌏 Asia Pacific — Japan, Korea, Singapore, Australia, NZ, ASEAN
- 🇺🇸 United States developer communities — independents, startups and education
If your community wants a localised onboarding guide, open an issue or sponsor a workstream.
🔐 Trust & compliance
- ISO 27001:2022 certified information-security management system covering source-code handling, key custody and release engineering.
- API keys & Ollama Cloud session cookies are encrypted at rest in the OS keychain via VS Code SecretStorage, never in settings.json.
- Privacy-first analytics model — by default, no usage telemetry is sent. If you explicitly enable billing/security analytics consent in Plans settings, the client shares only an anonymous random install UUID (no device fingerprint) plus consent/retention metadata; country is derived server-side from request IP and not collected directly on-device.
- Data minimization & retention — analytics consent is optional and revocable, and retention days are configurable (default 30 days).
- MIT licensed: full source on GitHub, reproducible build via node esbuild.js --production && npx @vscode/vsce package.
Privacy Policy Notes (GDPR / DPDP / CCPA)
- Billing/security analytics are opt-in and disabled unless you provide explicit consent.
- The extension uses a random anonymous install ID only; it does not use hardware IDs, MAC address, serial number, or persistent device fingerprinting.
- Country is resolved server-side from request IP and should be stored only at country granularity.
- Retention must be limited to the configured period and deleted after expiry.
- Promo codes and license tokens are stored in encrypted SecretStorage (OS keychain), never plaintext settings.
Why this exists
The big-name agentic IDEs are charging hundreds-to-thousands of dollars per seat per year for thin wrappers around the same public APIs you can call yourself. Meanwhile:
- Ollama Cloud has frontier models like gpt-oss:120b-cloud, deepseek-v4-pro:cloud, gemini-3-flash-preview:cloud, kimi-k2.6:cloud, glm-5.1:cloud and gemma4:31b-cloud, none of which the major IDE agents let you wire into their chat sidebar.
- DeepSeek, Qwen, Zhipu, Moonshot, Hunyuan, Mimo all expose OpenAI-compatible APIs — and they are dramatically cheaper than GPT-4-class models for everyday coding.
- Antigravity / Cursor / Codex / Copilot Chat keep this firmly behind their own backends.
This extension is the missing bridge. It is free, MIT-licensed, open source and zero-telemetry, with no proxy of mine in the middle.
What you get
- ✅ One install across VS Code, Antigravity, Cursor, VSCodium, Windsurf, Gitpod
- ✅ Sidebar chat with model picker inside the composer (matches native Antigravity / Codex / Gemini Code Assist UX)
- ✅ ✨ Auto routing — automatically picks the cheapest viable model (local first), with a live $ saved / % saved counter
- ✅ Inline bottom Cost & Analytics panel — expandable below the composer, with per-model usage %, request/token/task split, and day/week CSV export
- ✅ Context window meter with Reserved for response display and token guard warnings
- ✅ Cloud token guard — when cloud requests approach reserve limits, route to local fallback automatically (configurable)
- ✅ Cloud account rotation — configure multiple Ollama cloud account profiles and auto-switch when weekly usage threshold is reached
- ✅ Agent modes in the + menu — Chat / Plan / Code / Ask / Architect
- ✅ Assistant dropdown with grouped options: Digital Assistant (OpenClaw), Agents (Hermes, Claude Code, Codex, Copilot CLI, OpenCode, Droid, Goose, Pi, Pool), Chat & RAG (Onyx), and Automation (n8n)
- ✅ Assistant Routing panel with dedicated dropdowns for Image / Document / Code / QA and optional auto-launch toggle
- ✅ Multi-agent setup controls in settings (multiAgentSetupEnabled, multiAgentRoster) to activate/deactivate routing and define the agent roster
- ✅ Team orchestration controls (assistantTeamOrchestrationEnabled, assistantTeamRoles, assistantTeamRequirePlanApproval) for lead/subagent style workflows
- ✅ In-chat gear shortcut on extension info rows to open RB Ollama settings directly
- ✅ + menu: drag-and-drop PNG, JPG, PDF, DOCX, TXT, MD — vision images automatically route to a vision-capable model
- ✅ Permissions preset (Default / Auto-review / Full access / Custom) shaping the system prompt
- ✅ One-click "Add provider" — quick-pick presets for DeepSeek, Qwen, Zhipu, Baidu, Moonshot, Hunyuan, Mimo, OpenAI, Claude, Gemini, OpenRouter, Groq. Just paste your API key.
- ✅ Bring-your-own-key custom providers for any other OpenAI-compatible endpoint
- ✅ Settings page surfaced under @ext:RobinBakshi.ollama-direct-custom-agent
- ✅ Zero telemetry, zero proxies, zero accounts of mine
Digital Assistants quick guide
Use this when you want OpenClaw, Hermes Agent, Claude Code, Codex, Copilot CLI, and others to be routed automatically by task.
- Open RB Ollama Settings (gear icon in chat, or command: RB Ollama: Open Settings).
- In the Digital Assistants tab, assign an assistant per task type:
  - Image tasks
  - Document tasks
  - Code tasks
  - QA tasks
- In the RB Ollama Agents tab, enable:
  - Auto-route by task type
  - Multi-agent orchestration (optional)
- In the API Keys tab, add keys only when needed:
  - If you already use Ollama Cloud models, assistant keys are usually not required.
  - For standalone assistants/providers, add keys there.
API key security
- Keys are encrypted via VS Code SecretStorage (OS keychain: macOS Keychain, Windows Credential Manager, Linux libsecret).
- The extension UI shows only masked previews of stored keys.
- Legacy plaintext keys in settings are auto-migrated to SecretStorage and cleared.
Screenshots
Screenshots are optional and intentionally not hot-linked here unless files exist, to avoid broken rendering in GitHub pages.
Add your files here when ready:
docs/screenshots/antigravity.png
docs/screenshots/vscode.png
Tip: keep image names exactly as above for consistency across release notes.
Install
Step 1 — Install Ollama (free) and (optionally) Ollama Pro
- Install Ollama → https://ollama.com/download
- Sign in: ollama signin (free)
- (Optional, only for :cloud models) subscribe to Ollama Pro: https://ollama.com/settings/billing
- Pull at least one model:
# Free local models (no Pro needed) — recommended for token savings
ollama pull qwen3-coder:30b # 18 GB — great coding
ollama pull llama3.1:8b # 4.9 GB — fast general
ollama pull gemma4:e4b # 9.6 GB — multimodal/vision
ollama pull deepseek-coder-v2:16b # 9 GB — coding (smaller)
# Cloud (Ollama Pro)
ollama pull gemini-3-flash-preview:cloud
ollama pull gpt-oss:120b-cloud
ollama pull deepseek-v4-pro:cloud
ollama pull gemma4:31b-cloud
- Verify:
ollama list
curl http://127.0.0.1:11434/api/tags
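If you prefer to script the check above, here is a minimal TypeScript sketch that reads the same GET /api/tags endpoint and returns the model names. It assumes the documented response shape ({ "models": [{ "name": ... }] }) and Node 18+ for the built-in fetch; listLocalModels and FetchLike are hypothetical names, not part of the extension. Taking the fetch function as a parameter keeps the sketch testable without a running server.

```typescript
// Minimal shape of the GET /api/tags response this sketch relies on (other fields ignored).
interface TagsResponse {
  models?: { name: string }[];
}

// Hypothetical fetch-like signature; Node 18+'s global fetch satisfies it.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Returns the names of locally pulled models, or throws if Ollama is unreachable.
async function listLocalModels(
  fetchFn: FetchLike,
  endpoint = "http://127.0.0.1:11434",
): Promise<string[]> {
  // Trim trailing slashes so "http://127.0.0.1:11434/" also works.
  const res = await fetchFn(`${endpoint.replace(/\/+$/, "")}/api/tags`);
  if (!res.ok) {
    throw new Error(`Cannot reach Ollama at ${endpoint} (HTTP ${res.status})`);
  }
  const body = (await res.json()) as TagsResponse;
  return (body.models ?? []).map((m) => m.name);
}
```

For example, listLocalModels(fetch).then(console.log) should print the same names as ollama list.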
Step 2 — Install the extension
| Editor | One-click | Direct .vsix download | CLI |
|---|---|---|---|
| VS Code | Install from Marketplace | ⬇ Latest .vsix | code --install-extension RobinBakshi.ollama-direct-custom-agent |
| Antigravity | Extensions panel → search RB Ollama Agents | ⬇ Latest .vsix | antigravity --install-extension RobinBakshi.ollama-direct-custom-agent |
| Cursor | Extensions panel → search RB Ollama Agents | ⬇ Latest .vsix | cursor --install-extension RobinBakshi.ollama-direct-custom-agent |
| VSCodium | Install from Open VSX | ⬇ Latest .vsix | codium --install-extension RobinBakshi.ollama-direct-custom-agent |
| Windsurf / Gitpod | Extensions panel → search RB Ollama Agents | ⬇ Latest .vsix | <editor-cli> --install-extension RobinBakshi.ollama-direct-custom-agent |
Direct downloads (all releases): https://github.com/robinbakshi007/ollama-direct-custom-agent/releases
Or install the .vsix directly:
# Download from the latest release page and install that file
# (filename changes every version)
code --install-extension ./ollama-direct-custom-agent-<version>.vsix
antigravity --install-extension ./ollama-direct-custom-agent-<version>.vsix
cursor --install-extension ./ollama-direct-custom-agent-<version>.vsix
codium --install-extension ./ollama-direct-custom-agent-<version>.vsix
Step 3 — Open it
- Restart your editor once after install
- Click the speech-bubble icon in the left activity bar → RB Ollama Agents
- Pick ✨ Auto in the composer dropdown. Done.
✨ Auto routing & savings counter
Selecting ✨ Auto (prefer local → cloud) routes each turn to the cheapest viable model:
- If your prompt has images attached, Auto picks the best vision-capable model (cloud preferred — Gemini 3 Flash, Gemma 4, GPT-4o, Claude 3 — falling back to a local vision model).
- Otherwise, Auto picks a local model (free, $0/token), preferring coder/instruct variants.
- Only if no local model is installed does Auto fall back to a :cloud model, preferring small/cheap ones (*flash*, *mini*, *haiku*).
The header above the chat shows a live tally:
$1.27 saved (78%) 24 local · 7 cloud requests ⚙ ↺
- $ saved = (local tokens ÷ 1,000,000) × your cloudPricePerMTok setting (default $0.50 per 1M tokens).
- % saved = local tokens ÷ total tokens.
- ⚙ opens settings, ↺ resets the counter.
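The two formulas above, as a small worked sketch. The TurnUsage shape and the computeSavings name are hypothetical; the extension's internal bookkeeping may differ. With the default $0.50 per 1M tokens, 2,540,000 local tokens and 716,000 cloud tokens work out to $1.27 saved at 78%, one illustrative combination that reproduces the sample header.

```typescript
// Hypothetical per-turn usage record; not the extension's actual data model.
interface TurnUsage {
  tokens: number; // prompt + completion tokens for the turn
  local: boolean; // true if a local Ollama model served the turn
}

// $ saved = (local tokens / 1M) * cloudPricePerMTok; % saved = local / total * 100.
function computeSavings(turns: TurnUsage[], cloudPricePerMTok = 0.5) {
  let localTokens = 0;
  let totalTokens = 0;
  for (const t of turns) {
    totalTokens += t.tokens;
    if (t.local) localTokens += t.tokens;
  }
  const dollarsSaved = (localTokens / 1_000_000) * cloudPricePerMTok;
  const percentSaved = totalTokens === 0 ? 0 : (localTokens / totalTokens) * 100;
  return { dollarsSaved, percentSaved };
}
```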
Tune the strategy under Settings → RB Ollama: Auto Prefer:
- local-first (default): always go local when possible
- cheapest-cloud: for prompts longer than 4K characters, jump to a cheap cloud model
- balanced: for prompts longer than 8K characters, use cloud
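To make the Auto rules concrete, here is a hypothetical TypeScript sketch of the decision order described above: vision first, then the autoPrefer length threshold, then local coder/instruct models, then cheap cloud models. It is not the extension's source. The Candidate type, the vision flag and the name-based cheapness test are assumptions; the regex matches flash / mini / haiku only as whole name segments so that "gemini" is not mistaken for "mini".

```typescript
type AutoPrefer = "local-first" | "cheapest-cloud" | "balanced";

interface Candidate {
  id: string; // e.g. "qwen3-coder:30b" or "gemini-3-flash-preview:cloud"
  cloud: boolean; // true for :cloud and BYOK models
  vision: boolean; // assumed to come from model metadata
}

// flash / mini / haiku as a whole name segment, so "gemini" does not count as "mini".
function isCheap(id: string): boolean {
  return /(^|[-:\/._])(flash|mini|haiku)([-:\/._]|$)/i.test(id);
}

// Hypothetical sketch of the Auto rules; not the extension's actual code.
function pickAutoModel(
  models: Candidate[],
  prompt: string,
  hasImages: boolean,
  prefer: AutoPrefer = "local-first",
): string | undefined {
  const local = models.filter((m) => !m.cloud);
  // Cloud models with cheap-looking names first (Array.prototype.sort is stable).
  const cloud = models
    .filter((m) => m.cloud)
    .sort((a, b) => Number(isCheap(b.id)) - Number(isCheap(a.id)));

  // 1. Images: a cloud vision model if available, else a local one.
  if (hasImages) {
    const vision = cloud.find((m) => m.vision) ?? local.find((m) => m.vision);
    if (vision) return vision.id;
  }

  // 2. Long prompts may jump to cloud, depending on autoPrefer.
  //    "4K" / "8K" are read as 4,000 / 8,000 characters (assumption).
  const threshold =
    prefer === "cheapest-cloud" ? 4_000 : prefer === "balanced" ? 8_000 : Infinity;
  if (prompt.length > threshold && cloud.length > 0) return cloud[0].id;

  // 3. Local first, preferring coder / instruct variants.
  const localPick = local.find((m) => /coder|instruct/i.test(m.id)) ?? local[0];
  if (localPick) return localPick.id;

  // 4. No local model installed: the cheapest-looking cloud model, if any.
  return cloud.length > 0 ? cloud[0].id : undefined;
}
```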
Per-model usage percentages
The extension now tracks request share by model and shows percentages in two places:
- A compact Usage: bar under the savings header (modelA 42% · modelB 31% ...)
- Model dropdown labels (use X%)
This lets you track your model mix manually while keeping Auto as the default.
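A hypothetical sketch of how the Usage: bar text can be derived from per-model request counts (formatUsageBar and the RequestCounts shape are illustrative names, not extension APIs). Because each share is rounded independently, the displayed percentages may not add up to exactly 100.

```typescript
// Hypothetical: request counts per model id.
type RequestCounts = Record<string, number>;

// Formats "Usage: modelA 42% · modelB 31% · ..." sorted by share, rounded to whole percent.
function formatUsageBar(counts: RequestCounts): string {
  const entries = Object.keys(counts).map((id) => [id, counts[id]] as [string, number]);
  const total = entries.reduce((sum, [, n]) => sum + n, 0);
  if (total === 0) return "Usage: (no requests yet)";
  return (
    "Usage: " +
    entries
      .sort((a, b) => b[1] - a[1])
      .map(([id, n]) => `${id} ${Math.round((n / total) * 100)}%`)
      .join(" · ")
  );
}
```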
Model Analytics panel + exports
Expand the Model Analytics panel below the usage bar to view:
- model-level usage percentage
- request count
- token split
- task split (chat, image, doc, code, qa)
Export manual reports for finance/compliance/audits:
- Export Day CSV
- Export Week CSV
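A minimal sketch of a CSV export with a date-range filename suffix, assuming a hypothetical per-model row shape and file-naming scheme (the extension's actual columns and filenames may differ). Fields are quoted RFC 4180-style when they contain commas, quotes or newlines; dates come from toISOString, so they are in UTC.

```typescript
// Hypothetical row shape for one model's usage in the exported period.
interface UsageRow {
  model: string;
  requests: number;
  promptTokens: number;
  completionTokens: number;
}

// Quote a field when it contains a comma, quote or line break; double embedded quotes.
function csvField(value: string | number): string {
  const s = String(value);
  return /[",\n\r]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsv(rows: UsageRow[]): string {
  const header = ["model", "requests", "prompt_tokens", "completion_tokens"];
  const lines = rows.map((r) =>
    [r.model, r.requests, r.promptTokens, r.completionTokens].map(csvField).join(","),
  );
  return [header.join(","), ...lines].join("\n");
}

// Hypothetical naming scheme: rb-ollama-usage_<from>_to_<to>.csv (UTC dates).
function csvFileName(from: Date, to: Date): string {
  const day = (d: Date) => d.toISOString().slice(0, 10);
  return `rb-ollama-usage_${day(from)}_to_${day(to)}.csv`;
}
```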
Coding accuracy percentages
There is no universal real-time "accuracy" feed from model vendors, so this extension supports manual benchmark percentages via the ollamaDirectCustomAgent.modelAccuracyOverrides setting.
Example:
"ollamaDirectCustomAgent.modelAccuracyOverrides": [
{ "id": "deepseek-v4-pro:cloud", "accuracyPercent": 91 },
{ "id": "qwen3-coder:30b", "accuracyPercent": 86 },
{ "id": "custom:openrouter:google/gemini-2.5-flash", "accuracyPercent": 88 }
]
These show up in the picker as acc X%.
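A hypothetical sketch of how an override might be turned into a picker label, assuming overrides are matched on the exact model id (so a custom provider model needs its full id, such as custom:openrouter:google/gemini-2.5-flash in the example). pickerLabel is an illustrative name, not an extension API.

```typescript
interface AccuracyOverride {
  id: string;
  accuracyPercent: number;
}

// Hypothetical label builder: "<id> · use 42% · acc 91%", omitting unknown parts.
function pickerLabel(
  id: string,
  usagePercent: number | undefined,
  overrides: AccuracyOverride[],
): string {
  const parts = [id];
  if (usagePercent !== undefined) parts.push(`use ${Math.round(usagePercent)}%`);
  // Exact id match (assumption); no fuzzy or prefix matching.
  const override = overrides.find((o) => o.id === id);
  if (override) parts.push(`acc ${override.accuracyPercent}%`);
  return parts.join(" · ");
}
```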
Task-based model handover
When Auto is selected, you can route specific task types to dedicated models:
- taskModelImage (image understanding)
- taskModelDoc (PDF/DOCX/text-heavy extraction)
- taskModelCode (coding mode)
- taskModelQa (Q&A / Ask mode, e.g. Jules-style QA model)
Enable or disable with ollamaDirectCustomAgent.taskRoutingEnabled.
If a task model is set to __auto__, normal Auto routing applies.
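The handover rules can be sketched as two small functions: one guesses the task from attachments and the composer mode, the other maps it to the matching taskModel setting or falls back to __auto__. This is a hypothetical TypeScript illustration; the detection heuristics (file extensions, Code mode → code, Ask mode → qa) are assumptions based on the descriptions above.

```typescript
type TaskType = "chat" | "image" | "doc" | "code" | "qa";
type ComposerMode = "chat" | "plan" | "code" | "ask" | "architect";

interface TaskRoutingSettings {
  taskRoutingEnabled: boolean;
  taskModelImage: string;
  taskModelDoc: string;
  taskModelCode: string;
  taskModelQa: string;
}

// Hypothetical task detection: attachments first, then the composer mode.
function detectTask(attachmentNames: string[], mode: ComposerMode): TaskType {
  const lower = attachmentNames.map((n) => n.toLowerCase());
  if (lower.some((n) => /\.(png|jpe?g|gif|webp)$/.test(n))) return "image";
  if (lower.some((n) => /\.(pdf|docx)$/.test(n))) return "doc";
  if (mode === "code") return "code";
  if (mode === "ask") return "qa";
  return "chat";
}

// Returns a dedicated model id, or "__auto__" to fall through to normal Auto routing.
function resolveTaskModel(task: TaskType, s: TaskRoutingSettings): string {
  if (!s.taskRoutingEnabled) return "__auto__";
  switch (task) {
    case "image":
      return s.taskModelImage;
    case "doc":
      return s.taskModelDoc;
    case "code":
      return s.taskModelCode;
    case "qa":
      return s.taskModelQa;
    default:
      return "__auto__";
  }
}
```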
Onyx (Chat & RAG)
Onyx is a self-hostable chat/RAG system that can connect to Ollama and supports custom agents, connectors, deep research, and MCP/OpenAPI actions.
Quick path:
- Deploy Onyx via the quickstart: https://docs.onyx.app/deployment/getting_started/quickstart
- In setup, choose Ollama as the provider
- Set the Ollama URL:
  - local: http://127.0.0.1:11434
  - Docker: http://host.docker.internal:11434
In this extension, choose assistant Onyx (Chat & RAG) and click Launch to open setup/launch guidance.
n8n (Automation with Ollama)
n8n workflows can call Ollama nodes for automations and agents.
Quick path:
- Install n8n
- In n8n, create Ollama credentials
- Set the API URL:
  - local: http://localhost:11434
  - Docker: http://host.docker.internal:11434
- Build a workflow with Ollama nodes and select a model (for example qwen3-coder)
Cloud path:
- Create an API key at https://ollama.com/settings/keys
- In n8n, set the API URL to https://ollama.com and add the key
In this extension, choose assistant n8n (Automation) and click Launch.
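Both n8n paths talk to the same Ollama chat API; the difference is the base URL and whether a key is sent. A minimal TypeScript sketch that builds a non-streaming POST /api/chat request for either case, assuming Bearer-token auth for https://ollama.com as described in Ollama's cloud API docs. buildOllamaChat is a hypothetical helper; pass its fields to fetch to actually send the request.

```typescript
interface OllamaChatRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Local Ollama needs no auth; https://ollama.com expects the API key as a Bearer token.
function buildOllamaChat(
  baseUrl: string,
  model: string,
  prompt: string,
  apiKey?: string,
): OllamaChatRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/chat`,
    headers,
    // stream: false asks for a single JSON response instead of streamed chunks.
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }], stream: false }),
  };
}
```

For example: const r = buildOllamaChat("http://localhost:11434", "qwen3-coder", "Hello"); then fetch(r.url, { method: "POST", headers: r.headers, body: r.body }).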
Documentation index sync (llms.txt)
Use one of these to sync the official index for model/tool discovery:
- + → Plugins → Sync Ollama docs index (llms.txt)
- Command: RB Ollama: Sync Ollama Documentation Index (llms.txt)
Source: https://docs.ollama.com/llms.txt
Attachments
Click + in the composer or drag & drop files anywhere on the composer:
| File type | Behaviour |
|---|---|
| PNG / JPG / GIF / WebP | Sent as multimodal images. Auto routes to a vision model (gemma4:31b-cloud, gemini-3-flash-preview:cloud, gpt-4o, claude-3-*, etc.) |
| PDF | Text extracted via pdfjs-dist, prepended to your prompt |
| DOCX | Text extracted via mammoth, prepended to your prompt |
| TXT / MD / source files | Read as UTF-8 and prepended |
Coming via MCP (planned): browser actions, Slack/Gmail/Drive/Calendar connectors.
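A hypothetical sketch of the attachment routing in the table: classify a dropped file by extension, then prepend extracted text to the prompt. The extension list for "source files" and the section-separator format are assumptions, and the PDF/DOCX extraction itself (pdfjs-dist, mammoth) is left out; extracted text is passed in as plain strings.

```typescript
type AttachmentKind = "image" | "pdf" | "docx" | "text" | "unsupported";

// Classifies a dropped file by its (case-insensitive) extension.
function classifyAttachment(fileName: string): AttachmentKind {
  const ext = fileName.toLowerCase().split(".").pop() ?? "";
  if (["png", "jpg", "jpeg", "gif", "webp"].includes(ext)) return "image";
  if (ext === "pdf") return "pdf";
  if (ext === "docx") return "docx";
  // Hypothetical subset of "source files"; the real list may be broader.
  if (["txt", "md", "ts", "js", "py", "json"].includes(ext)) return "text";
  return "unsupported";
}

// Prepends extracted document text to the user's prompt, one labelled section per file.
function buildPrompt(prompt: string, docs: { name: string; text: string }[]): string {
  const sections = docs.map((d) => `--- ${d.name} ---\n${d.text}`);
  return [...sections, prompt].join("\n\n");
}
```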
Agent modes
Click + in the composer and pick a mode; it shapes the assistant's system prompt:
| Mode | Behaviour |
|---|---|
| 💬 Chat (default) | Normal conversational coding assistant. |
| 🗂 Plan | Produces a numbered, step-by-step plan. Does not write final code unless asked. |
| 💻 Code | Direct, ready-to-paste code edits with minimal prose. |
| ❓ Ask | Answers in 1–3 sentences, no proposed changes. |
| 🏗 Architect | High-level design, trade-offs, mermaid diagrams. |
The mode shows in the composer placeholder, e.g. Ask anything… [Plan mode].
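A small sketch of how a mode could map to a system-prompt instruction and to the composer placeholder. The instruction strings paraphrase the table and are hypothetical, as is the assumption that the default Chat mode shows no mode suffix.

```typescript
type ComposerMode = "chat" | "plan" | "code" | "ask" | "architect";

// Hypothetical instruction text paraphrasing the mode table; shipped prompts may differ.
const MODE_INSTRUCTIONS: Record<ComposerMode, string> = {
  chat: "You are a conversational coding assistant.",
  plan: "Produce a numbered, step-by-step plan. Do not write final code unless asked.",
  code: "Reply with direct, ready-to-paste code edits and minimal prose.",
  ask: "Answer in 1-3 sentences and do not propose changes.",
  architect: "Focus on high-level design and trade-offs; use mermaid diagrams where helpful.",
};

// "Ask anything… [Plan mode]" for non-default modes; plain placeholder for Chat (assumption).
function composerPlaceholder(mode: ComposerMode): string {
  if (mode === "chat") return "Ask anything…";
  const label = mode.charAt(0).toUpperCase() + mode.slice(1);
  return `Ask anything… [${label} mode]`;
}
```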
Permissions presets
Settings → RB Ollama: Permissions:
| Preset | What changes |
|---|---|
| Default | System prompt: only reference files explicitly attached |
| Auto-review | System prompt: present commands clearly and ask for confirmation |
| Full access | System prompt: freely reference workspace context |
| Custom | Use your own multi-line systemPrompt |
🔒 Honest note: VS Code extensions cannot enforce OS-level sandboxing. This setting controls what the model is told it may do, not what your editor will actually let it do. Real shell/browser execution is a future feature via MCP.
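A hypothetical sketch of how a permissions preset and a mode instruction could be combined into one system prompt. The rule wording paraphrases the preset table, and appending the mode instruction after a custom systemPrompt is an assumption. As the note above says, this only changes the text the model receives; it enforces nothing.

```typescript
type PermissionPreset = "default" | "auto-review" | "full-access" | "custom";

// Hypothetical wording paraphrasing the preset table.
const PRESET_RULES: Record<Exclude<PermissionPreset, "custom">, string> = {
  default: "Only reference files the user has explicitly attached.",
  "auto-review": "Present any commands clearly and ask for confirmation before they are run.",
  "full-access": "You may freely reference workspace context.",
};

// Builds the system prompt text only; no sandboxing or execution control happens here.
function buildSystemPrompt(
  preset: PermissionPreset,
  modeInstruction: string,
  customPrompt = "",
): string {
  if (preset === "custom") {
    // Custom: the user's systemPrompt, then the mode instruction (assumed order).
    return [customPrompt.trim(), modeInstruction].filter((s) => s.length > 0).join("\n\n");
  }
  return `${PRESET_RULES[preset]}\n\n${modeInstruction}`;
}
```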
Bring your own key — DeepSeek, Qwen, Zhipu, Baidu, Moonshot, Hunyuan, Mimo, Claude, GPT, Gemini
The fastest way: open the + menu in the chat composer → 🔑 Add provider… (or Cmd+Shift+P / Ctrl+Shift+P → RB Ollama: Add Provider). Pick from the preset catalogue and paste your API key.
| Provider | Endpoint preset | Models shipped |
|---|---|---|
| DeepSeek (direct) | https://api.deepseek.com/v1 | deepseek-chat, deepseek-reasoner, deepseek-coder |
| Qwen / Alibaba DashScope | https://dashscope-intl.aliyuncs.com/compatible-mode/v1 | qwen-max, qwen-plus, qwen-flash, qwen-vl-max, qwen-coder-plus, qwen3-coder-plus, qwen3-max |
| Zhipu AI (GLM) | https://open.bigmodel.cn/api/paas/v4 | glm-4.6, glm-4-plus, glm-4-flash, glm-4v-plus |
| Baidu Qianfan (ERNIE) | https://qianfan.baidubce.com/v2 | ernie-4.5-turbo-128k, ernie-4.0-turbo-8k, ernie-speed-128k |
| Moonshot AI (Kimi) | https://api.moonshot.cn/v1 | kimi-k2-0905-preview, moonshot-v1-128k, moonshot-v1-32k |
| Tencent Hunyuan (混元) | https://api.hunyuan.cloud.tencent.com/v1 | hunyuan-turbos-latest, hunyuan-large, hunyuan-vision |
| Xiaomi Mimo V2 Pro | https://api.xiaomi.com/v1 | mimo-v2-pro, mimo-v2 |
| OpenAI | https://api.openai.com/v1 | gpt-4o, gpt-4o-mini, o4-mini |
| Anthropic Claude | https://api.anthropic.com/v1 | claude-3-5-sonnet-latest, claude-3-5-haiku-latest |
| Google Gemini | https://generativelanguage.googleapis.com/v1beta/openai | gemini-2.5-flash, gemini-2.5-pro |
| OpenRouter | https://openrouter.ai/api/v1 | anthropic/claude-3.5-sonnet, google/gemini-2.5-flash, x-ai/grok-2 |
| Groq | https://api.groq.com/openai/v1 | llama-3.3-70b-versatile, qwen-2.5-coder-32b |
You can also add them by hand in settings.json:
"ollamaDirectCustomAgent.customProviders": [
{
"id": "deepseek",
"name": "DeepSeek",
"baseUrl": "https://api.deepseek.com/v1",
"apiKey": "sk-...",
"models": ["deepseek-chat", "deepseek-reasoner"]
},
{
"id": "qwen",
"name": "Qwen",
"baseUrl": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
"apiKey": "sk-...",
"models": ["qwen-max", "qwen-plus", "qwen-flash", "qwen-vl-max", "qwen-coder-plus"]
}
]
Note: a few endpoints (Anthropic native, some Baidu/Hunyuan auth modes) deviate slightly from OpenAI's spec. If you hit a 400 or 401, try the OpenRouter route for the same model; OpenRouter always exposes an OpenAI-compatible API.
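Because every preset speaks the OpenAI wire format, a request to any customProviders entry has the same shape: POST {baseUrl}/chat/completions with a Bearer key and a JSON body of model and messages. A minimal TypeScript sketch, assuming the entry shape from the settings table; buildChatCompletion is a hypothetical helper, and its check that baseUrl stops at the /v1-style root mirrors the 401 tip in Troubleshooting.

```typescript
// One customProviders entry: {id,name,baseUrl,apiKey,models[]}.
interface CustomProvider {
  id: string;
  name: string;
  baseUrl: string;
  apiKey: string;
  models: string[];
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Builds POST {baseUrl}/chat/completions in the OpenAI wire format.
function buildChatCompletion(p: CustomProvider, model: string, messages: ChatMessage[]) {
  const root = p.baseUrl.replace(/\/+$/, "");
  // baseUrl should stop at the API root (e.g. .../v1), not include the route itself.
  if (/\/chat\/completions$/.test(root)) {
    throw new Error(`${p.name}: baseUrl should stop at the API root, not /chat/completions`);
  }
  return {
    url: `${root}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${p.apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

For example, const { url, init } = buildChatCompletion(provider, "deepseek-chat", [{ role: "user", content: "Hello" }]); then fetch(url, init) and read choices[0].message.content from the JSON response.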
All settings (@ext:RobinBakshi.ollama-direct-custom-agent)
| Setting | Default | Purpose |
|---|---|---|
| endpoint | http://127.0.0.1:11434 | Ollama HTTP base URL |
| defaultModel | __auto__ | Initial selection in the picker |
| defaultAssistant | openclaw | Assistant selected in the assistant dropdown |
| assistantTaskImage | openclaw | Assistant assignment for image tasks |
| assistantTaskDoc | hermes | Assistant assignment for document tasks |
| assistantTaskCode | codex | Assistant assignment for coding tasks |
| assistantTaskQa | claude | Assistant assignment for QA tasks |
| assistantAutoRouting | true | Automatically route assistant selection by task type (image/doc/code/qa) |
| assistantAutoLaunch | false | Auto-launch the assigned assistant when a task is detected |
| assistantApiKeys | {} | Legacy migration field. Values are moved to encrypted SecretStorage and removed from settings. Use Settings → API Keys. |
| multiAgentSetupEnabled | true | Activates task-based multi-agent setup (Image/Doc/Code/QA) |
| multiAgentRoster | [openclaw, hermes, claude, codex, copilot, opencode, droid, goose, pi, pool, onyx, n8n] | Controls which assistants are available in multi-agent routing |
| permissions | default | default / auto-review / full-access / custom |
| systemPrompt | (empty) | Used when permissions = custom |
| autoPrefer | local-first | Auto routing strategy |
| taskRoutingEnabled | true | Enable task-based model routing for Auto |
| taskModelImage | __auto__ | Preferred model for image tasks |
| taskModelDoc | __auto__ | Preferred model for doc tasks |
| taskModelCode | __auto__ | Preferred model for code tasks |
| taskModelQa | __auto__ | Preferred model for QA tasks |
| modelAccuracyOverrides | [] | Manual accuracy labels shown in the picker |
| cloudPricePerMTok | 0.5 | $/1M tokens, used for the savings counter |
| openOnStartup | false | Focus the sidebar at startup |
| composerEnterBehavior | send | send (Enter sends) or newline |
| customProviders | [] | Array of {id,name,baseUrl,apiKey,models[]} |
Cost-savings playbook
- Default to Auto. It chooses local whenever possible.
- Pull qwen3-coder:30b or llama3.1:8b; they cover ~80% of everyday coding for free.
- Reserve :cloud and BYOK models (Claude, GPT-4o, Gemini) for long-context reasoning, vision and hard refactors.
- Watch the $ saved counter go up.
Troubleshooting
| Problem | Fix |
|---|---|
| Cannot reach Ollama at http://127.0.0.1:11434 | Run ollama serve or open the Ollama menubar app. |
| No models found | Run ollama pull llama3.1:8b, then click ↻. |
| Cloud model auth error | Run ollama signin and verify your Ollama Pro subscription at https://ollama.com/settings/billing. |
| Doesn't appear in Antigravity | Restart Antigravity after install, then look for the speech-bubble icon in the left activity bar. |
| Extension installs but reports an engine version error | This extension targets vscode ^1.95.0. Update Antigravity / Cursor / VSCodium. |
| BYOK provider returns 401 | Wrong API key, or the baseUrl doesn't end at the /v1-style root. |
Roadmap
- 🛠 MCP client — connect Slack, Gmail, Drive, Calendar, Playwright (browser), shell — using the open Model Context Protocol ecosystem
- 🛠 Shell tool execution with the permissions preset above gating each call
- 🛠 Markdown / code-block rendering in the chat log
- 🛠 Per-workspace model defaults
PRs welcome.
Building from source
git clone https://github.com/robinbakshi007/ollama-direct-custom-agent
cd ollama-direct-custom-agent
npm install
npm run package
npx @vscode/vsce package
# → ollama-direct-custom-agent-0.7.5.vsix
code --install-extension ./ollama-direct-custom-agent-0.7.5.vsix
antigravity --install-extension ./ollama-direct-custom-agent-0.7.5.vsix
License
MIT — see LICENSE. Use it, fork it, ship it.