OKAI Code Assistant — VS Code Extension


Autonomous AI coding agent inside VS Code. Assess, plan, implement, and verify code changes — directly in your sidebar, powered by a full ReAct loop with a local SLM by default and your choice of online model when you want one.

(Screenshot: OKAI chat panel)


What it does

  • 🧭 Assess any codebase: ask "what does this project do?" — OKAI reads the source, lists components, and reports findings with file paths
  • 🛠️ Implement features end-to-end: "Add Redis caching to the order service" — it plans, writes real source code, runs the linter, and verifies it builds
  • 🐛 Find & fix root causes: paste a stack trace — get a verified patch with the actual files modified
  • 🚀 Run, build, deploy: "Compile and start this app" — it executes the toolchain (mvn/npm/uvicorn/docker) in your shell, it doesn't just print instructions
  • 🔒 Local-first: the embedded SLM (Qwen 2.5) runs in Docker — no cloud API key required to start
(Screenshots: plan tracking · tool execution · instruction nodes · tool calls)

🚀 Just installed from the Marketplace? Start here.

You have the extension. Now you need the OKAI backend running locally — it does the AI work and stores findings.

Step 1 — Install Docker Desktop (one-time)

If you don't have it: https://www.docker.com/products/docker-desktop/

Step 2 — Run the OKAI backend

docker run -d --name rag-okai -p 8000:8000 \
  -e NEO4J_PASSWORD=test12345 \
  okkarkp/rag:allinone

Step 3 — Verify (wait ~30s for the model to load)

curl http://localhost:8000/readyz

A 200 OK response means you're ready. Click the ⚡ OKAI bolt in the Activity Bar to open the chat — the status line will show OKAI: ready.

Tip: Run OKAI: Open Walkthrough from the command palette (Cmd+Shift+P / Ctrl+Shift+P) for the full guided setup including hosted-backend and build-from-source options.


⬇️ Alternative: Install via Docker (no Marketplace needed)

If you'd rather not use the Marketplace, the VSIX is bundled inside the Docker image — no GitHub or Marketplace access required.

Once the backend is running, download from:

http://localhost:8000/downloads/okai-latest.vsix
# One-liner install (backend must be running)
curl -fL http://localhost:8000/downloads/okai-latest.vsix -o okai-latest.vsix
code --install-extension okai-latest.vsix

If your backend is deployed on a different host, replace localhost:8000 with your server address.


❓ FAQ

Q: How do I install the OKAI extension in VS Code?
  1. Make sure the OKAI backend container is running:
    docker compose up -d
    
  2. Download the extension directly from the backend:
    curl -fL http://localhost:8000/downloads/okai-latest.vsix -o okai-latest.vsix
    
    Or open http://localhost:8000/downloads/okai-latest.vsix in a browser and save the file.
  3. Install it in VS Code:
    code --install-extension okai-latest.vsix
    
    Or: Cmd+Shift+P → Extensions: Install from VSIX… → select the downloaded file.
  4. Reload VS Code: Cmd+Shift+P → Developer: Reload Window
  5. Click the ⚡ OKAI bolt icon in the Activity Bar (left icon strip) to open the chat panel.
Q: What are the prerequisites?
  • VS Code 1.90 or later
  • Docker Desktop running on your machine
  • The OKAI backend started via Docker Compose:
    docker compose up -d
    
  • Optional for cloud mode: an LLM API key (OpenAI GPT-4.1 recommended), set in VS Code settings under rag.llm.apiKey; the bundled local SLM needs no key
Q: Where does the OKAI chat panel appear?

OKAI opens as a dedicated sidebar panel, like GitHub Copilot Chat or Claude Code.

  • Click the ⚡ bolt icon in the Activity Bar (left strip)
  • Or run Cmd+Shift+P → OKAI: Open Chat

The panel has two sections: Chat (the main conversation) and Findings (assessment results).

Q: What can I ask OKAI to do?

OKAI uses a full ReAct loop (assess → plan → implement → verify → report). You can ask it naturally:

| What you type | What OKAI does |
| --- | --- |
| add JWT authentication to all controllers | Reads codebase, implements changes, lints, reports |
| I don't see [Authorize] on the controllers | Detects missing attribute, adds it to all controllers |
| where are the controller methods? | Reads the file, adds the missing CRUD methods |
| assess the security of my API | Full code review with severity-classified findings |
| explain the repository layer | Plain-English walkthrough of the pattern used |
| generate unit tests for BookService | Creates test file with full coverage stubs |

OKAI automatically detects whether your request requires implementation or just analysis — you don't need to use /commands for most tasks.

Q: What slash commands are available?
| Command | What it does |
| --- | --- |
| /generate <description> | Generate new code (controller, service, entity…) |
| /fix <description> | Implement a specific fix |
| /assess | Full code quality assessment |
| /explain <file> | Explain a file or function |
| /test <file> | Generate unit tests |
| /findings | List findings from the current session |
| /new | Start a fresh session |
| /repo | Switch to a GitHub/GitLab repository |
| /status | Backend health and config |

Slash commands are optional — typing naturally works for most requests.

Q: Which LLM models are supported?
| Provider | Models |
| --- | --- |
| OpenAI | GPT-4.1, GPT-4o, GPT-4o-mini |
| Anthropic | Claude Sonnet, Claude Haiku |
| Local (SLM) | Qwen2.5-3B (bundled in Docker image) |

Set in VS Code settings: rag.context.mode → cloud for GPT-4.1, local for the bundled SLM.

GPT-4.1 is recommended for best accuracy on complex codegen tasks.

Q: OKAI produced an assessment report instead of implementing — what happened?

This has been fixed in the current release. OKAI now:

  • Detects context from previous responses — if it just showed you findings, your follow-up ("add it", "implement it", "why can't you do it") automatically triggers implementation mode
  • Has an assessment-drift guard — if the agent tries to produce a findings report on a code-writing task, it's rejected and forced to call write_file instead
  • Extends the turn budget automatically when the drift guard fires, so the correction always executes

If you still see analysis-only output on a generate/fix request, try prefixing with /fix or /generate.

Q: Files are being written to the wrong directory — how do I fix this?

OKAI detects the project manifest (.csproj, pom.xml, package.json, etc.) and emits a write_root hint to the agent so files land in the correct subdirectory.

If files are still going to the wrong place, check that your project has a manifest file in the expected subdirectory and that VS Code's workspace root is set to the repo root (not the project subdirectory).
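
For intuition, the probe amounts to something like the sketch below. The manifest names come from this README; the helper itself is illustrative, not the backend's actual code.

// Illustrative sketch only; the real write_root probe lives in the backend.
import * as fs from "node:fs";
import * as path from "node:path";

const MANIFESTS = ["pom.xml", "package.json"];

function findWriteRoot(workspaceRoot: string): string | null {
  // A real probe would skip node_modules, bin/, obj/, etc.
  const entries = fs.readdirSync(workspaceRoot, { recursive: true }) as string[];
  for (const rel of entries) {
    const base = path.basename(rel);
    if (MANIFESTS.includes(base) || base.endsWith(".csproj")) {
      return path.dirname(path.join(workspaceRoot, rel)); // hint: write files here
    }
  }
  return null; // no manifest found; the agent falls back to the workspace root
}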

Q: The extension says "Cannot connect to backend" — what do I do?
  1. Check Docker is running and the backend container is listed: docker ps | grep rag
  2. Check the backend is healthy: curl http://localhost:8000/health
  3. Verify the URL in VS Code settings: Cmd+Shift+P → Preferences: Open Settings → search rag.backend.url (default: http://localhost:8000)
  4. Restart the backend: docker compose restart
Q: The OKAI panel appears inside the Explorer sidebar instead of its own panel — how do I fix this?

This happens when VS Code has cached an old view layout from a previous version.

Fix: Right-click the CHAT section header in the Explorer sidebar → Move to OKAI.
Do the same for FINDINGS.

After moving, the views will always open inside the dedicated OKAI activity bar panel.

For a fresh install this issue does not occur — views default to the OKAI panel automatically.

The extension is a thin client to the OKAI analysis backend; all the heavy work (capability probe, pre-flight diagnostic, agentic troubleshoot loop, Rule 7 confidence scoring) runs server-side in Python.

Features

M1 — Read-only investigation

  • Right-click "Find Root Cause for this Stack Trace" on any selection that looks like an error log or stack trace
  • Findings tree view in the activity bar, grouped by HIGH ★ / TENTATIVE ◇ confidence
  • Problems panel + editor squigglies for every finding's affected file:line
  • Hover tooltip showing the full root_cause_chain (symptom → proximate → root)
  • Status bar badge with finding count and detected diagnostic tools (git, dotnet, mvn, tsc, ng, …)
  • Mark as false positive to remove a finding and report feedback to the backend
  • TENTATIVE findings auto-prefixed in the title with severity downgraded one level — the Rule 7 "I don't know" path keeps low-confidence guesses out of your face

M2 — Apply fixes

  • Lightbulb quick-fix on any line with a RAG diagnostic — choose "Generate fix for…", "Open all evidence files", or "Mark as false positive"
  • Generate and Apply Fix command (also on every tree-view finding) — calls the backend's fix generator and shows a side-by-side diff
  • Diff preview uses VS Code's native diff editor: original on the left, proposed content on the right, served from a virtual rag-fix: URI
  • Apply / Discard prompt — Apply uses WorkspaceEdit so the change goes through VS Code's undo stack; complex operations fall back to the backend's apply-fix writer
  • Verify-after-apply — the extension automatically re-runs the project diagnostic via /assess-agent/verify and surfaces:
    • ✓ "fix verified — diagnostic now clean"
    • ✗ "fix did not resolve the issue. Re-investigating may be needed."
  • Tree view fix state — every finding shows its lifecycle in the title: ⏳ generating → 📝 proposed → ▶ applied → ✓ verified / ✗ failed. Verified findings turn green; failed ones turn red.

Installation

📖 Full setup guide — Docker configuration, LLM providers, authentication, CI/CD integration: SETUP.md

The intended distribution is Docker-only — no Node, npm, or vsce required on your machine. Three paths in order of recommendation:

Option A — Download the prebuilt VSIX from GitHub Releases (easiest)

CI runs on every release tag and attaches the .vsix to the release.

# Replace v0.1.0 with the latest release tag from
# https://github.com/okkarkp/rag/releases
curl -fL -o rag-troubleshoot.vsix \
  "https://github.com/okkarkp/rag/releases/download/v0.1.0/rag-troubleshoot-0.1.0.vsix"
code --install-extension rag-troubleshoot.vsix

Then start the backend with the bundled compose (next section).

Option B — Build the VSIX inside a one-shot Docker container

When you can't reach GitHub Releases or want to build from a specific branch, use the bundled Dockerfile.vsix-builder — it requires nothing but Docker on the host.

git clone https://github.com/okkarkp/rag.git
cd rag

# Build the VSIX inside Docker, copy the .vsix to your host
docker build -t rag-vsix-builder \
  -f vscode-extension/Dockerfile.vsix-builder vscode-extension

docker run --rm -v "$PWD":/out rag-vsix-builder \
  sh -c "cp /work/*.vsix /out/"

code --install-extension rag-troubleshoot-*.vsix

The builder image is throwaway (remove it afterwards with docker rmi rag-vsix-builder) — its only job is to produce the .vsix file. No Node or npm touches your host.

Option C — Backend hosted elsewhere (corporate / shared deployment)

If your team already runs the backend, just install the VSIX from Option A or B and point the extension at the deployment:

// ~/.vscode/settings.json or workspace settings
{
  "rag.backend.url": "https://rag.your-company.com",
  "rag.backend.apiKey": "<token-from-RAG-admin>"
}

Starting the backend (after VSIX is installed)

If you don't already have a backend deployment, run the slim compose bundled with the extension:

cp vscode-extension/.env.example vscode-extension/.env
# Edit .env — set ANTHROPIC_API_KEY (or OPENAI_API_KEY), pick RAG_PROJECTS_DIR

docker compose -f vscode-extension/docker-compose.yml up -d
# Or for a fully local LLM (no API keys needed):
docker compose -f vscode-extension/docker-compose.yml --profile local-llm up -d
docker exec rag-ollama ollama pull llama3.1:8b

The compose launches:

  • rag-backend (FastAPI on :8000) — the analysis engine
  • rag-ollama (Ollama on :11434, only with --profile local-llm) — bundled local LLM so you don't need API keys

Architecture — local execution + remote AI

┌──────────────────────────────┐         ┌────────────────────────────────┐
│    Developer's PC            │         │   Docker backend               │
│                              │         │                                │
│  VS Code + this extension    │         │   FastAPI on :8000             │
│   • probe local toolchain    │         │   • LLM / RAG / parsing        │
│   • run dotnet build, etc.   │  HTTPS  │   • Rule 7 confidence scoring  │
│   • parse compile errors     │ ──────► │   • Pre-flight diagnostic      │
│   • git blame / git log      │         │     (fallback when client      │
│   • use REAL SDK + cache     │ ◄────── │      doesn't ship signals)     │
│   • use REAL secrets / feeds │  SSE    │   • run_shell / run_diagnostic │
│                              │         │     (fallback only)            │
└──────────────────────────────┘         └────────────────────────────────┘
            ▲
            │
        Your code, your SDK, your env

The split is intentional:

  • Compile/typecheck/test commands run on the developer's PC because that's where the real .NET SDK / JDK / Maven cache / dotnet user-secrets / internal NuGet feeds live. The extension uses Node's child_process with strict argument tokenisation, no shell expansion, and a per-tool timeout (sketched after this list).
  • AI/RAG/analysis runs in the Docker backend because that's where the LLM context, the lineage graph, and the troubleshoot agent live — no need to bundle every language SDK into a container image.
  • Compile errors flow as static_signals in the troubleshoot request, so the backend's LLM sees the bug stated in plain text from the developer's real environment in its very first prompt.
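
A minimal sketch of that execution style, assuming Node's child_process as described above — the function name and call site are illustrative, not the extension's actual code:

import { spawn } from "node:child_process";

// Sketch: tokenised args, shell disabled (no word-splitting or glob expansion),
// and a per-tool timeout. Names here are illustrative.
function runTool(cmd: string, args: string[], cwd: string, timeoutMs = 120_000): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { cwd, shell: false, timeout: timeoutMs });
    let stderr = "";
    child.stderr.on("data", (chunk: Buffer) => (stderr += chunk.toString()));
    child.on("error", reject); // e.g. toolchain missing on PATH
    child.on("close", (code) =>
      code === 0 ? resolve() : reject(new Error(`${cmd} exited ${code}: ${stderr.slice(0, 500)}`)));
  });
}

// runTool("dotnet", ["build", "--nologo"], "/repo");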

The rag.execution.mode setting controls this:

  • auto (default) — run locally if a toolchain is detected, fall back to the backend's run_diagnostic
  • local — always run on the developer's PC (refuse to fall back)
  • container — never run locally, only the backend's bundled tooling

Enterprise security and observability

This extension is built for production use, not a proof of concept. Every interaction with the backend and the user's filesystem is auditable, retryable, and credential-aware.

Credentials — system keychain

rag.backend.apiKey is stored in VS Code's SecretStorage, which writes to:

  • macOS: Keychain
  • Windows: Credential Manager
  • Linux: libsecret (gnome-keyring / KWallet)

The token is never persisted to plain settings.json. On first activation, any pre-existing plain-settings key is automatically migrated into SecretStorage and the plain entry is cleared. Use RAG: Sign In / RAG: Sign Out to manage credentials.
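
The relevant VS Code API is used roughly like this — the setting key is from above; the migration helper is a sketch, not the extension's actual code:

import * as vscode from "vscode";

const KEY = "rag.backend.apiKey";

// Sketch: on activation, move any plain-settings key into SecretStorage.
export async function migrateApiKey(context: vscode.ExtensionContext): Promise<void> {
  const config = vscode.workspace.getConfiguration();
  const plain = config.get<string>(KEY);
  if (plain) {
    await context.secrets.store(KEY, plain); // backed by the OS keychain
    await config.update(KEY, undefined, vscode.ConfigurationTarget.Global); // clear the plain copy
  }
}

// Reads then go through SecretStorage only:
// const token = await context.secrets.get(KEY);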

Request correlation

Every backend call carries an X-Request-Id header (UUID v4). The same id appears in the structured logs (RAG: Show Logs opens the LogOutputChannel) and in the backend's request logs, so support can correlate a user-reported issue to the exact session in seconds.
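
In sketch form — the header name is from above; the wrapper itself is illustrative:

import { randomUUID } from "node:crypto";

// Sketch: stamp every backend call with a correlation id and log it,
// so client and server logs can be joined on the same rid.
async function backendFetch(path: string, headers: Record<string, string> = {}): Promise<Response> {
  const rid = randomUUID(); // UUID v4
  console.log(`[api] rid=${rid} ${path}`); // the same rid appears in backend logs
  return fetch(`http://localhost:8000${path}`, { headers: { ...headers, "X-Request-Id": rid } });
}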

Audit log

Every meaningful action — session start/complete, finding emitted/displayed, fix generated/applied/verified/failed, false-positive mark, secret-scrub event — is appended to a JSONL file under VS Code's globalStorageUri. The file rotates at 10 MB (3 generations kept). Open with RAG: Show Audit Log.

Sample record:

{
  "timestamp": "2026-04-28T15:34:12.412Z",
  "action": "fix_verified",
  "finding_id": "abc",
  "files": ["src/Foo.cs"],
  "duration_ms": 54200
}

The audit log:

  • never contains secret values (only the names of patterns that triggered scrubbing)
  • never contains the API key
  • is human-readable for compliance review
  • survives across workspace switches (lives under user-global storage)

Secret + PII scrubbing

Findings can surface literal secrets when the LLM quotes from source files. Before any finding is shown in the diagnostics panel, hover, or tree view, every string field (title, description, code_snippet, root_cause_chain[], recommendation, fix_hint) is run through a regex scrubber that redacts:

  • OpenAI / Anthropic / Stripe / Slack / Google API keys
  • GitHub PATs (classic + fine-grained + app)
  • AWS access + secret keys
  • JWT tokens
  • password=… / secret:… assignments (key visible, value redacted)
  • Connection-string passwords
  • -----BEGIN PRIVATE KEY----- blocks
  • Authorization: Bearer … tokens

False positives are accepted (better to redact a long random string that wasn't a secret than surface a real one). The scrubber runs on the display copy only — the underlying file content the LLM reads is never modified.
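
A toy version of the scrubber — the patterns are heavily simplified stand-ins for the twelve classes listed above:

// Sketch: redact secret-looking values in display strings and report
// only the NAMES of the patterns that fired (never the values).
const SECRET_PATTERNS: [string, RegExp][] = [
  ["openai-key", /sk-[A-Za-z0-9]{20,}/g],
  ["github-pat", /ghp_[A-Za-z0-9]{36}/g],
  ["bearer-token", /(?<=Bearer\s+)[\w.-]+/g],
  ["password-assignment", /(?<=password\s*[=:]\s*)\S+/gi], // key stays visible, value redacted
];

function scrub(text: string): { clean: string; hits: string[] } {
  const hits: string[] = [];
  let clean = text;
  for (const [name, re] of SECRET_PATTERNS) {
    if (clean.match(re)) hits.push(name);
    clean = clean.replace(re, "[REDACTED]");
  }
  return { clean, hits };
}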

Crash boundary

Every command handler is wrapped in withCrashBoundary(name, services, fn) so an unexpected exception:

  • is logged with the full stack via the structured logger
  • is recorded as a crash audit entry (handler name + error name/message + first 6 stack frames)
  • surfaces a "RAG: <handler> failed — <short message>" notification with a "Show Logs" button

This replaces VS Code's red "extension host crashed" banner — users see something actionable, support can correlate the crash with a request id, and the extension stays alive for the next command.
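
Sketched out, using the signature the README names — the body and the rag.showLogs command id are assumptions:

import * as vscode from "vscode";

// Sketch of the boundary; `services` is assumed to expose a logger and an audit sink.
async function withCrashBoundary(
  name: string,
  services: { log: (msg: string) => void; audit: (entry: object) => void },
  fn: () => Promise<void>,
): Promise<void> {
  try {
    await fn();
  } catch (err) {
    const e = err as Error;
    services.log(`[crash] ${name}: ${e.stack ?? e.message}`);
    services.audit({ action: "crash", handler: name, error: e.name, message: e.message });
    const pick = await vscode.window.showErrorMessage(`RAG: ${name} failed — ${e.message}`, "Show Logs");
    if (pick === "Show Logs") void vscode.commands.executeCommand("rag.showLogs"); // command id assumed
  }
}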

Bundle

The shipped .vsix contains a single 105 KB bundled file (out/extension.js) produced by esbuild, not the ~33 unbundled .js + .map files a plain tsc build would produce. This makes the VSIX 47% smaller and activation noticeably faster on cold start. Source maps stay enabled so stack traces still resolve to original .ts lines.

Persistent feedback queue

When you mark a finding as a false positive or a fix succeeds/fails, the feedback POST to the backend is durable: it is persisted to globalStorageUri/feedback-queue.jsonl, retried on activation, and flushed again every minute in the background. Exponential backoff (1s → 5min) on transient failures; entries are dropped after 8 attempts. Use RAG: Flush Pending Feedback Queue to force a flush.
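
Mechanically, the queue boils down to something like this sketch — the file name comes from above, everything else is illustrative:

import * as fs from "node:fs";

// Sketch: append-only JSONL queue with capped exponential backoff.
const QUEUE_FILE = "/path/to/globalStorage/feedback-queue.jsonl"; // real path comes from globalStorageUri

function enqueue(event: object): void {
  fs.appendFileSync(QUEUE_FILE, JSON.stringify({ ...event, attempts: 0 }) + "\n");
}

function backoffMs(attempt: number): number {
  return Math.min(1_000 * 2 ** attempt, 300_000); // 1 s doubling up to the 5 min cap
}

// A background timer re-reads the file every 60 s, POSTs each entry,
// bumps `attempts` on failure, and drops an entry once attempts reaches 8.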

Retry and backoff

Every non-streaming backend call (capabilities, feedback, apply-fix, etc.) retries up to 3 times on:

  • HTTP 429 (rate-limited)
  • HTTP 5xx (server errors)
  • network errors

Exponential backoff (500ms → 30s). 4xx responses (other than 429) are returned immediately as BackendError with a structured category (AUTH | RATE_LIMIT | SERVER | NETWORK | CLIENT | UNKNOWN).
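
The policy in sketch form — thresholds from above; the wrapper itself is illustrative:

// Sketch: retry 429/5xx/network errors with exponential backoff, capped at 30 s.
async function withRetry(call: () => Promise<Response>, maxRetries = 3): Promise<Response> {
  let delay = 500; // ms
  for (let attempt = 0; ; attempt++) {
    try {
      const res = await call();
      const retryable = res.status === 429 || res.status >= 500;
      if (!retryable || attempt >= maxRetries) return res; // non-429 4xx returns immediately
    } catch (err) {
      if (attempt >= maxRetries) throw err; // network error, retries exhausted
    }
    await new Promise((r) => setTimeout(r, delay));
    delay = Math.min(delay * 2, 30_000);
  }
}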

Structured logging

RAG: Show Logs opens a VS Code LogOutputChannel with per-line metadata:

2026-04-28 15:34:12.412 [info] [api.troubleshoot] rid=a1b2c3d4 stream open {"project_path":"/repo","static_signals_count":3}

The user controls the level via the gear icon in the Output panel (trace / debug / info / warn / error). Set to "trace" when filing a support ticket.

Configuration

| Setting | Default | What it does |
| --- | --- | --- |
| rag.backend.url | http://localhost:8000 | Base URL of the FastAPI backend |
| rag.backend.apiKey | "" | Optional bearer token (use SecretStorage in production) |
| rag.troubleshoot.timeoutSeconds | 7200 | Wall-clock cap per session (2 h) |
| rag.troubleshoot.maxIterations | 30 | Max agentic iterations per session |
| rag.confidence.minDisplay | TENTATIVE | Hide findings below this confidence: HIGH / TENTATIVE / ALL |
| rag.diagnostics.severityMap | (see package.json) | How critical/high/medium/low map to VS Code severities |
| rag.execution.mode | auto | Where to run dotnet/mvn/tsc/etc. — local / container / auto |

Usage

  1. Paste a stack trace into any editor (or open a log file).
  2. Select the trace text (multi-line is fine).
  3. Right-click → "RAG: Find Root Cause for Selected Stack Trace" (or Ctrl+Shift+P → same command).
  4. Enter a one-line problem description in the input box (e.g. "POST /orders returns 500 NullReferenceException").
  5. Watch the progress notification stream phase / tool / finding events.
  6. When done, the findings panel opens automatically; entries appear in Problems and as squigglies in the editor; hover for the full causal chain.

The status bar shows live progress while running and a summary (★3 ◇1 · git/dotnet) when idle. Click it to open the panel.

What the extension expects from the backend

  • POST /assess-agent/troubleshoot/stream — SSE endpoint emitting phase, tool_call, finding, done, error events
  • GET /assess-agent/capabilities?project_path=… — returns the host + project capability probe (git, dotnet, mvn, ng, tsc, eslint, …)
  • POST /assess-agent/finding/feedback — records "false positive" / "fix worked" feedback

These are all defined in backend/assess_agent_api.py and ship with the bundled Docker image.
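
For orientation, consuming the streaming endpoint could look like this — the endpoint path and event names come from above; the request body fields are assumptions:

// Sketch: POST the troubleshoot request and split the SSE stream into frames.
const res = await fetch("http://localhost:8000/assess-agent/troubleshoot/stream", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ project_path: "/repo", problem: "POST /orders returns 500" }), // fields assumed
});
const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";
for (;;) {
  const { value, done } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const frames = buffer.split("\n\n"); // SSE frames are blank-line separated
  buffer = frames.pop()!;              // keep the trailing partial frame
  for (const frame of frames) {
    console.log(frame); // event: phase | tool_call | finding | done | error, plus data: JSON
  }
}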

Troubleshooting

  • "backend unreachable" in the status bar → check rag.backend.url; the default assumes the backend is on localhost:8000. If you used docker compose up, verify with curl http://localhost:8000/assess-agent/categories.
  • Empty findings panel after a session → check the Problems panel; some findings may have line=0 (unknown line) and only show in the tree view, not as squigglies.
  • All findings rejected with low confidence → the host may be missing diagnostic tools (dotnet / mvn / tsc). Check the status bar tooltip — if no tools are listed, run_diagnostic can't run and confidence scores stay low. Install the toolchain or accept lower-confidence findings via rag.confidence.minDisplay = "ALL".

Roadmap

M1, the read-only investigation milestone, was the initial release.

  • M2 — Apply fixes via diff preview + verify-after-apply loop calling run_diagnostic again ✓ shipped
  • M3 — Chat panel with /troubleshoot, /blame, /diff, /test, /diagnostic slash commands; inline ghost-text shortcuts for trace_call_chain
  • M4 — LSP server bridge so JetBrains, Neovim, Sublime can use the same backend

See docs/vscode-plugin-design.md in the parent repo for the full design.

Development workflow

Local dev requires Node 20 (the version VS Code 1.85+ ships with). Docker is not needed for the extension itself — only for running the backend. From a fresh clone:

cd vscode-extension
npm install            # also installs the husky pre-commit hook
npm run compile        # tsc -p .  → out/
npm run lint           # ESLint with @typescript-eslint + security rules
npm run format         # Prettier auto-format every .ts / .json / .md
npm test               # Vitest unit suite (163 tests, ~2s)
npm run test:integration  # Real VS Code window via @vscode/test-electron
npm run package        # Build .vsix for distribution

Pre-commit hook

husky + lint-staged runs only on staged files so a typo in one module doesn't lint the whole repo:

git add src/foo.ts
git commit -m "..."
# → husky runs lint-staged →
#    eslint --fix src/foo.ts  (auto-fix what it can)
#    prettier --write src/foo.ts  (reformat to project style)
# → if eslint reports errors, commit is BLOCKED

To bypass (rarely needed): git commit --no-verify. CI re-runs the same checks regardless.

Editor setup

.editorconfig configures every editor (VS Code, JetBrains, Vim, Emacs) with our indentation/EOL/charset settings. VS Code users should also install the Prettier and ESLint extensions to get save-on-format and inline lint warnings — both respect our .prettierrc.json / .eslintrc.json automatically.

Lint policy

  • @typescript-eslint/recommended baseline + eslint-plugin-security for the regex-heavy modules
  • Errors block CI: bad equality, undefined globals, unsafe regex, etc.
  • Warnings allowed up to 50 (currently ~39, mostly no-floating-promises on vscode.window.show*Message calls — explicit void markers can be added incrementally)
  • Test files relax no-floating-promises (vitest fixtures throw synchronously)

Running tests

The extension has unit tests for its parser and store logic — run before any non-trivial change:

cd vscode-extension
npm install
npm test            # one-shot
npm run test:watch  # re-run on save

CI runs npm test before packaging the VSIX, so a failing test blocks release.

Coverage targets the most fragile code:

| Module | What's tested |
| --- | --- |
| src/api/client.ts (parseFrame) | SSE frame parsing — comments, multi-line data, missing event, JSON vs raw text, whitespace trimming |
| src/api/client.ts (consumeNdjsonStream) | NDJSON line splitting across chunk boundaries, malformed-line skip, trailing-line emission, blank-line handling |
| src/findingsStore.ts (FindingsStore) | id assignment, byId/byConfidence/byFile, remove/clear/count, onChange firing, fixState lifecycle |
| src/findingsStore.ts (parseFindingLocation) | C# (line,col), Java :line, root_cause_chain extraction, "line N" fallback, basename match |
UI-touching code (commands, providers, status bar) is exercised through a minimal vscode shim at test/__mocks__/vscode.ts that stubs only the surfaces our code touches at module-load time.

Integration tests (real VS Code instance)

@vscode/test-electron launches a real VS Code window with the extension loaded and runs Mocha tests inside it. The full troubleshoot → diff → apply → verify flow is exercised against an in-process fastify mock backend so the tests need no external service.

cd vscode-extension
npm install
npm run test:integration   # downloads VS Code if not cached, runs the suite

Coverage:

| File | Scenario |
| --- | --- |
| test-integration/suite/activation.test.ts | extension activates, all 16 commands registered |
| test-integration/suite/troubleshoot.test.ts | stack trace selection → SSE finding → store + Problems panel + scrub + audit |
| test-integration/suite/applyFix.test.ts | generate fix → diff preview → WorkspaceEdit apply → verify pass → file modified |
| test-integration/suite/rollback.test.ts | apply-fix returns errors → AtomicApply restores original content |
| test-integration/suite/conflict.test.ts | file edited between fix-gen and apply → conflict modal → user cancels → edit preserved |

Each test starts a fresh fastify mock on an ephemeral port (mockBackend.ts) and points rag.backend.url at it. Tests inspect state through the Services container that activate() returns.

CI requirement: Linux runners need an X server. The CI step is:

- name: Run integration tests
  run: |
    sudo apt-get install -y xvfb
    xvfb-run -a npm run test:integration
  working-directory: vscode-extension

macOS and Windows runners run npm run test:integration directly — no Xvfb needed.


Enterprise Readiness Checklist

The one-click install link is only published once every item below is ✅. This protects users from installing an unstable build into production environments.

✅ Done

| Area | Status | Detail |
| --- | --- | --- |
| Streaming chat (@okai) | ✅ | NDJSON /chat/message endpoint, per-turn session memory |
| Backend onboarding | ✅ | Auto-detects Docker URL, prompts + validates if unreachable |
| Credential security | ✅ | SecretStorage (OS keychain), never in settings.json |
| Secret scrubbing | ✅ | 12 pattern classes redacted before any finding is displayed |
| Audit log | ✅ | JSONL under globalStorageUri, 10 MB rotation, 3 gens |
| Retry + backoff | ✅ | 3 retries, exponential 500 ms–30 s on 429/5xx/network |
| Request correlation | ✅ | X-Request-Id on every call, surfaced in logs |
| Crash boundary | ✅ | Every command handler wrapped; red banner → actionable notification |
| Structured logging | ✅ | LogOutputChannel, user-controlled level |
| Feedback queue | ✅ | Durable JSONL, retried on activation + every 60 s |
| Atomic apply + rollback | ✅ | Backup → write → chmod-restore; full rollback on any failure |
| Unit tests | ✅ | 232 tests, 0 failures |
| Integration tests | ✅ | 8 real VS Code window tests, 0 failures |
| OKAI branding | ✅ | All RAG → OKAI in UI strings |
| /new session command | ✅ | Clears workspaceState session ID |

🔲 Required before public download link is live

| Area | Status | What's needed |
| --- | --- | --- |
| GitHub Release published | 🔲 | Tag v0.1.0, attach rag-troubleshoot-0.1.0.vsix as release asset |
| CI auto-build on release tag | 🔲 | GitHub Actions workflow: npm run package → upload to release |
| Smoke test on fresh machine | 🔲 | Install via install.sh on a clean macOS + Windows VM |
| Backend Docker image published | 🔲 | docker pull okkarkp/rag-backend:latest must work |
| Rate-limit auth header on backend | 🔲 | API key validation in FastAPI middleware |
| Walkthrough / first-run guide | 🔲 | VS Code Walkthrough contribution complete |
| CHANGELOG.md | 🔲 | Document v0.1.0 release notes |

Once all 🔲 items are checked off, uncomment the install section below.


Q&A

How do I get started from scratch?

Complete setup in 4 steps:

Step 1 — Start the backend

Pull and run the all-in-one Docker image (Neo4j + SLM + API + React UI in one container):

# Pull the all-in-one image:
docker pull okkarkp/rag:allinone

# Or with docker compose (recommended — handles volumes and restarts):
curl -fsSL https://raw.githubusercontent.com/okkarkp/rag/main/docker-compose.allinone.yml \
  -o docker-compose.allinone.yml

docker compose -f docker-compose.allinone.yml up -d

Wait ~60 seconds for the container to be healthy, then verify:

curl http://localhost:8000/readyz
# → {"status":"ok"}

You can also open http://localhost:8000 in a browser to see the React UI.

Step 2 — Download the VS Code extension

The .vsix is bundled inside the running container — no GitHub account or internet needed:

curl -fL http://localhost:8000/downloads/okai-latest.vsix -o okai-latest.vsix

Or open http://localhost:8000/downloads/okai-latest.vsix in a browser and save the file.

Step 3 — Install the extension

code --install-extension okai-latest.vsix

Or: Cmd+Shift+P → Extensions: Install from VSIX… → select the file.

Then reload VS Code: Cmd+Shift+P → Developer: Reload Window

Step 4 — Open the OKAI chat panel

  • Press Cmd+Shift+O (Mac) / Ctrl+Shift+O (Windows/Linux)
  • Or: Cmd+Shift+P → OKAI: Open OKAI Chat
  • Or: click the ⚡ icon in the VS Code activity bar

You're ready to go.

What should I try first?

Open the OKAI Chat panel (Cmd+Shift+O) and try these prompts in order:

1. Generate a new controller (end-to-end EFCore):

/generate create a ProductController with full CRUD using EFCore and async/await

OKAI will explore your project, create the entity, update DataContext, write the controller, and run dotnet build to verify.

2. Assess your code quality:

/assess Controllers/ProductController.cs

Returns severity-classified findings (Critical / High / Medium / Low) with file:line references. Findings also appear in the VS Code Problems panel as squigglies.

3. Explain a file:

/explain Controllers/ProductController.cs

4. Generate unit tests:

/test Controllers/ProductController.cs

5. Ask a freeform question:

/ask what design pattern is used in this project?

6. See findings and fix one:

/findings
/fix 1

/fix 1 opens a side-by-side diff editor — review the proposed change, then click Apply or Discard.

What does the chat UI look like?

The OKAI Chat panel is a standalone AI chat UI — no GitHub Copilot required, no Copilot subscription needed.


Message layout

Messages use an avatar-based layout:

  • Your messages — right-aligned bubble with a U avatar
  • OKAI responses — left-aligned with the ⚡ avatar and an OKAI label

Thinking sections

Before each tool call, the agent's reasoning appears as an expanded "Thought for Xs ›" section — automatically open so you can read the chain of thought immediately. The timer shows how long the SLM spent generating. Click the header to collapse it.

Example:

● Thought for 26s ›
  I need to check whether the Student entity exists before
  creating the controller. I'll list the Entities folder first...

Live tool timeline (while the agent runs)

Each tool call appears inline as it executes, with a colour-coded dot:

| Dot | Type | Tools |
| --- | --- | --- |
| 🔵 | Read | read_file, list_files, find_files |
| 🟢 | Write | write_file, patch_file, create_file |
| 🟡 | Run | run_command, run_tests, run_linter |
| 🟣 | Search | grep_code, search_code |
| 🟠 | Git | git_diff, git_log, git_status |

After the response completes

Three things appear below the response text:

  1. ▶ N tool calls used — collapsed list of every tool the agent called. Click to expand.
  2. ✅ filename chips — one chip per file the agent wrote. Click any chip to open the file in the editor.
  3. Task Summary table — shown when files were written:
| File | Action | Status |
| --- | --- | --- |
| StudentManagementController.cs | Written | ✓ Build passed |

The summary also shows tool-call counts by type (Read × 4, Write × 1, Run × 3).


Slash commands (with autocomplete)

Type / to open the autocomplete dropdown:

| Command | What it does |
| --- | --- |
| /generate | Generate new code — controller, service, entity, tests… |
| /assess | Code quality assessment with severity findings |
| /explain | Plain-English explanation of a file or function |
| /fix | Autonomous fix with diff preview |
| /test | Generate unit tests |
| /refactor | Refactor selected code |
| /ask | Freeform question about the codebase |
| /findings | List all findings from the current session |
| /fix N | Open diff editor for finding #N |
| /fp N | Mark finding #N as false positive |
| /repo | Switch to a connected GitHub/GitLab repo |
| /new | Reset session ID and start a fresh conversation |
| /status | Show backend health |

Full feature list

| Feature | Detail |
| --- | --- |
| Streaming | Token-by-token output with blinking cursor |
| Reasoning sections | "Thought for Xs ›" opens automatically with full SLM chain-of-thought |
| Tool timeline | Live colour-coded rows while running; collapses to summary after |
| Task summary table | File/Action/Status + tool-count breakdown after each completed task |
| File chips | Clickable ✅ chip for each file written — opens in editor |
| Findings panel | /findings lists all findings; click to jump to file:line |
| Fix & false-positive | /fix N diff editor; /fp N removes finding |
| Diagnostics | Findings → VS Code Problems panel with editor squigglies |
| Abort | ■ Stop button cancels mid-run |
| Auto-context | Injects active file + selected text automatically |
| Copy code | Copy button on every code block |
| Syntax highlighting | C#, Python, TypeScript, JavaScript, Java, Go |
| History | Last 100 messages persist across panel close/reopen |
| Repo switching | GitHub/GitLab repo via /repo QuickPick |
| Theme-aware | Respects VS Code light / dark / high-contrast themes |

Do I need GitHub Copilot to use OKAI?

No. OKAI has its own standalone chat panel that is completely independent of GitHub Copilot. The panel works whether Copilot is installed, logged in, or not present at all.

Open it with Cmd+Shift+O (Mac) / Ctrl+Shift+O (Windows/Linux).

Which Docker image should I use?

There is one image that contains everything:

docker pull okkarkp/rag:allinone

It bundles:

  • Neo4j (graph database for code relationships)
  • SLM (local Qwen2.5 model via llama.cpp — no GPU needed)
  • Backend (FastAPI on port 8000)
  • React UI (served at http://localhost:8000)
  • VSIX (downloadable at http://localhost:8000/downloads/okai-latest.vsix)

The recommended way to run it is with the compose file so volumes and environment variables are handled correctly:

docker compose -f docker-compose.allinone.yml up -d

To stop:

docker compose -f docker-compose.allinone.yml down

To see logs:

docker compose -f docker-compose.allinone.yml logs -f

The extension says "Connection lost" or "fetch failed" — what do I do?

This usually means the backend container was restarting when you sent the message. Steps to recover:

  1. Check the container is running: docker ps | grep rag
  2. Check it is healthy: curl http://localhost:8000/readyz
  3. If it's still starting, wait 30 seconds and try again
  4. Start a new session in the chat panel: type /new

If the container is not running at all:

docker compose -f docker-compose.allinone.yml up -d

How do I update OKAI to a newer version?

Step 1 — Pull the new image:

docker pull okkarkp/rag:allinone
docker compose -f docker-compose.allinone.yml down
docker compose -f docker-compose.allinone.yml up -d

Step 2 — Re-download and reinstall the extension (the VSIX inside the image may have been updated):

curl -fL http://localhost:8000/downloads/okai-latest.vsix -o okai-latest.vsix
code --install-extension okai-latest.vsix --force

Then reload VS Code: Cmd+Shift+P → Developer: Reload Window

What LLM does the backend use?

By default the all-in-one image uses the bundled local SLM (Qwen2.5 via llama.cpp) — no API keys required, no data leaves your machine.

You can point it at an external LLM by setting LOCAL_LLM_URL in your .env file or in the compose environment:

# .env
LOCAL_LLM_URL=http://192.168.0.17:14336   # e.g. your own llama-server with a 7B model

The backend auto-detects the model's context window size and scales the token budget accordingly.

Does it work without an internet connection?

Yes — once the image is pulled. The extension only calls your local Docker backend (http://localhost:8000). The SLM runs entirely inside the container. No telemetry, no external analytics, nothing leaves your network.

What does OKAI send to the backend?

Only what you explicitly ask it to analyse:

  • The text you typed in the chat panel
  • The selected text from your active editor (if any)
  • The workspace folder path (for the capability probe)
  • A session_id UUID (generated locally, never tied to your identity)

No file contents are sent unless you paste or select them. The backend runs entirely in your Docker environment.
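
Concretely, a chat request body is on the order of this sketch — session_id and the workspace path are described above; the other field names are assumptions:

// Sketch of a /chat/message payload; exact field names may differ.
const payload = {
  session_id: "generated-locally-uuid",
  message: "add JWT authentication to all controllers",
  selection: null,                 // or the text currently selected in the editor
  project_path: "/Users/me/repo",  // workspace folder, used for the capability probe
};
// await fetch("http://localhost:8000/chat/message", { method: "POST",
//   headers: { "Content-Type": "application/json" }, body: JSON.stringify(payload) });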

Why isn't OKAI on the VS Code Marketplace?

OKAI requires a self-hosted backend so your code never leaves your infrastructure. Marketplace extensions are expected to call a cloud service. The VSIX + Docker Compose model gives teams full data sovereignty with zero external dependencies.

What LLMs does the backend support?

Configure via .env before docker compose up:

| Variable | Provider |
| --- | --- |
| ANTHROPIC_API_KEY | Claude 3.5 Sonnet / Haiku |
| OPENAI_API_KEY | GPT-4o / GPT-4o-mini |
| (none needed) | Ollama local LLM (--profile local-llm) |

The --profile local-llm Docker Compose profile pulls Ollama and you can run llama3.1:8b, mistral, or any model from ollama.com/library — no API keys required.

Something went wrong — how do I get logs?
  • Extension logs: Cmd+Shift+P → OKAI: Show Logs → set level to Trace
  • Backend logs: docker compose logs -f rag-backend
  • Audit log: Cmd+Shift+P → OKAI: Show Audit Log

Include the X-Request-Id from the extension log when filing an issue — it cross-references the exact request in the backend logs.
