GraQle — Architectural Code Memory for AI Agents
Stop guessing. Build on memory.
Your codebase becomes a contextual reasoning brain.
Every module is a memory map. Every dependency is a neural edge. Every mistake becomes a lesson the system remembers.
Built for regulated, enterprise, and life-critical software where hallucinations are unacceptable and every answer must survive governance and adversarial debate.

Quickstart — three commands, full governance
pip install graqle # 1. install the memory engine
graq gate-install # 2. activate the governance gate (non-optional)
graq scan repo . # 3. build your codebase into agentic memory
Then install this extension and open VS Code in your project.
Ctrl+Shift+G opens the chat panel. You're done.
Don't skip step 2. The gate is what turns AI from a guesser into a governed reasoner. One install. Reversible (rm .claude/hooks/graqle-gate.py).
GraQle vs nothing.
Other AI tools see files. They re-derive your architecture on every prompt. They guess, they hallucinate, and they have no way to remember what you've already taught them.
GraQle gives your AI agents a brain. A persistent, queryable, governed memory of your codebase that learns and adapts every time you use it. Modules are nodes. Dependencies are neural edges. Past mistakes become lessons the brain refuses to repeat. Multiple agents debate every answer before it reaches you.
This is not autocomplete. This is not a chat sidebar. This is not a single-model wrapper.
This is the agentic memory layer your IDE has been waiting for.
Why GraQle is in a category of one
| What you actually want | What other tools do | What GraQle does |
|---|---|---|
| The AI knows my codebase | Re-read files every prompt → 30K tokens, no real understanding | Architectural Code Memory built once, reasoned over forever; answers cite the actual nodes |
| Stop the AI from guessing | Confident hallucinations with no escape valve | Adversarial Debate Protocol: multiple agents debate every answer; consensus or pause |
| Catch ambiguity before it ships | "I'll pick the first option that compiles" | Ambiguity Pause: when two paths exist, you pick once and the brain remembers forever |
| Govern what the AI can do | YOLO mode or hand-rolled allowlists | Governance-First Gate: one install, every tool call routed through preflight + impact + lesson lookup |
| See the reasoning, not just the output | "Generated by AI, trust me" | Reasoning Timeline: every node activated, every gate hit, every confidence score, live in the chat panel |
| Run end-to-end agentic flows | Chat → copy → paste → debug → repeat | Continuous Flow: context → reason → generate → review → apply, with confidence gates and pause/resume |
| Privacy-first deployment | "Local mode (with caveats)" | 3 memory backends (Local JSON, self-hosted Neo4j, AWS Neptune) + 14 BYOB model providers. 100% local-capable. |
GraQle is not a Cursor competitor. Cursor is great. So are Claude Code, Codex, and Copilot. GraQle is the memory + governance + debate layer that lets any of them finally reason over architecture instead of chunks.
AI assistants see files. GraQle sees architecture.
For production-critical code, hallucination is a bug
If your code controls money, health, infrastructure, regulated workflows, or anything that ships under audit — generic AI assistants are not enough. GraQle is built for the use cases where:
- Wrong answers are expensive. Every reasoning chain runs through the Adversarial Debate Protocol and Governance Gate before it touches your code.
- Auditability is non-negotiable. Every governed call is logged with caller, timestamp, decision, confidence, and the memory nodes it consulted.
- IP must stay yours. Three deployment options give you full residency control (Local JSON / Neo4j / Neptune), and 14 BYOB model providers mean you choose what your code touches.
- Memory must be durable. Lessons learned in one session persist for the next, even across restarts, even across sessions with different reasoners.
This is the layer regulated industries, enterprise teams, and security-sensitive shops have been waiting for.
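The audit fields listed above (caller, timestamp, decision, confidence, and consulted memory nodes) map naturally onto one JSON object per line. The sketch below assumes a JSON Lines file and uses Python 3.9+. The `AuditRecord` shape and the helper names are hypothetical. They are not GraQle's actual log format.

```python
# Illustrative sketch: AuditRecord, append_audit and read_audit are hypothetical names.
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class AuditRecord:
    """Hypothetical shape of one governed-call audit entry."""
    caller: str              # which agent or tool issued the call
    decision: str            # e.g. "approved", "blocked", "pending"
    confidence: float        # reasoner confidence in [0, 1]
    nodes_consulted: list[str] = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_audit(path: Path, record: AuditRecord) -> None:
    """Append one record as a single JSON line; earlier lines are never rewritten."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")


def read_audit(path: Path) -> list[AuditRecord]:
    """Load every record back in the order it was written."""
    with path.open(encoding="utf-8") as fh:
        return [AuditRecord(**json.loads(line)) for line in fh if line.strip()]
```

Append-only JSON Lines keeps each decision independent and easy to grep or ship to a log pipeline.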
Six flagship features
🧠 Architectural Code Memory
The codebase isn't a folder — it's a brain. Every module, function, class, config, env var, and runtime dependency becomes a memory node. Every relationship becomes a neural edge. Every past incident becomes a lesson the memory remembers and surfaces before you repeat it. The memory is built once and reasoned over forever — your AI stops guessing and starts building on what it already knows.
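As a rough illustration of "modules are nodes, dependencies are edges," here is a minimal sketch that builds a module-level import graph for a Python project. It uses only the standard-library `ast` module. This is not GraQle's scanner, which per the description above also tracks functions, classes, configs, env vars, and lessons. The function names and the dict-of-sets representation are assumptions for illustration.

```python
# Illustrative sketch of a module dependency graph; not GraQle's scanner.
import ast
from pathlib import Path


def imports_in(source: str) -> set[str]:
    """Return the absolute module names imported by one Python source file."""
    found: set[str] = set()
    for stmt in ast.walk(ast.parse(source)):
        if isinstance(stmt, ast.Import):
            found.update(alias.name for alias in stmt.names)
        elif isinstance(stmt, ast.ImportFrom) and stmt.module and stmt.level == 0:
            # Relative imports (level > 0) are skipped to keep the sketch short.
            found.add(stmt.module)
    return found


def module_name(root: Path, path: Path) -> str:
    """Turn root/pkg/mod.py into the dotted name pkg.mod."""
    parts = list(path.relative_to(root).with_suffix("").parts)
    if parts and parts[-1] == "__init__":
        parts = parts[:-1]
    return ".".join(parts)


def build_import_graph(root: Path) -> dict[str, set[str]]:
    """Nodes are modules under root; edges point from a module to what it imports."""
    graph: dict[str, set[str]] = {}
    for path in root.rglob("*.py"):
        source = path.read_text(encoding="utf-8")
        graph.setdefault(module_name(root, path), set()).update(imports_in(source))
    return graph


def dependents_of(graph: dict[str, set[str]], target: str) -> set[str]:
    """Reverse edges: which modules import `target` directly?"""
    return {mod for mod, deps in graph.items() if target in deps}
```

`dependents_of` is the kind of reverse lookup that impact analysis needs. It answers "what else breaks if this module changes?"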
⚔️ Adversarial Debate Protocol
Every answer that reaches you has already survived a multi-agent debate. Candidate answers are cross-reviewed by independent agents who challenge each other's reasoning, surface contradictions, and either reach consensus or escalate. No single model gets the final word. This is how GraQle reduces hallucinations on hard reasoning tasks.
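The consensus-or-escalate rule can be sketched in a few lines. This is a minimal illustration, not GraQle's protocol. Each agent is modelled as a hypothetical callable that sees the previous round's verdicts and returns its own. The two-thirds `quorum` and the three-round limit are arbitrary assumptions.

```python
# Illustrative sketch: Verdict, Agent, debate, quorum and rounds are all hypothetical.
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Verdict:
    answer: str
    confidence: float  # the agent's own confidence, 0..1


# An agent sees the question plus all verdicts from the previous round.
Agent = Callable[[str, list[Verdict]], Verdict]


def debate(question: str, agents: list[Agent], rounds: int = 3,
           quorum: float = 2 / 3) -> tuple[str, Optional[str]]:
    """Return ("consensus", answer) or ("escalate", None) after `rounds` rounds."""
    previous: list[Verdict] = []
    for _ in range(rounds):
        current = [agent(question, previous) for agent in agents]
        answer, votes = Counter(v.answer for v in current).most_common(1)[0]
        if votes / len(agents) >= quorum:
            return "consensus", answer
        previous = current  # disagreements feed the next round as challenges
    return "escalate", None
```

No single agent decides. An answer only leaves the loop when a quorum of agents independently lands on it. Otherwise the question is escalated to you.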
🛡️ Governance-First Gate
One install (graq gate-install) routes every native tool call (Read, Write, Edit, Bash) through a memory-aware equivalent that runs:
- Preflight check — surface relevant lessons, ADRs, and safety boundaries before the change
- Impact analysis — show which other modules this touches before the model writes a line
- Lesson lookup — has anyone broken this before? The memory remembers
- Audit log — every governed call is recorded, every approval is durable
Approve once. Governed forever. Reversible (rm .claude/hooks/graqle-gate.py).
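Conceptually, the gate is a chain of checks that runs before a native tool call is allowed through. The sketch below models that chain with hypothetical in-memory stand-ins for the lesson store, the dependency graph, and the protected-path list. None of these names are GraQle APIs, and the real hook (.claude/hooks/graqle-gate.py) may work quite differently.

```python
# Illustrative sketch: GateResult and governed_call are hypothetical, not GraQle APIs.
from dataclasses import dataclass, field


@dataclass
class GateResult:
    allowed: bool
    reasons: list[str] = field(default_factory=list)
    impacted: set[str] = field(default_factory=set)


def governed_call(tool: str, target: str,
                  lessons: dict[str, str],
                  dependents: dict[str, set[str]],
                  protected: set[str],
                  audit: list[dict]) -> GateResult:
    """Run preflight, impact analysis and lesson lookup before a tool call."""
    result = GateResult(allowed=True)

    # 1. Preflight: refuse writes to paths declared as safety boundaries.
    if tool in {"Write", "Edit"} and target in protected:
        result.allowed = False
        result.reasons.append(f"{target} is a protected boundary")

    # 2. Impact analysis: which modules depend on the file being touched?
    result.impacted = dependents.get(target, set())

    # 3. Lesson lookup: surface anything learned about this target before.
    if target in lessons:
        result.reasons.append(f"lesson: {lessons[target]}")

    # 4. Audit: record every decision, allowed or not.
    audit.append({"tool": tool, "target": target,
                  "allowed": result.allowed, "reasons": list(result.reasons)})
    return result
```

In this sketch, reads always pass and writes to protected paths are blocked. Every call is still analysed and audited, so a blocked call leaves the same paper trail as an approved one.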
⏸ Ambiguity Pause
When the model hits a real fork in the road (two plausible paths, neither dominant), GraQle neither guesses nor asks about everything. It pauses with the actual options, shows you the rationale and confidence for each, and remembers your choice. Next time the same fork appears, the memory knows what you'd pick. The brain learns you.
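One way to read "neither dominant" is as a margin test on option confidences, combined with a memory keyed by the shape of the fork. The sketch below is an assumption-laden illustration. The `margin` value, the `Option` type, and the `ChoiceMemory` class are hypothetical and do not reflect GraQle's internal thresholds, which are not exposed.

```python
# Illustrative sketch: Option, ChoiceMemory, resolve and margin are hypothetical.
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Option:
    label: str
    confidence: float  # 0..1


@dataclass
class ChoiceMemory:
    """Remembers which option the user picked for a given fork."""
    picks: dict[frozenset, str] = field(default_factory=dict)

    def key(self, options: list[Option]) -> frozenset:
        # A fork is identified by its set of option labels, regardless of order.
        return frozenset(o.label for o in options)

    def record(self, options: list[Option], chosen: str) -> None:
        self.picks[self.key(options)] = chosen

    def recall(self, options: list[Option]) -> Optional[str]:
        return self.picks.get(self.key(options))


def resolve(options: list[Option], memory: ChoiceMemory,
            margin: float = 0.15) -> Optional[str]:
    """Return an option label, or None to signal "pause and ask the user"."""
    remembered = memory.recall(options)
    if remembered is not None:
        return remembered
    ranked = sorted(options, key=lambda o: o.confidence, reverse=True)
    if len(ranked) == 1 or ranked[0].confidence - ranked[1].confidence >= margin:
        return ranked[0].label  # one option clearly dominates: no pause
    return None  # neither dominates: pause and show the options
```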
📊 Reasoning Timeline
Every chat answer comes with the full reasoning trace inline: which nodes were activated, which gates were hit, which lessons surfaced, which debate rounds were needed. Click any chip to see the source. No more "trust the model" — you see exactly what it saw.
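A timeline like the one in the example session is essentially an ordered list of stage records rendered as a tree. This minimal sketch assumes a hypothetical `Step` record holding a stage name, a pass/fail flag, and a detail string. It mirrors the layout of the sample output, not GraQle's actual data model.

```python
# Illustrative sketch: Step and render_timeline are hypothetical names.
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    stage: str    # e.g. "context", "debate", "generate"
    ok: bool      # did the stage pass its gate?
    detail: str   # human-readable summary shown next to the stage


def render_timeline(steps: list[Step]) -> str:
    """Render steps as a tree: ├─ for every row except the last, └─ for the last."""
    lines = ["[Reasoning Timeline]"]
    width = max((len(s.stage) for s in steps), default=0)
    for i, step in enumerate(steps):
        branch = "└─" if i == len(steps) - 1 else "├─"
        mark = "✓" if step.ok else "✗"
        lines.append(f"{branch} {step.stage.ljust(width)} {mark} {step.detail}")
    return "\n".join(lines)
```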
🔄 Continuous Flow
Run an entire feature implementation as a single command: context → reason → generate → review → apply. Each stage has its own confidence gate. If anything trips a gate or surfaces an ambiguity, the flow pauses, surfaces the issue, and waits for your input. Resume with one click. The memory learns from every pick.
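A flow with per-stage confidence gates can be sketched as a loop that stops at the first stage whose confidence falls below its gate. The loop also caches the context stage so that a resume does not pay for it twice, as the Known issues section describes. The stage functions, thresholds, and the `FlowState` type are hypothetical.

```python
# Illustrative sketch: FlowState, Stage and run_flow are hypothetical names.
from dataclasses import dataclass, field
from typing import Callable, Optional

# A stage takes the outputs so far and returns (output, confidence in 0..1).
Stage = Callable[[dict], tuple[str, float]]


@dataclass
class FlowState:
    outputs: dict[str, str] = field(default_factory=dict)
    paused_at: Optional[str] = None


def run_flow(stages: list[tuple[str, Stage, float]], state: FlowState) -> FlowState:
    """Run (name, stage, min_confidence) triples in order, pausing on a failed gate."""
    state.paused_at = None
    for name, stage, gate in stages:
        if name == "context" and "context" in state.outputs:
            continue  # cached from an earlier run: skip the expensive stage
        output, confidence = stage(state.outputs)
        if confidence < gate:
            state.paused_at = name  # surface the issue and wait for user input
            return state
        state.outputs[name] = output
    return state
```

In this sketch, resuming means storing the user's pick in `state.outputs` and calling `run_flow` again. Only the context stage is skipped, and every later stage re-runs with the choice available.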
Three memory backends — your residency, your choice
| Backend | Best for | Latency | Privacy | Cost |
|---|---|---|---|---|
| Local JSON (default) | Solo devs, small-to-medium codebases, fully offline | < 50 ms | 100% local, never leaves disk | Free |
| Neo4j (self-hosted) | Teams, large codebases, custom queries via Cypher | < 20 ms | Your infrastructure | Free (Community Edition) or Aura plan |
| AWS Neptune (serverless) | Enterprise, multi-region, compliance teams | < 100 ms | Your VPC | Billed per NCU-hour |
The active backend is shown live in the status bar — click it to switch, see node count, residency, and latency in one panel. Your code never leaves a backend you didn't choose.
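Switching between Local JSON, Neo4j, and Neptune implies a common storage interface behind the status-bar switch. The sketch below shows one possible shape: a `typing.Protocol` plus a Local JSON implementation. The `MemoryBackend` protocol and its methods are hypothetical. A Neo4j or Neptune class would implement the same methods against its own driver.

```python
# Illustrative sketch: MemoryBackend, LocalJsonBackend and describe are hypothetical.
import json
from pathlib import Path
from typing import Protocol


class MemoryBackend(Protocol):
    name: str

    def upsert_node(self, node_id: str, props: dict) -> None: ...
    def add_edge(self, src: str, dst: str, kind: str) -> None: ...
    def node_count(self) -> int: ...


class LocalJsonBackend:
    """Keeps the whole graph in one JSON file on disk; nothing leaves the machine."""
    name = "Local JSON"

    def __init__(self, path: Path) -> None:
        self.path = path
        if path.exists():
            self.data = json.loads(path.read_text(encoding="utf-8"))
        else:
            self.data = {"nodes": {}, "edges": []}

    def upsert_node(self, node_id: str, props: dict) -> None:
        self.data["nodes"].setdefault(node_id, {}).update(props)
        self._save()

    def add_edge(self, src: str, dst: str, kind: str) -> None:
        self.data["edges"].append({"src": src, "dst": dst, "kind": kind})
        self._save()

    def node_count(self) -> int:
        return len(self.data["nodes"])

    def _save(self) -> None:
        self.path.write_text(json.dumps(self.data, indent=2), encoding="utf-8")


def describe(backend: MemoryBackend) -> str:
    """What a status-bar chip might display: backend name and node count."""
    return f"{backend.name}: {backend.node_count()} nodes"
```

Because `describe` depends only on the protocol, it works unchanged for any backend class that provides the same methods.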
14 BYOB (Bring Your Own Backend) reasoners
GraQle reasons by routing model calls through a provider you already have. Run fully offline with Ollama. Run fully local with LM Studio. Or use any of: Anthropic, OpenAI, AWS Bedrock, Google Gemini, Groq, DeepSeek, Together, Mistral, OpenRouter, Fireworks, Cohere — plus a Custom-endpoint slot for self-hosted models.
Switch in one command (GraQle: Configure AI Backend). No vendor lock-in. No data forwarded anywhere unless you point it there.
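Routing through "a provider you already have" amounts to a lookup from a provider id to a client. The sketch below is hypothetical throughout. Only `anthropic` is confirmed as a setting value in this README, and the client functions are stubs standing in for each vendor's real SDK. It also shows one way to answer "does this provider send prompts off the machine?"

```python
# Illustrative sketch: REGISTRY, provider, reason and leaves_machine are hypothetical.
from typing import Callable

# provider id -> (runs locally?, client function). Clients are stubs here.
Client = Callable[[str], str]
REGISTRY: dict[str, tuple[bool, Client]] = {}


def provider(pid: str, local: bool = False):
    """Decorator that registers a client function under a provider id."""
    def wrap(fn: Client) -> Client:
        REGISTRY[pid] = (local, fn)
        return fn
    return wrap


@provider("ollama", local=True)
def _ollama(prompt: str) -> str:
    return f"[ollama] {prompt}"  # stub: a real client would call the local Ollama server


@provider("anthropic")
def _anthropic(prompt: str) -> str:
    return f"[anthropic] {prompt}"  # stub: a real client would call the vendor API


def reason(prompt: str, active: str) -> str:
    """Dispatch to the configured provider; unknown ids fail loudly, with no silent fallback."""
    if active not in REGISTRY:
        raise ValueError(f"unknown provider {active!r}; choose one of {sorted(REGISTRY)}")
    return REGISTRY[active][1](prompt)


def leaves_machine(active: str) -> bool:
    """True when the configured provider sends prompts off this machine."""
    return not REGISTRY[active][0]
```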
Real example session
You: "Refactor the auth flow to use the new session middleware."
GraQle: ⚔️ Adversarial Debate (3 agents, 1 round)
→ Agent A proposes: move session creation into middleware layer
→ Agent B challenges: middleware is shared, this could regress legacy callers
→ Agent C synthesizes: scoped middleware adapter is the safe path
⏸ Ambiguity Pause — two viable paths:
A) Move session creation into the middleware layer
rationale: matches existing pattern in core/middleware.py
memory: 3 supporting nodes, 1 prior lesson on session lifecycle
B) Wrap the existing flow with a middleware adapter
rationale: less invasive, fewer touched files
memory: 1 supporting node, 0 lessons
[Choose for me]
You: *picks A*
GraQle: ✓ Choice recorded — your memory will recommend A next time
Continuous Flow resuming: reason → generate → review → apply
[Reasoning Timeline]
├─ context ✓ 47 nodes activated, 12ms
├─ debate ✓ 3 agents, consensus reached
├─ reason ✓ 3 candidates synthesized, 1 dominant
├─ generate ✓ 4 files patched, 142 lines
├─ review ✓ 0 BLOCKERs, 1 MAJOR (accepted)
└─ apply ✓ all changes written, tests green
Governance: ✅ approved (auto — no policy violations, no past lessons triggered)
Citations: src/auth/session.ts:42, src/middleware/index.ts:118
Memory: 1 new lesson recorded — "session-middleware refactor pattern"
This is what GraQle does that nothing else does.
Trust controls
- Data Residency Chip — live status bar showing where your memory is hosted right now (Local JSON / Neo4j / Neptune). Click to switch or audit.
- Governance Gate Status — green/yellow/red shield in the status bar. Yellow means an approval is pending; click to review.
- Audit log — every governed tool call is recorded with timestamp, caller, decision, and confidence. Inspect via GraQle: Governance Gate Status.
- Reversible everything — disable the governance gate with one file delete. Switch backends with one command. Cancel any in-flight flow with one click.
- Cancel mid-flight — the cancel button actually cancels. Underlying MCP requests are aborted, not just hidden.
- IP-safe by design — GraQle's reasoning surfaces what the model decided, never the internal scoring weights or thresholds. Your IP stays yours.
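The difference between hiding a request and cancelling it is whether the underlying work actually stops. The extension does this with AbortSignal in TypeScript. The Python `asyncio` analogue below is an illustrative sketch, in which `slow_mcp_request` is a hypothetical stand-in for a long-running MCP call.

```python
# Illustrative sketch: slow_mcp_request and run_then_cancel are hypothetical names.
import asyncio


async def slow_mcp_request(log: list[str]) -> str:
    """Hypothetical long-running request; records whether it finished or was aborted."""
    try:
        await asyncio.sleep(10)  # stands in for waiting on the server
        log.append("completed")
        return "result"
    except asyncio.CancelledError:
        log.append("aborted")  # cleanup point, e.g. tell the server to stop work
        raise  # re-raise so the task is marked as cancelled


async def run_then_cancel(after: float) -> list[str]:
    """Start the request, then cancel it `after` seconds later (the user clicks Cancel)."""
    log: list[str] = []
    task = asyncio.create_task(slow_mcp_request(log))
    await asyncio.sleep(after)
    task.cancel()  # interrupts the in-flight await inside the task
    try:
        await task
    except asyncio.CancelledError:
        pass
    return log
```

`asyncio.run(run_then_cancel(0.01))` returns `["aborted"]`. The request stops at its current await point instead of running on unseen in the background.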
What's new in v0.4.5
This release activates the full L3 governance layer thanks to graqle SDK v0.51.3 — the engine that ships every L3-c capability the extension was already wired for.
- Ambiguity Pause auto-enables when graqle SDK ≥ 0.51.3 is detected — no opt-in needed.
- Evidence chips on every option — pause cards show which memory nodes / lessons support each choice.
- Capability probe at MCP handshake — extension auto-detects what the SDK can do, never depends on hard-coded version sniffs.
- Removed: every text-parse fallback, every experimental opt-in flag, every workaround we shipped while waiting for the SDK.
- Hardened: panel race fixes, debounce-cancel race fixes, AbortSignal plumbing, smart-resume context-stage cache (saves cost when flows pause).
See CHANGELOG.md for the full v0.4.x history.
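A capability probe replaces version sniffing: the client reads what the server advertises during the MCP `initialize` handshake and enables features to match. In MCP, the server's `initialize` result carries a `capabilities` object, which may include an `experimental` map. The `ambiguityPause` key is a hypothetical name, because this README does not say how the SDK advertises the feature.

```python
# Illustrative sketch: the feature key passed in (e.g. "ambiguityPause") is hypothetical.
import json


def advertised(initialize_response: str, feature: str) -> bool:
    """True if the initialize result lists `feature` under capabilities.experimental.

    MCP treats a capability as advertised when its key is present, even when its
    value is an empty object, so this checks presence rather than truthiness.
    Error responses (no "result") count as "not advertised".
    """
    message = json.loads(initialize_response)
    capabilities = message.get("result", {}).get("capabilities", {})
    return feature in capabilities.get("experimental", {})
```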
Pairing with Claude Code, Cursor, Codex
Run GraQle: Connect to Claude Code / Cursor from the palette. The extension writes the right .mcp.json (Claude Code) or .cursor/mcp.json (Cursor) config so your AI assistant of choice can call the same 130 GraQle MCP tools the chat panel uses.
This is the magic moment: GraQle isn't a competitor to your assistant — it's the memory layer it always wished it had.
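Claude Code's project-level .mcp.json and Cursor's .cursor/mcp.json both use a top-level `mcpServers` object. The sketch below merges a GraQle entry into such a file without touching other servers. The server name `graqle` and the `graq mcp serve` command line (taken from the How it works diagram) are assumptions about what the extension actually writes.

```python
# Illustrative sketch: the "graqle" entry below is an assumed config, not confirmed.
import json
from pathlib import Path


def add_graqle_server(config_path: Path) -> dict:
    """Merge a GraQle MCP server entry into an existing (or new) mcp.json file."""
    config: dict = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    servers = config.setdefault("mcpServers", {})
    # Assumed entry: launch the GraQle MCP server as a local command.
    servers["graqle"] = {"command": "graq", "args": ["mcp", "serve"]}
    config_path.parent.mkdir(parents=True, exist_ok=True)  # e.g. create .cursor/
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Running it twice is harmless, because the `graqle` entry is simply overwritten and any other configured servers are preserved.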
Commands (22)
Open the Command Palette (Ctrl+Shift+P / Cmd+Shift+P) and type GraQle:.
| Group | Command | Default keybinding |
|---|---|---|
| Chat | Open Chat | Ctrl+Shift+G |
|  | Ask about selection | Ctrl+Shift+A (with selection) |
|  | Welcome & Setup | none |
| Reasoning | Run Continuous Flow | Ctrl+Shift+R |
|  | Resume Continuous Flow | none |
|  | Run Background Agent | Ctrl+Shift+L |
|  | Cancel Background Agent | none |
| Memory | Scan Workspace | none |
|  | Show Knowledge Graph | none |
|  | Memory Backend (data residency) | Ctrl+Shift+D |
|  | Refresh KG Completions Cache | none |
| Governance | Preflight Check | none |
|  | Governance Gate Status | none |
| Plan & usage | Show Plan & Usage | none |
|  | Refresh Plan | none |
|  | Show Step Detail | none |
|  | Upgrade Plan | none |
| Authentication | Sign In | none |
|  | Sign Out | none |
| Backend | Configure AI Backend | Ctrl+Shift+Alt+B |
| MCP | Connect to Claude Code / Cursor | none |
|  | Restart MCP Server | none |
Configuration
GraQle exposes 18 settings under graqle.* in VS Code preferences. The ones you'll likely touch:
- graqle.llm.activeProvider: pick from 14 BYOB providers (default anthropic)
- graqle.experimental.ambiguityPause: null (auto-detect, recommended), true (force on), false (force off)
- graqle.completion.debounceMs: completion debounce window (default 400 ms)
- graqle.streamResponses: stream tokens live in chat (default on)
- graqle.showProvenance: show which memory nodes contributed to each answer (default on)
- graqle.showGovernance: show governance gate decisions inline (default on)
Full reference in VS Code Settings → search "GraQle".
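The tri-state graqle.experimental.ambiguityPause setting combines a user override with what the SDK supports. Below is a minimal sketch of that resolution. It assumes the SDK version is available as a plain dotted string and uses the 0.51.3 threshold from the v0.4.5 notes. The function names are hypothetical, and the extension itself may rely on the handshake capability probe rather than a version comparison.

```python
# Illustrative sketch: parse_version and ambiguity_pause_enabled are hypothetical names.
from typing import Optional


def parse_version(text: str) -> tuple[int, ...]:
    """'0.51.3' -> (0, 51, 3). Assumes plain numeric parts (no 'rc1' suffixes)."""
    return tuple(int(part) for part in text.split("."))


def ambiguity_pause_enabled(setting: Optional[bool], sdk_version: str,
                            minimum: str = "0.51.3") -> bool:
    """null = auto-detect, true = force on, false = force off."""
    if setting is not None:
        return setting  # an explicit user override always wins
    # Tuple comparison is numeric: (0, 100, 0) > (0, 51, 3), unlike the raw strings.
    return parse_version(sdk_version) >= parse_version(minimum)
```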
How it works
Your codebase
↓
graq scan ← builds architectural code memory (one time)
↓
graq gate-install ← activates governance + adversarial debate
↓
graq mcp serve ← GraQle MCP server (130 tools, JSON-RPC 2.0)
↓
VS Code extension ← chat panel, completions, status, governance UI
↓
Continuous Flow ← context → debate → reason → generate → review → apply
All reasoning happens on your machine (or via the BYOB backend you chose). Your code never leaves your environment unless you explicitly route it to a cloud LLM. Even then, only what you send is sent — never the memory itself.
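Because the MCP server speaks JSON-RPC 2.0, every request from the chat panel (or from Claude Code or Cursor) is a small JSON object. The sketch below builds the standard MCP `tools/list` and `tools/call` requests. The tool name `graq_preflight` and its arguments are hypothetical placeholders, because this README does not list the names of the 130 tools.

```python
# Illustrative sketch: "graq_preflight" and its arguments are hypothetical placeholders.
import itertools
import json
from typing import Optional

_ids = itertools.count(1)  # request ids should not be reused within a session


def request(method: str, params: Optional[dict] = None) -> str:
    """Serialize one JSON-RPC 2.0 request; a message with an id expects a response."""
    message: dict = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


list_tools = request("tools/list")
call_tool = request("tools/call", {
    "name": "graq_preflight",                     # hypothetical tool name
    "arguments": {"path": "src/auth/session.ts"},  # hypothetical arguments
})
```

In a real session, these requests follow the `initialize` handshake and are sent over whatever transport the server uses, such as stdio for a locally launched process.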
Pricing
- Free — every feature in this README. Local JSON backend. Ollama for offline reasoning. No signup.
- Pro — Neo4j backend support, larger memory limits, priority support
- Team — multi-user governance, shared lesson memory, audit export
- Enterprise — Neptune backend, SSO, dedicated support, on-prem deployment
Full pricing at graqle.com/pricing.
Known issues (v0.4.5)
- Resuming a paused Continuous Flow re-executes every stage after context. The context-stage result is cached between pauses to keep cost low; reason → generate → review → apply re-execute with your choice baked in. Smart resume from the paused stage is planned for v0.5.0.
- Completions are suppressed inside Python f-string {...} and TypeScript template-literal ${...} interpolations. A proper tokenizer is planned for a release after v0.4.5.
- Terminal reaction loop is disabled because VS Code's onDidWriteTerminalData is a proposed API that blocks activation outside Insiders builds. Will ship when the API stabilizes.
- Unbundled extension (~1.7 MB). Bundling with esbuild is planned for v0.5.0 to improve cold-start time.
Links