ShadowGraph: A Semantic Knowledge Graph for AI-Driven Development
Store code intent in the graph where it lives, not in comments that disappear.
ShadowGraph is a persistent knowledge graph for long-running AI agent development. It captures why code exists, what constraints it has, and how it connects to other parts of the system—enabling agents to understand context without reading thousands of lines of code or forgetting what you told them last session.
🤔 The Problem: Stateless Agents = Expensive Hallucinations
When building or maintaining a codebase over weeks with AI agents:
- Lost Context. In each new conversation, agents start from scratch. You re-explain the same architecture decisions, business rules, and tradeoffs.
- Expensive Tokens. You copy-paste code into prompts to restore context. Working with an agent on a 100k-line codebase? That's thousands of wasted tokens per query.
- Hallucinations. Without understanding why code exists, agents make assumptions and introduce bugs.
- Fragile Comments. Even with inline documentation, comments rot, get deleted by refactors, and can't be queried.
The Core Problem: Code is a graph of dependencies and calls, but documentation is linear: written in comments alongside the code and isolated from the structure it describes.
ShadowGraph mirrors the structure of your code with a knowledge graph:
- Code = Nodes (functions, classes, files)
- Dependencies = Edges (calls, imports, references)
- Intent = Linked Thoughts (business rules, constraints, design decisions)
Agents can then navigate and query this structure directly:
Agent: "Why does process_payment need to be idempotent?"
ShadowGraph:
└─ Finds: function:payment.process_payment
└─ Returns: Linked thought: "Stripe webhook can retry. Must be safe."
└─ Also returns: Related business rule: "All shipments are national DPD from our Berlin hub"
└─ Total tokens: ~100 (vs. 5000 for reading 20 files)
The Benefit: Agents can recall exactly what they need without hallucinating or bloating the context window.
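To make the node/edge/thought model concrete, here is a minimal in-memory sketch. The `Graph` class, its method names, and the rule that `recall` also returns thoughts one edge away are illustrative assumptions, not ShadowGraph's actual implementation; `function:shipping.create_parcel` is a hypothetical node.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical in-memory model: code nodes, dependency edges, linked thoughts."""
    edges: dict[str, set[str]] = field(default_factory=dict)      # node ID -> nodes it calls/imports
    thoughts: dict[str, list[str]] = field(default_factory=dict)  # node ID -> linked thoughts

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.setdefault(src, set()).add(dst)

    def link(self, node: str, thought: str) -> None:
        self.thoughts.setdefault(node, []).append(thought)

    def recall(self, node: str) -> dict[str, list[str]]:
        """Thoughts on the node itself, plus thoughts on its direct dependencies."""
        direct = list(self.thoughts.get(node, []))
        related = [t for dep in sorted(self.edges.get(node, set()))
                   for t in self.thoughts.get(dep, [])]
        return {"direct": direct, "related": related}


if __name__ == "__main__":
    g = Graph()
    g.link("function:payment.process_payment", "Stripe webhook can retry. Must be safe.")
    # Hypothetical dependency, used only to show how related context is reached.
    g.add_edge("function:payment.process_payment", "function:shipping.create_parcel")
    g.link("function:shipping.create_parcel", "All shipments are national DPD from our Berlin hub.")
    print(g.recall("function:payment.process_payment"))
```

Only the queried node and its neighbours are read, which is where the small token footprint in the example above comes from.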
✨ How It Works
1. Semantic Storage — Not Line Numbers
- Thoughts are anchored to code via AST hashes, not line numbers
- Move a function → the thought moves with it
- Rename a class → the thought still finds it
- Change the logic → the thought is marked STALE and the agent is warned
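The sketch below shows one way hash-based anchoring can survive moves and renames while still catching logic changes. It uses Python's standard `ast` module instead of tree-sitter and hashes only a function's body; that choice and the helper names (`function_hashes`, `anchor_hash`, `anchor_status`) are assumptions for illustration.

```python
import ast
import hashlib
from typing import Iterator, Optional, Tuple


def function_hashes(source: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, body_hash) for every function defined in `source`."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.dump() omits line/column attributes by default, so moving the
            # function leaves the hash unchanged. The name is stored on the def
            # node, not in its body, so renaming leaves it unchanged too.
            dumped = "\n".join(ast.dump(stmt) for stmt in node.body)
            yield node.name, hashlib.sha256(dumped.encode("utf-8")).hexdigest()


def anchor_hash(source: str, name: str) -> Optional[str]:
    """Hash to store alongside a thought when it is first anchored to `name`."""
    return next((h for n, h in function_hashes(source) if n == name), None)


def anchor_status(source: str, name: str, stored_hash: str) -> str:
    """Classify a thought's anchor against the current version of the file."""
    current = next((n for n, h in function_hashes(source) if h == stored_hash), None)
    if current is None:
        return "STALE"  # no function has this body any more: the logic changed
    return "VALID" if current == name else f"RENAMED:{current}"


if __name__ == "__main__":
    v1 = "def process_payment(x):\n    return x * 2\n"
    h = anchor_hash(v1, "process_payment")
    print(anchor_status("\n\n" + v1, "process_payment", h))  # VALID (moved)
    print(anchor_status(v1.replace("process_payment", "charge"), "process_payment", h))  # RENAMED:charge
    print(anchor_status(v1.replace("2", "3"), "process_payment", h))  # STALE
```

Under this scheme two functions with identical bodies would share a hash, so a real implementation would need some extra context to tell them apart.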
2. Agent Tools
- remember(topic, context, file_path?, symbol_name?): save business rules, design decisions, and constraints
- recall(query): query what you know about a symbol, business rule, or topic
- index(file_path): parse a file and register its symbols in the graph
- check(file_path?): detect stale thoughts after code changes
- create_file(path, content): write code to disk AND auto-register it in the graph
- debug_info(): diagnostic info for troubleshooting
Simple, intentional verbs. No confusion.
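Agents reach these verbs as tool calls. This stdlib-only sketch registers two of them and dispatches a JSON-RPC request shaped like an MCP `tools/call` message (a tool `name` plus `arguments`). The in-memory store, the matching rule in `recall`, and the simplified response envelope are assumptions; a real server built with FastMCP handles registration and the wire format itself.

```python
from __future__ import annotations

import json
from typing import Any, Callable, Optional

TOOLS: dict[str, Callable[..., Any]] = {}
THOUGHTS: list[dict[str, Optional[str]]] = []  # hypothetical in-memory store


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register `fn` as a tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def remember(topic: str, context: str,
             file_path: Optional[str] = None, symbol_name: Optional[str] = None) -> str:
    THOUGHTS.append({"topic": topic, "context": context,
                     "file_path": file_path, "symbol_name": symbol_name})
    return f"Remembered {topic!r}"


@tool
def recall(query: str) -> list[str]:
    """Case-insensitive substring match on topic or anchored symbol name."""
    q = query.lower()
    return [t["context"] for t in THOUGHTS
            if q in (t["topic"] or "").lower() or q in (t["symbol_name"] or "").lower()]


def handle(request: str) -> str:
    """Dispatch a JSON-RPC 'tools/call' request to the matching verb."""
    req = json.loads(request)
    if req.get("method") != "tools/call":
        raise ValueError(f"unsupported method: {req.get('method')}")
    params = req["params"]
    result = TOOLS[params["name"]](**params.get("arguments", {}))
    # Simplified envelope: a real MCP server wraps results in typed content items.
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})
```

Keeping the tool surface to a handful of verbs means the agent only has to learn a few argument shapes.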
3. Team Knowledge — Git-Tracked Thoughts
Thoughts are saved in .shadow/graph.jsonl:
- Commit them to Git alongside your code
- Teammates pull and instantly inherit your context
- No onboarding questions. The knowledge lives in the repo.
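A line-oriented file is what makes this workable in Git: each thought is one line, so diffs and merges stay readable. This sketch assumes one flat JSON record per thought; the field names are hypothetical and the real .shadow/graph.jsonl schema may differ.

```python
import json
import tempfile
from pathlib import Path

GRAPH_FILE = Path(".shadow/graph.jsonl")


def append_thought(record: dict, path: Path = GRAPH_FILE) -> None:
    """Append one thought as a single JSON line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        # Sorted keys give every record a stable layout, which keeps Git diffs small.
        f.write(json.dumps(record, sort_keys=True, ensure_ascii=False) + "\n")


def load_thoughts(path: Path = GRAPH_FILE) -> list[dict]:
    """Read all thoughts back, skipping blank lines (e.g. left after a merge)."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Demo against a temporary directory instead of the real repository.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / ".shadow" / "graph.jsonl"
        append_thought({"topic": "shipping", "context": "All shipments are national DPD."}, demo)
        print(load_thoughts(demo))
```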
🚀 Getting Started
Install
# Install from VS Code Marketplace
# (or build from source: npm run build)
Quick Example
# Agent saves business knowledge
remember("shipping", "All shipments are national DPD from our Berlin hub. Tracking: 123.22.123.1:7534/parcels")
# Agent indexes a new file it wrote
index("src/shipping/parcel_tracker.py")
# Agent recalls context before making changes
recall("shipping")
# Returns: Business rule about DPD + Berlin hub + tracking API
# Agent modifies code and re-indexes
# ... code changes ...
index("src/shipping/parcel_tracker.py")
# Agent checks for stale thoughts
check("src/shipping/parcel_tracker.py")
# Returns: "All anchors valid, no stale thoughts"
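Roughly, index() has to turn a file into nodes and edges. This sketch uses Python's `ast` module in place of tree-sitter and emits IDs in the kind:module.name form shown earlier (function:payment.process_payment). The `index_source` helper, passing the module name explicitly, and recording call targets as unresolved bare names are all simplifying assumptions; methods and imports are skipped.

```python
import ast
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source node ID, edge kind, target name)


def index_source(source: str, module: str) -> Tuple[List[str], List[Edge]]:
    """Register top-level functions and classes as nodes, plus call edges."""
    nodes: List[str] = []
    edges: List[Edge] = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node_id = f"function:{module}.{node.name}"
            nodes.append(node_id)
            # Record direct calls like fetch_status(...); targets stay unresolved names here.
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    edges.append((node_id, "calls", sub.func.id))
        elif isinstance(node, ast.ClassDef):
            nodes.append(f"class:{module}.{node.name}")
    return nodes, edges


if __name__ == "__main__":
    src = "def track(parcel_id):\n    return fetch_status(parcel_id)\n\nclass Parcel:\n    pass\n"
    print(index_source(src, "shipping.parcel_tracker"))  # hypothetical module name
```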
🎯 Why This Matters for Agent-Driven Development
Traditional Workflow:
- Agent reads 50 files → 10k tokens just for context
- Agent guesses at design intent → introduces bugs
- Next session, agent forgets everything → repeat
ShadowGraph Workflow:
- Agent queries graph for 1 symbol → 200 tokens of exact context
- Agent understands why code exists → makes intelligent changes
- Graph persists across sessions → agent keeps what it learned
Cost: ~5% of the context tokens. Knowledge that doesn't disappear.
🔬 Architecture
- Extension: VS Code (TypeScript) — displays codelens, decorations, commands
- MCP Server: Python (FastMCP) — parses code, manages graph, serves queries
- Storage: SQLite — local, portable, easy to version control
- Parsing: tree-sitter — multi-language AST support (Python, TypeScript, JS, Go, Rust, etc.)
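With SQLite as the store, the storage layer can be as small as three tables. The schema, column names, and the `recall`/`mark_stale` helpers below are hypothetical; they only illustrate how nodes, edges, and thoughts could map onto Python's built-in `sqlite3` module.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS nodes (
    id          TEXT PRIMARY KEY,   -- e.g. 'function:payment.process_payment'
    anchor_hash TEXT                -- hash of the code the thoughts were written against
);
CREATE TABLE IF NOT EXISTS edges (
    src  TEXT REFERENCES nodes(id),
    dst  TEXT REFERENCES nodes(id),
    kind TEXT                       -- 'calls', 'imports', 'references'
);
CREATE TABLE IF NOT EXISTS thoughts (
    id      INTEGER PRIMARY KEY,
    node_id TEXT REFERENCES nodes(id),
    topic   TEXT,
    context TEXT,
    status  TEXT DEFAULT 'VALID'
);
"""


def open_graph(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def recall(conn: sqlite3.Connection, query: str) -> List[Tuple[str, str]]:
    """Return (context, status) for thoughts whose topic or node ID matches."""
    pattern = f"%{query}%"
    return conn.execute(
        "SELECT context, status FROM thoughts WHERE topic LIKE ? OR node_id LIKE ? ORDER BY id",
        (pattern, pattern),
    ).fetchall()


def mark_stale(conn: sqlite3.Connection, node_id: str, current_hash: str) -> bool:
    """Flag a node's thoughts as STALE if its code hash has changed."""
    row = conn.execute("SELECT anchor_hash FROM nodes WHERE id = ?", (node_id,)).fetchone()
    if row is None or row[0] == current_hash:
        return False
    conn.execute("UPDATE thoughts SET status = 'STALE' WHERE node_id = ?", (node_id,))
    return True
```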
License
MIT. Use this in your projects, in teams, in AI workflows. The code is yours.
Built for environments where agents maintain and evolve codebases over weeks or months, where context is everything and tokens are expensive.