Dabbler AI Orchestration
An AI-led coding-session workflow for VS Code. Manage structured AI
sessions, automatic cross-provider verification, cost tracking, and
git-worktree-aware session-set state — all from the activity bar, in
two tiers that let you trade API spend against your own attention.

Two tiers — pay with dollars or pay with attention
Both tiers run the same workflow: the same session lifecycle, the
same Session Set Explorer, the same state files and close-out gates.
They differ in how much of the workflow is automated — and therefore
in what each one costs you:

| | Full tier | Lightweight tier |
| --- | --- | --- |
| Verification | Automatic at every session end: the router picks a model from a different provider and runs the review for you | Copyable review prompts: you paste them into a second AI chat and record the verdict yourself |
| API spend | Metered, capped by your not-to-exceed budget (a 3-session set typically totals $0.15–$2.50) | $0: the router makes no API calls |
| Your attention | Mostly at session boundaries. Run several projects at once while you sit in meetings, answer email, or do other work; the workflow carries itself between check-ins | More hands-on: you drive each verification, so multitasking is more constrained. More than one project at a time still works; each just needs more of you |
| Best for | Parallel project streams; "start it, check in later" operation | Cost-sensitive work; learning the workflow; environments where API spend isn't an option |

The tier is declared per session set (tier: in the spec), not
per repo — mix them freely in one workspace, and switch a not-started
set's tier from the Explorer (Switch Tier…). A common path: start
Lightweight, then go Full when the attention cost outweighs the API
cost.
What you get
A standardized, largely automated workflow — not just better
chat hygiene. Most developers already split AI work into
sessions; the hard part is everything around them. This extension
operationalizes a high-level plan into session sets — ordered
sequences of AI-led work sessions that you and the AI co-design
before any code is written — and then runs every session through
the same structured lifecycle: register, work, verify, document,
commit, close. You direct the work; the workflow carries it. The
feeling is less "hands on the wheel" and more "telling your
chauffeur where to go next."
Ongoing visibility into AI work. Every session leaves an
AI-generated paper trail in predictable places — the spec, an
activity log of every step, per-session state with verification
verdicts, a change log at close. The Session Set Explorer reads it
all back at a glance: what's in flight, what's queued, what's
blocked on prerequisites, what's done and verified. You can step
away and know exactly what happened while you weren't watching.
Cost-minded routing (Full tier). On the Full tier, reasoning
tasks (code review, analysis, documentation, end-of-session
verification) go through the AI router, which picks the cheapest
capable model per task and escalates only when needed. In the real
projects we tested, routing saved 73% vs Opus-only on a
CLI/library project (990 routed calls) and 32% on a UI
app with UAT/E2E gates (370 calls). Two sample reports ship in the
GitHub repo.
The Lightweight tier skips the router entirely; that's the $0 in
the API spend row of the table above.
Cross-provider verification at every session close. On the
Full tier it's automatic: a model from a different provider than
the one that did the work reviews the session and returns a
structured verdict; disagreements surface for human adjudication
rather than being silently merged or dismissed. On the Lightweight
tier the same step is a copyable review prompt you paste into a
second AI chat and a verdict you record yourself.
Get started
Open a project folder with no session sets yet and the Session Set
Explorer renders the staged Getting Started form, with companion
step-by-step instructions in the editor:

- Build project structure — pick your tier (Full or
Lightweight, the cost/attention tradeoff described above) and the
form scaffolds everything: the
.venv with the router package,
the AI-agent instruction files, and the docs/session-sets/
home. On the Full tier the form also asks for your verification
budget / NTE cap (saved to ai_router/budget.yaml; a $0
budget asks you to pick manual-via-other-engine or skipped
verification explicitly) and warns inline when no provider API
key is visible.
- Create or import a project plan — import an existing
project-plan.md, or copy a planning prompt and let your AI
agent draft the plan with you.
- Build session sets — copy the decomposition prompt; your AI
agent turns the plan into ordered session sets under
docs/session-sets/, each with a spec you review before any work
starts.
Then tell your AI agent: "start the next session." Once the
first session set exists, the form gives way to the standard
Explorer tree. (You can re-focus the form anytime with
Dabbler: Get Started from the command palette.)
What it'll cost
API spend is real and varies by project size and verification
appetite. Honest framing:
- $0 budget — verification routes through a different AI
assistant you open manually (e.g. open a second AI chat as the
verifier), or you skip verification with the decision logged. No
API spend.
- Non-zero budget — the router makes synchronous API calls for
cross-provider verification, capped at your not-to-exceed (NTE)
threshold. Verification calls typically run $0.05–$0.80 each;
a 3-session set usually totals $0.15–$2.50; a 6-session set
$0.30–$5.00. These are empirical medians — outliers exist.
The router writes one JSON line per call to
ai_router/router-metrics.jsonl so you can audit spend at any
time. The Cost Dashboard command surfaces cumulative spend
visually — it appears only in workspaces that actually route (it is
absent on Lightweight) and, on open, prompts you to refresh the
per-provider rate estimates if they have gone stale (older than
metadata.review_frequency_days, default 30 days).
python -m ai_router.report produces a full markdown
manager-report with the Opus-baseline savings headline,
per-task-type unreliability rates, and auto-generated action
items. The framework is open-source (MIT) — your costs are entirely
your provider's API spend; nothing in this extension is paywalled.
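To audit spend outside the dashboard, the metrics file can be summed directly. The following is a minimal sketch, not the router's own reporting code: it assumes each JSON line carries a numeric `cost_usd` field and a `provider` field. Both field names are hypothetical, so check a real line in your file and adjust them.

```python
import json
from collections import defaultdict
from pathlib import Path


def summarize_spend(lines, cost_field="cost_usd", group_field="provider"):
    """Sum per-call cost by group from an iterable of JSONL lines.

    cost_field and group_field are assumed names, not a confirmed router schema.
    """
    totals = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        group = record.get(group_field, "unknown")
        totals[group] += float(record.get(cost_field, 0.0))
    return dict(totals)


def summarize_file(path="ai_router/router-metrics.jsonl"):
    """Read the metrics file named in this README and summarize it."""
    with Path(path).open(encoding="utf-8") as handle:
        return summarize_spend(handle)
```

Calling `summarize_file()` from the workspace root returns a dictionary of totals keyed by whatever the grouping field contains.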
Requirements
- VS Code 1.85+
- Python 3.10+ with a workspace .venv/ (the
Dabbler: Install ai-router command auto-detects or creates
it for you)
- API keys as environment variables:
  - DABBLER_ANTHROPIC_API_KEY (Claude Sonnet, Opus)
  - DABBLER_GEMINI_API_KEY (Gemini Flash, Pro)
  - DABBLER_OPENAI_API_KEY (GPT-5.4, GPT-5.4 Mini)
  - All three are required on the Full tier so cross-provider
    verification has somewhere to route to; the Lightweight tier
    needs no credentials.
  - These variables hold the normal provider-issued keys from Anthropic,
    Google, and OpenAI; Dabbler only prefixes the environment variable names.
- One orchestrator AI agent installed as a VS Code extension
(Claude Code, Codex/GitHub Copilot, or Gemini Code Assist — the
framework is agent-agnostic and supports switching mid-set).
Optional: PUSHOVER_API_KEY + PUSHOVER_USER_KEY for
end-of-session phone notifications.
Sign-up links and a full prerequisites checklist live in the
GitHub repo's README.
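Before starting a Full-tier session it can help to confirm that the three provider keys listed above are visible to the process. This is a small standalone sketch, not part of the extension; the function name `missing_provider_keys` is made up for illustration, while the variable names come from the list above.

```python
import os

# Variable names as listed in the requirements above.
REQUIRED_KEYS = {
    "DABBLER_ANTHROPIC_API_KEY": "Anthropic",
    "DABBLER_GEMINI_API_KEY": "Google",
    "DABBLER_OPENAI_API_KEY": "OpenAI",
}


def missing_provider_keys(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    for name in missing_provider_keys():
        print(f"{name} is not set ({REQUIRED_KEYS[name]} key)")
```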
Other features
Row interactions. Left-click a session-set row to open its
spec.md in an editor tab; on non-terminal rows (in-progress or
not-started) the click also copies a "Start the next session of" prompt for that set to your clipboard, with a one-line confirmation toast,
so you can paste straight into the AI chat and resume work in two
keystrokes. Right-click opens a native VS Code QuickPick with
two-step submenus: Open File ▸ (Spec / Activity Log / Change
Log / Session State), Copy Eval ▸ (copyable prompts —
Evaluate Specification / Most Recent Session / Session Set /
Start Next Session / Start New Parallel Session / Verification
Kickoff), and flat actions for Copy Slug, Open Orchestrator
Writer Log, Open Prerequisite Spec (on blocked rows), Switch
Tier… (not-started rows), Set Up Dedicated Verification… and
Open External Verification Note (eligible Lightweight rows),
Migrate to v4 schema, Cancel set, and Restore set. The
right-click menu honors light/dark theme natively and dismisses
on Escape or click-outside.
Copyable review prompts. Four Dabbler: Copy … commands
(also under Copy Eval ▸ in the right-click menu) author review
prompts that reference your session-set artifacts by path rather
than embedding their contents, then write to the clipboard. Paste
into any path-aware AI chat (Claude Code, Codex, Cline, Cursor,
etc.) and the agent reads the files itself. Optional per-repo
files at docs/review-criteria/{spec,session,set}.md override
the default review instructions if present.
Lightweight tier (no API spend). Run
python -m ai_router.start_session … --no-router, or set
tier: lightweight in spec.md, or DABBLER_NO_ROUTER=1 in
your environment. The router stops making LLM calls (no
credentials needed), close_session accepts a manual
attestation, and the soft gate prompts when an
external-verification.md artifact is missing
(Dabbler: Open External Verification Document creates or opens
it). Same Session Set Explorer, same session-state.json
lifecycle, same close-out gates — just no API spend on
verification.
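The three switches above can be pictured as a single decision. The sketch below is an illustration only: the text says any one of the switches selects Lightweight, but how the real router orders or combines them is not documented here, so treating them as a plain "any of" is an assumption, and `router_disabled` is a hypothetical name.

```python
import os


def router_disabled(cli_no_router=False, spec_tier=None, env=None):
    """Return True if any of the three documented switches selects Lightweight.

    cli_no_router: whether --no-router was passed on the command line
    spec_tier:     the value of tier: in spec.md, if any
    env:           environment mapping (defaults to os.environ)
    """
    env = os.environ if env is None else env
    if cli_no_router:
        return True
    if (spec_tier or "").strip().lower() == "lightweight":
        return True
    return env.get("DABBLER_NO_ROUTER", "") == "1"
```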
Lightweight verification at a glance. Lightweight rows carry a
quiet lw marker, and sets using dedicated verification sessions
show an honest N/M+ fraction (the + says the session count can
still grow). Two verification-posture markers appear at the
actionable moment: v? on a completed out-of-band set the
Explorer cannot vouch for, and v+ when the work is done and a
dedicated verification session is owed. Clicking a marker opens
the row menu — Verification Kickoff copies a paste-ready
handoff prompt that has a different AI engine run the typed
verification/remediation flow, and Set Up Dedicated
Verification… switches a set's verificationMode safely (a
spec-seed rewrite on not-started sets; a recorded, gated
transition through the ai_router blessed writer on completed
sets). Verified sets stay quiet — the verdict lives in the
fraction tooltip, and no positive badge is shown.
Schema-v4 migrator + prerequisites. Set 047 introduced the v4
session-state.json shape where every per-session lifecycle field
(orchestrator, startedAt, completedAt, verdict) lives in a
per-session sessions[] ledger. The Migrate to v4 schema
right-click action (also python -m ai_router.migrate_v3_to_v4)
upgrades v1/v2/v3 state files with a .bak.json rollback
contract. Lightweight consumers with hand-edited shapes can run
python -m ai_router.migrate_lightweight_to_canonical_v4.
Specs can declare a prerequisites: field listing other
session-set slugs; see Prerequisites and the blocked marker below.
Prerequisites and the blocked marker — declare dependencies in
a set's spec.md to block it until other sets are complete:
prerequisites:
  - slug: 047-state-file-schema-v4-audit
    condition: complete
The Explorer shows a quiet chain marker (⛓︎) on blocked sets.
Hover the marker for a tooltip listing each unsatisfied
prerequisite and its current state ("in progress", "not started",
or "unknown set — check the slug" for a slug that doesn't match
any set; typos keep the row blocked rather than silently
unblocking it). The marker is hidden on complete/cancelled sets.
A right-click action "Open Prerequisite Spec" jumps straight to
the blocking dependency's spec — when more than one prerequisite
is unsatisfied, a QuickPick lists them with their states.
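The blocking rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the Explorer's implementation: the function name, the `set_states` mapping, and the state strings are hypothetical, and an unknown slug is reported as unsatisfied so that a typo keeps the row blocked.

```python
def unsatisfied_prerequisites(prerequisites, set_states):
    """Return (slug, current_state) pairs for every prerequisite not yet met.

    prerequisites: list of {"slug": ..., "condition": ...} entries from a spec
    set_states:    mapping of known set slugs to their current state (assumed shape)
    """
    blocked = []
    for prereq in prerequisites:
        slug = prereq["slug"]
        required = prereq.get("condition", "complete")
        if slug not in set_states:
            # A slug that matches no set stays blocking rather than silently passing.
            blocked.append((slug, "unknown set"))
        elif set_states[slug] != required:
            blocked.append((slug, set_states[slug]))
    return blocked


def is_blocked(prerequisites, set_states):
    """A set is blocked while at least one prerequisite is unsatisfied."""
    return bool(unsatisfied_prerequisites(prerequisites, set_states))
```

The returned pairs are the kind of information the tooltip lists: each unsatisfied prerequisite with its current state.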
Visual config editor (Dabbler: Open Dabbler Config Editor) —
edit router-config.yaml, budget.yaml, and the gitignored
local-overrides.yaml through a six-section panel without touching
YAML directly. Sections cover routing mode, budget threshold,
provider API-key env vars, significance flagging, Pushover
notifications, and a local-overrides summary. Includes a
live-validation drift banner and a "Send a test notification" button.
Significance flagging — Dabbler: Flag Decision for Cross-Provider Review appends a one-line reason to the active set's review queue.
Dabbler: Scan Workspace for @dabbler:outsource-review Annotations
walks source files for # @dabbler:outsource-review("...") and
// @dabbler:outsource-review("...") annotations and queues new
findings automatically.
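To show what the scan looks for, here is a minimal sketch that matches the two annotation forms quoted above. The regular expression and the `find_annotations` helper are assumptions written for illustration; the extension's own scanner may handle more cases (escaped quotes, block comments) than this does.

```python
import re

# Matches both comment styles: "# @dabbler:outsource-review("...")"
# and "// @dabbler:outsource-review("...")".
ANNOTATION = re.compile(r'(?:#|//)\s*@dabbler:outsource-review\("([^"]*)"\)')


def find_annotations(source_text):
    """Return (line_number, reason) pairs for each annotation in the text."""
    findings = []
    for number, line in enumerate(source_text.splitlines(), start=1):
        for match in ANNOTATION.finditer(line):
            findings.append((number, match.group(1)))
    return findings
```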
Cancel/Restore lifecycle — cancel a session set mid-stream
with a recorded reason; restore later if priorities shift. The
audit trail accumulates across cycles.
UAT checklist integration (tri-state). Specs declare
requiresUAT and requiresE2E as true | false | "suggested".
When the value is "suggested" and the session has UX scope, the
orchestrator asks at session start which review path you want
(E2E tests, UAT checklist, both, or neither) and records your
choice once; close-out gates derive from that recorded answer.
UAT checklists pair with the freely-available
UAT checklist editor.
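The tri-state rule can be sketched as a small resolver. This is a guess at the shape of the logic, not the orchestrator's code: the choice labels, the function name, and the behavior when a value is "suggested" but there is no UX scope or no recorded answer (treated here as no gate) are all assumptions.

```python
# Hypothetical labels for the four review paths the orchestrator offers.
CHOICES = {
    "e2e": {"uat": False, "e2e": True},
    "uat": {"uat": True, "e2e": False},
    "both": {"uat": True, "e2e": True},
    "neither": {"uat": False, "e2e": False},
}


def resolve_gates(requires_uat, requires_e2e, has_ux_scope, recorded_choice=None):
    """Return {"uat": bool, "e2e": bool} describing the close-out gates."""

    def resolve(value, key):
        if value is True or value is False:
            return value  # explicit true/false in the spec wins
        if value == "suggested" and has_ux_scope and recorded_choice is not None:
            return CHOICES[recorded_choice][key]
        return False  # assumption: no UX scope or no answer means no gate

    return {
        "uat": resolve(requires_uat, "uat"),
        "e2e": resolve(requires_e2e, "e2e"),
    }
```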
Worktree auto-discovery — parallel session sets running in
sibling git worktrees show up in the activity-bar tree even when
the worktree isn't open as a separate workspace folder.
License
MIT. Copyright © 2026 darndestdabbler.