TinyFalcon

Your data, your compute. A local-first agentic coding assistant for VS Code.

TinyFalcon plans tasks, edits files, runs terminal commands, reads the results, and iterates, all driven by a local LLM served by Ollama. It is designed from the ground up for small local models (7B–32B), with safety gates enforced by a finite-state machine (FSM) that keep the agent honest on hardware that doesn't run frontier models.

For developers in regulated industries, air-gapped environments, or anyone who wants AI assistance without streaming their codebase to a third party.

⚠️ Requires Ollama (free)

Before TinyFalcon can do anything, you need:

Ollama installed and running (ollama serve).
At least one model pulled. Start with qwen2.5-coder:7b:
```
ollama pull qwen2.5-coder:7b
```

If Ollama isn't running when you send your first prompt, TinyFalcon shows an error pane with the install link and copy buttons for the commands above, but it is faster to install Ollama and pull a model first.
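As a rough illustration of the prerequisite check described above, the sketch below asks Ollama which models are pulled and reports one of three outcomes. It relies on Ollama's `GET /api/tags` endpoint; the helper names `hasModel` and `checkOllama`, the injected `fetchJson` function, and the three outcome strings are assumptions made for this example, not TinyFalcon's actual code.

```typescript
// Sketch only: helper names and outcome strings are hypothetical.

// The part of Ollama's GET /api/tags response this sketch reads.
interface TagsResponse {
  models: { name: string }[];
}

// Pure helper: is the wanted tag among the locally pulled models?
function hasModel(tags: TagsResponse, wanted: string): boolean {
  return tags.models.some((m) => m.name === wanted);
}

type OllamaStatus = "ok" | "model-missing" | "unreachable";

// `fetchJson` is injected so the check can be exercised without a network.
async function checkOllama(
  host: string,
  model: string,
  fetchJson: (url: string) => Promise<unknown>,
): Promise<OllamaStatus> {
  try {
    const tags = (await fetchJson(`${host}/api/tags`)) as TagsResponse;
    return hasModel(tags, model) ? "ok" : "model-missing";
  } catch {
    // Connection refused, timeout, invalid JSON, and so on.
    return "unreachable";
  }
}
```

On a runtime with a global `fetch`, `fetchJson` could be `(url) => fetch(url).then((r) => r.json())`, called with the host `http://localhost:11434` and the model `qwen2.5-coder:7b`.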

Install

Install Ollama and run ollama serve.
Pull a model: ollama pull qwen2.5-coder:7b.
Install TinyFalcon from the VS Code Marketplace (this listing).
Click the TinyFalcon icon in the activity bar.
Send your first prompt — TinyFalcon will plan, write code, run it, and iterate until it's done.

Screenshots

TinyFalcon writing and testing code via the agent loop

A typical TinyFalcon run: the agent wrote a binary-tree inversion program (write_file), ran the test suite (run_terminal), and confirmed everything passed — all from a single prompt.

What makes TinyFalcon different

Other agentic VS Code tools (Copilot, Cursor, Cline's cloud mode) need to stream your code to a third party. Local-first alternatives (Cline + Ollama, Continue, Aider) were built around cloud frontier models and retrofitted for local use. Pointed at a 7B–32B model, they tend to emit malformed tool calls, loop indefinitely, or give up early.

TinyFalcon goes the other direction. The agent loop, tool schemas, and recovery logic are all designed for small local models on developer-grade hardware:

Deterministic safety gates. The FSM blocks success-claiming replies after failures and blocks identical tool retries. This addresses the two most common small-model failure modes without depending on the model to "behave".
Status-led tool results. Tool results lead with a clear SUCCEEDED / FAILED verdict so the model can't mistake "ran the tool" for "the command worked".
Workspace ledger. A live snapshot of files written and recent commands is injected each turn so the model has ground truth, not hallucinated paths.
Cross-turn memory. Multi-turn conversations preserve history and the workspace state across user prompts.
Per-model prompt packs. Built-in packs tuned for Qwen, DeepSeek, and Gemma quirks; auto-selected from the model tag.
Friendly first-run errors. No cryptic "OllamaUnreachable" — actionable buttons (Install Ollama, copy ollama serve, copy ollama pull <model>, retry).
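To make the "status-led tool results" idea from the list above concrete, here is a minimal sketch of a formatter that puts the verdict on the first line of a terminal result. The `CommandOutcome` shape, the `formatToolResult` name, and the exact line layout are assumptions for illustration; only the SUCCEEDED / FAILED verdict and the `run_terminal` tool name come from this listing.

```typescript
// Hypothetical shape of a finished terminal command; field names are assumptions.
interface CommandOutcome {
  command: string;
  exitCode: number;
  stdout: string;
  stderr: string;
}

// Lead with the verdict so it is the first thing the model reads.
function formatToolResult(o: CommandOutcome): string {
  const verdict = o.exitCode === 0 ? "SUCCEEDED" : "FAILED";
  const lines: string[] = [
    `${verdict}: run_terminal (exit code ${o.exitCode})`,
    `command: ${o.command}`,
  ];
  // Only include output sections that actually contain text.
  if (o.stdout.trim()) lines.push(`stdout:\n${o.stdout.trim()}`);
  if (o.stderr.trim()) lines.push(`stderr:\n${o.stderr.trim()}`);
  return lines.join("\n");
}
```

A small model that reads only the first few tokens of the result still sees whether the command worked, which is the point of leading with the verdict.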

How it works

TinyFalcon runs a recursive agentic loop:

Ingest — Workspace context + your prompt.
Plan — The local model proposes a tool call (read file, write file, run command).
Act — The extension executes via the VS Code API.
Observe — The agent reads the result: file contents, exit codes, stderr.
Refine — Loop, or finish with a reply summarizing what it did.
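The five steps above can be sketched as a bounded loop. This is a simplified, synchronous illustration under stated assumptions: the `Step`, `Model`, and `Executor` types and the `runAgentLoop` function are hypothetical, the model and tool executor are passed in as plain functions, and the default limit of 25 mirrors the `tinyfalcon.agent.maxIterations` default.

```typescript
// Hypothetical types; TinyFalcon's real interfaces are not shown in this listing.
interface ToolCall {
  kind: "tool";
  tool: "read_file" | "write_file" | "run_terminal";
  args: string;
}
interface Reply {
  kind: "reply";
  text: string;
}
type Step = ToolCall | Reply;

type Model = (transcript: string[]) => Step; // Plan
type Executor = (call: ToolCall) => string;  // Act, returning the observation

function runAgentLoop(
  workspaceContext: string,
  prompt: string,
  model: Model,
  execute: Executor,
  maxIterations: number = 25,
): string {
  // Ingest: workspace context plus the user's prompt.
  const transcript: string[] = [workspaceContext, prompt];
  for (let i = 0; i < maxIterations; i++) {
    const step = model(transcript);      // Plan
    if (step.kind === "reply") {
      return step.text;                  // Refine: finish with a summary reply
    }
    const observation = execute(step);   // Act
    transcript.push(observation);        // Observe, then loop again
  }
  return "Stopped: iteration limit reached.";
}
```

The iteration cap is what keeps a model that never produces a final reply from looping forever.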

Unlike ReAct-style loops written for cloud models, every step is gated by a small FSM that enforces invariants the model can't be trusted to enforce itself. The model picks the actions; the FSM checks them against those invariants before they take effect.
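A minimal sketch of what such a gate might look like, covering the two invariants named in this listing: no success-claiming reply right after a failure, and no identical retry of a tool call. Everything here is an assumption for illustration, including the `GateState` fields, the `claimsSuccess` flag (how a reply is classified as claiming success is not described in this listing), and the choice to block identical retries only when the previous call failed.

```typescript
// Hypothetical sketch; not TinyFalcon's actual FSM.
type Action =
  | { kind: "tool"; tool: string; args: string }
  | { kind: "reply"; claimsSuccess: boolean };

interface GateState {
  lastToolFailed: boolean;
  lastToolKey: string | null; // "tool:args" of the most recent tool call
}

type Verdict = "allow" | "block-success-after-failure" | "block-identical-retry";

function toolKey(tool: string, args: string): string {
  return `${tool}:${args}`;
}

// Decide whether the model's proposed action may proceed.
function checkGate(state: GateState, action: Action): Verdict {
  if (action.kind === "reply") {
    return state.lastToolFailed && action.claimsSuccess
      ? "block-success-after-failure"
      : "allow";
  }
  const sameCall = toolKey(action.tool, action.args) === state.lastToolKey;
  return state.lastToolFailed && sameCall ? "block-identical-retry" : "allow";
}

// Advance the state after a tool call has actually been executed.
function recordToolResult(tool: string, args: string, failed: boolean): GateState {
  return { lastToolFailed: failed, lastToolKey: toolKey(tool, args) };
}
```

Because the check is a pure function of the recorded state and the proposed action, its outcome does not depend on the model's wording or sampling.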

Supported Models

Any model Ollama can serve will work. Recommended starting points:

| Model tag | Pack | Notes |
| --- | --- | --- |
| `qwen2.5-coder:7b` | qwen | Best balance of size/quality for most hardware. Default starting point. |
| `qwen2.5-coder:14b` / `:32b` | qwen | Higher quality on stronger GPUs. |
| `deepseek-coder:6.7b` | deepseek | Strong code-focused alternative. |
| `gemma2:9b` | gemma | Good general-purpose backup. |

Switch models in Settings → TinyFalcon → Ollama: Model. The matching prompt pack is selected automatically.
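One plausible way to pick a pack from the model tag is a prefix match on the model family, sketched below. The `selectPack` function and the prefix-matching rule are assumptions; the listing only states that the pack is auto-selected from the tag and that an explicit `tinyfalcon.agent.promptPack` value can be set.

```typescript
// Pack names come from the tinyfalcon.agent.promptPack setting.
type Pack = "default" | "qwen" | "deepseek" | "gemma";

// Hypothetical selection rule: match on the model family prefix of the tag.
function selectPack(modelTag: string, setting: Pack | "auto" = "auto"): Pack {
  if (setting !== "auto") {
    return setting; // an explicit choice overrides auto-selection
  }
  const name = modelTag.toLowerCase();
  if (name.indexOf("qwen") === 0) return "qwen";
  if (name.indexOf("deepseek") === 0) return "deepseek";
  if (name.indexOf("gemma") === 0) return "gemma";
  return "default"; // unknown families fall back to the default pack
}
```

With this rule, `qwen2.5-coder:7b` maps to the qwen pack and a tag from an unlisted family falls back to `default`.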

Not yet supported (post-v1): reasoning models such as `deepseek-r1`. They prefix their output with a `<think>` block, which breaks the JSON parser. Tracked separately.
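The failure is easy to reproduce in isolation. The sketch below assumes the model's reply is handed to `JSON.parse` as-is, which is a guess about the parser rather than a description of TinyFalcon's code; the `parseToolCall` helper and the `path` field in the sample payload are hypothetical.

```typescript
type ParseResult = { ok: true; value: unknown } | { ok: false; error: string };

// Hypothetical parser: treats the whole reply as one JSON document.
function parseToolCall(raw: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch (err) {
    return { ok: false, error: String(err) };
  }
}

// A plain tool call parses cleanly.
const plainReply = '{"tool":"read_file","path":"src/index.ts"}';

// The same call behind a reasoning prefix is no longer valid JSON.
const reasoningReply = "<think>I should read the file first.</think>\n" + plainReply;

const plainResult = parseToolCall(plainReply);
const reasoningResult = parseToolCall(reasoningReply);
```

Here `plainResult.ok` is `true` and `reasoningResult.ok` is `false`, because the leading `<` is not a valid start of a JSON value.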

Settings

| Setting | Default | What it does |
| --- | --- | --- |
| `tinyfalcon.ollama.host` | `http://localhost:11434` | Where to reach Ollama. |
| `tinyfalcon.ollama.model` | `qwen2.5-coder:7b` | Which Ollama model to use. |
| `tinyfalcon.agent.promptPack` | `auto` | Which prompt pack: `auto`, `default`, `qwen`, `deepseek`, `gemma`. |
| `tinyfalcon.agent.maxIterations` | `25` | Max plan-act-observe cycles per prompt before giving up. |
| `tinyfalcon.tools.enabled` | `true` | Enable agentic tool use. Disable to fall back to plain chat. |
| `tinyfalcon.tools.autoApprove.readFile` | `true` | Auto-approve `read_file` (non-destructive). |
| `tinyfalcon.tools.autoApprove.writeFile` | `false` | Auto-approve `write_file`. Off by default for safety. |
| `tinyfalcon.tools.autoApprove.runTerminal` | `false` | Auto-approve `run_terminal`. Off by default for safety. |
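For reference, the defaults in the table can be written as a typed object, with user overrides layered on top. The keys and default values are taken from the table; the `TinyFalconSettings` interface and `resolveSettings` helper are hypothetical names used only for this sketch.

```typescript
// Keys and defaults mirror the settings table; the type name is an assumption.
interface TinyFalconSettings {
  "tinyfalcon.ollama.host": string;
  "tinyfalcon.ollama.model": string;
  "tinyfalcon.agent.promptPack": "auto" | "default" | "qwen" | "deepseek" | "gemma";
  "tinyfalcon.agent.maxIterations": number;
  "tinyfalcon.tools.enabled": boolean;
  "tinyfalcon.tools.autoApprove.readFile": boolean;
  "tinyfalcon.tools.autoApprove.writeFile": boolean;
  "tinyfalcon.tools.autoApprove.runTerminal": boolean;
}

const DEFAULT_SETTINGS: TinyFalconSettings = {
  "tinyfalcon.ollama.host": "http://localhost:11434",
  "tinyfalcon.ollama.model": "qwen2.5-coder:7b",
  "tinyfalcon.agent.promptPack": "auto",
  "tinyfalcon.agent.maxIterations": 25,
  "tinyfalcon.tools.enabled": true,
  "tinyfalcon.tools.autoApprove.readFile": true,
  "tinyfalcon.tools.autoApprove.writeFile": false,
  "tinyfalcon.tools.autoApprove.runTerminal": false,
};

// Hypothetical helper: user overrides win, everything else keeps its default.
function resolveSettings(overrides: Partial<TinyFalconSettings>): TinyFalconSettings {
  return { ...DEFAULT_SETTINGS, ...overrides };
}
```

In practice these keys would be set in VS Code's `settings.json` or through the Settings UI; for example, overriding only `tinyfalcon.ollama.model` with `qwen2.5-coder:14b` leaves both write and terminal auto-approval off.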

Privacy

TinyFalcon's default mode is fully local — your code, prompts, and the model's outputs stay on your machine. No telemetry. No accounts. No usage metering.

This isn't an absolute claim — cloud-fallback support may be added later as an explicit opt-in. The current release (v0.1.0) has no such mode and makes no network calls beyond localhost:11434.

If something doesn't work, please use the Q&A tab on the marketplace listing.

License

Proprietary. Use granted under the terms in the LICENSE file (linked from the marketplace Resources sidebar); redistribution, reverse engineering, and derivative works are not permitted.

Your data, your compute. TinyFalcon is small by design, not by ambition.
