Adsum IoT Coder – for nRF

An open-source IoT coding agent that cracks the complex firmware bugs general agents struggle with, using less context, fewer tokens, and less time. Backed by live device-log capture, curated SDK knowledge, and an open benchmark on real nRF hardware.

Shipping today: nRF52 / nRF53 / nRF54 · BLE · nRF Connect SDK (Zephyr).
On the roadmap: Wi-Fi · LTE-M · Matter · ESP-IDF · additional RTOS support.
Open source under Apache 2.0.

Install → · Benchmark → · Architecture → · Roadmap →
Why Adsum IoT Coder exists

Adsum IoT Coder specialises in IoT communication firmware: wireless protocol stacks (Bluetooth Low Energy / BLE today; Wi-Fi, Thread, Matter, LTE-M on the roadmap) and the power-budget concerns that come with them. This is not a general embedded-systems agent: it isn't trying to help you write a motor controller or a DSP pipeline. It is built for the specific class of bugs that show up in connected devices.

General coding agents fail on that class of bug for structural reasons, not for lack of model capability. The problems live outside the source code:
None of these are visible in the source code; all of them are common in real BLE/IoT projects. Diagnosing them requires capabilities general coding agents don't have.

What an IoT-communication debugging agent needs

Four capability pillars, split between what ships today and what's on the roadmap:
Architecture: Dynamic Knowledge & Tool-Skill Loading

From proof of concept to platform

Adsum IoT Coder is an AI coding agent built on the open-source Cline foundation, with IoT-specific knowledge modules and tool-use skills layered on top. Two months ago we shipped the first version, nRF AI Debugger, as a proof of concept to test whether purpose-built AI tooling could meaningfully outperform general coding agents on embedded IoT firmware.

The proof of concept got real traction, but v1's architecture loaded its full domain expertise into every session: a static bundle that worked but couldn't grow. Adding nRF9x or nRF7x support meant expanding the bundle. Adding ESP, Thread, or Matter meant the same. The architecture sat on a cliff edge.

Modules loaded on demand

This release inverts that. Domain knowledge and tool-use skills are structured as a framework of discrete, composable modules, each scoped to a specific chip family, protocol stack, or debug capability. At session start, the agent assesses what the project is and what the task requires, then fetches the relevant modules on demand.
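The scoping idea described above can be pictured with a small selection routine. This is only a minimal sketch under stated assumptions: the `Module` record, the `select_modules` helper, and every module name in `REGISTRY` are hypothetical illustrations, not the extension's actual data model or loader.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """One knowledge or tool-skill module (hypothetical shape)."""
    name: str
    chips: frozenset = frozenset()      # chip families covered; empty means any
    protocols: frozenset = frozenset()  # protocol stacks covered; empty means any
    tasks: frozenset = frozenset()      # debug capabilities covered; empty means any


def _matches(scope: frozenset, value: str) -> bool:
    # An empty scope means the module does not restrict on this axis.
    return not scope or value in scope


def select_modules(registry, chip: str, protocol: str, task: str):
    """Return the names of modules whose scope covers this session."""
    return [
        m.name
        for m in registry
        if _matches(m.chips, chip)
        and _matches(m.protocols, protocol)
        and _matches(m.tasks, task)
    ]


# Hypothetical registry; the real module names are not given in this README text.
REGISTRY = [
    Module("nrf52-core", chips=frozenset({"nrf52"})),
    Module("nrf54-core", chips=frozenset({"nrf54"})),
    Module("ble-stack", protocols=frozenset({"ble"})),
    Module("uart-log-capture", tasks=frozenset({"log-analysis"})),
    Module("power-profiling", tasks=frozenset({"power"})),
]

selected = select_modules(REGISTRY, chip="nrf52", protocol="ble", task="log-analysis")
print(selected)  # ['nrf52-core', 'ble-stack', 'uart-log-capture']
```

In this sketch, a session on an nRF52 BLE project doing log analysis never pulls in the nRF54 or power-profiling material, which is the property the paragraph above attributes to on-demand loading.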
The module tree on disk:
Analyzing a UART log drop loads

The bigger payoff isn't just avoiding context overflow; it's context quality. Even when a full static bundle would technically fit, loading only the relevant modules keeps domain knowledge in the model's effective working set rather than letting it get buried under unrelated material as the session grows. This is the "lost in the middle" failure mode the benchmark caught Claude Code hitting on L1-T2: same model, full 200k window, lost the original symptom by debug cycle four.

Human-curated, not AI-generated

A common trend in AI tooling is letting agents author and refine their own tool-use skills. Our own research and experimentation led us in the opposite direction for high-stakes IoT debugging. Every module in

We don't isolate "human-curated vs. AI-generated" as a single benchmark variable: our system pairs curation with dynamic loading, and we report on the combined architecture. What that architecture demonstrates: same model on both sides, 5/6 vs 3/6 tasks resolved at 3.8× lower token cost than the general-agent baseline (full results in the benchmark). As the knowledge base grows, the benchmark is how we prove we're moving in the right direction.

Benchmark: IoT-FirmwareDebugBench v0.1
A clean architecture is only useful if it produces measurably better outcomes. Standard SWE benchmarks don't exercise hardware-in-the-loop work, and there is no established public benchmark for AI agents on embedded IoT firmware. We adapted methodology from recent research on expert-skill-augmented LLM evaluation for embedded code generation (arXiv:2603.19583) and built one, published open source as a deliverable equal in importance to the tool itself.

IoT-FirmwareDebugBench v0.1 runs on real nRF52840 DK and nRF52832 DK boards with NCS v3.2.1 (Zephyr 4.2.99). It has six BLE-focused tasks across three difficulty levels, each with a precisely injected bug, defined reproduction procedure, and known correct fix.

The most important methodological choice: both agents run the same model, Claude Haiku 4.5, with reasoning mode disabled and prompt caching enabled identically. This isolates a single variable: domain architecture. If Adsum IoT Coder outperforms, it is not because it has access to a more capable model; it is because the architecture wraps the same model differently.

Difficulty levels.
L1: root cause readable directly from logs.
L2: requires inference from BLE behavior or Kconfig dependencies.
L3: requires correlating state across two devices or full session timelines.
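As a reading aid, the task structure described above (a level, an injected bug, a reproduction procedure, a known fix) can be modelled as a small record, with a helper that tallies resolved tasks per level. This is a sketch under assumptions: `BenchTask`, `resolution_by_level`, and the demo entries are hypothetical and do not reproduce the benchmark's actual schema, tasks, or results.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    # Descriptions taken from the difficulty definitions above.
    L1 = "root cause readable directly from logs"
    L2 = "requires inference from BLE behavior or Kconfig dependencies"
    L3 = "requires correlating state across two devices or full session timelines"


@dataclass(frozen=True)
class BenchTask:
    """Hypothetical task record; field names are illustrative only."""
    task_id: str
    level: Level
    injected_bug: str
    repro_steps: tuple
    known_fix: str


def resolution_by_level(tasks, resolved_ids):
    """Return {level name: (resolved, total)} for one agent's run."""
    summary = {}
    for t in tasks:
        done, total = summary.get(t.level.name, (0, 0))
        summary[t.level.name] = (done + (t.task_id in resolved_ids), total + 1)
    return summary


# Placeholder data for illustration; NOT the published benchmark tasks or scores.
demo_tasks = [
    BenchTask("demo-a", Level.L1, "<bug>", ("<step>",), "<fix>"),
    BenchTask("demo-b", Level.L3, "<bug>", ("<step>",), "<fix>"),
    BenchTask("demo-c", Level.L3, "<bug>", ("<step>",), "<fix>"),
]
demo_summary = resolution_by_level(demo_tasks, resolved_ids={"demo-a", "demo-b"})
print(demo_summary)  # {'L1': (1, 1), 'L3': (1, 2)}
```

Because both agents run the same model, comparing two such summaries side by side would attribute any difference to the architecture around the model, which is the comparison the methodology is built to isolate.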
Static Code Fix as a failure mode. Claude Code skipped log capture on two tasks and diagnosed from source code alone. The benchmark report classifies this as Static Code Fix (SCF): a methodology failure regardless of whether the resulting patch happens to compile. On L3-T1, the resulting fix was indeterminate, because the root cause (bond asymmetry) is only visible through cross-device log correlation. The dynamic skill architecture eliminates this failure mode by design: log capture is a first-class step in the loaded workflow, not an optional step the agent might skip under exploration pressure.

Two other patterns are worth noting. First, context degradation predicted failure: Claude Code burned 27M tokens on L1-T2 and lost the original symptom by the later debug cycles, while Adsum IoT Coder resolved it at 148.7k peak. Second, the gap widens with task difficulty: parity at L2, Adsum 1/2 vs 0/2 at L3. Full per-task breakdown in the benchmark report.
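The SCF criterion described above can be expressed as a check over the ordered steps of a run. This is a minimal sketch, assuming a hypothetical step vocabulary (`"capture"`, `"fix"`, and so on) and a hypothetical `classify_run` helper; the benchmark report's actual classification procedure may differ.

```python
def classify_run(steps):
    """Label a run from its ordered step names (hypothetical vocabulary).

    Returns "SCF" when the first fix is applied before any log capture,
    "log-driven" when a capture precedes the first fix, and "no-fix"
    when the run never applies a fix.
    """
    captured = False
    for step in steps:
        if step == "capture":
            captured = True
        elif step == "fix":
            return "log-driven" if captured else "SCF"
    return "no-fix"


print(classify_run(["build", "flash", "capture", "analyze", "fix"]))  # log-driven
print(classify_run(["read_source", "fix", "build"]))                  # SCF
```

The label depends only on whether evidence from the device was gathered before the patch, not on whether the patch compiles, matching the definition given above.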
The architecture and the benchmark are two halves of the same commitment: domain-specific AI tooling clean enough to extend, and measurable enough to defend. Run it yourself; that's the conversation we want to be in.

Getting Started

Open the VS Code Extensions panel and search for Adsum IoT Coder, then click Install. Or install from the VS Code Marketplace directly. See CHANGELOG.md for release notes.

Configure an AI provider, and open your NCS project. The agent starts with two entry-point workflows:
Analyze nRF Device Logs: captures live RTT/UART logs from connected boards, runs code-aware analysis, and produces structured reports. Auto-detects boards via J-Link, supports simultaneous multi-device capture, and correlates output with your source code and configuration.

Generate Logging Code: reads your NCS project, understands the BLE stack, and injects

From analysis results, the agent can enter a Debug Loop, an iterative Build → Flash → Capture → Analyze → Fix cycle that continues until the bug is resolved or you stop it.

Requirements
Tested Models

Try Claude Haiku 4.5 first: it's the model we have IoT-specific benchmark evidence for. DeepSeek-V4-Pro is the cost play for long sessions where margin matters more than empirical confidence.
Configuring a provider. Open VS Code Settings → search for "Adsum IoT Coder" → set the API endpoint URL and key. Any OpenAI-compatible endpoint is accepted (OpenRouter, DeepSeek API, Anthropic via a compatible gateway, or a local Ollama / LM Studio server). Recommended setup for Claude Haiku 4.5 — matches the benchmark configuration:
Roadmap

The product line is Adsum IoT Coder, with each release scoped to a specific IoT chip family. "IoT" reflects the focus: communication stacks (BLE, Wi-Fi, Thread, Matter, LTE-M) and the power-budget concerns that come with them, rather than generic embedded coding. "Coder" reflects the trajectory: this release ships debugging because that's where general agents fail hardest and the value is most measurable, but the architecture is designed to cover the full IoT communication development lifecycle (design, implementation, verification, and field optimization) as new modules and skills land.
The roadmap is shaped by what the community asks for and contributes. Open an issue, propose a benchmark task, or contribute a knowledge module.

Limitations

We publish what's true today, not what we wish were true.

Product
Benchmark
The methodology is open precisely so others can probe these limits, run independent comparisons, and contribute tasks.

Citing this work

If you reference the benchmark or this work in research, please cite:
About

Adsum Networks: 8+ years building IoT solutions on Nordic and other embedded platforms. Our v1 proof of concept, nRF AI Debugger, reached 200+ installs in its first two months, enough signal to rebuild the architecture for what's next.

We built Adsum IoT Coder because general coding agents leave IoT firmware developers without reliable AI assistance for the hardest debugging scenarios: protocol failures, power-budget violations, and runtime-only bugs that don't show up in source review. Our belief: domain-specific AI tooling needs to be (a) built by engineers who have lived inside the failure modes, and (b) measured against open benchmarks so the value can be defended, not just claimed. Both halves of that conviction are in this release.

Contributing

We welcome new benchmark tasks, knowledge modules, and HITL tool integrations.
Open an issue to discuss before larger changes, or open a PR directly for small fixes.

Privacy & Security

The extension's runtime runs entirely on your machine. Outbound network requests go only to the AI provider you configured, carrying only the data listed below.

Sent to the model:
Never sent:
BYOK (Bring Your Own Key) — you control which model and endpoint you trust. Source is fully open and auditable.
Telemetry. Anonymous extension activations, tool triggers, and execution errors. Never source code, file paths, chat content, or device logs. Opt out: set

Troubleshooting

Shell integration warning on first run: restart VS Code and open a new terminal session.

Linux notifications: if

J-Link not detected / board not auto-detected: confirm the SEGGER J-Link drivers are installed and the board enumerates in

Flash command fails: make sure no other tool (nRF Connect for Desktop, OpenOCD) holds the J-Link interface. Only one process can flash at a time.

AI provider authentication errors: verify your API key in the extension settings and that the endpoint URL matches your provider (e.g.

Model refuses tool calls / returns plain text: the configured model must support native tool-calling. Models without function-calling support cannot drive hardware workflows. See Tested Models.

Still stuck? Open a Discussion. We read every one.

Acknowledgments
License

