Kubernetes Pod Doctor
Explains why your pods are broken — right in VS Code — with plain-English
causes and concrete fixes. Covers the failures developers actually hit:
- CrashLoopBackOff — surfaces the previous container's logs + exit-code hints
- ImagePullBackOff / ErrImagePull — auth vs. missing-tag vs. unreachable-registry
- OOMKilled — memory limit context and fix
- Pending / Unschedulable — parses the scheduler's reason (insufficient CPU, taints, selectors, volumes)
- CreateContainerConfigError — names the missing Secret/ConfigMap
It reads pod status + events + logs, not just logs — because most of these
failures never produce an application log line; the signal is in the pod status
and events.
Zero cost, zero cloud
Fully rule-based. No LLM, no API key, no network calls beyond your own
cluster. Nothing leaves your machine. It uses your existing kubeconfig/context
(the same one kubectl uses).
Architecture
Watcher → Collector → Analyzer (rules) → UI
src/diagnose/ — pure, dependency-free engine (no vscode, no cluster client).
Unit-tested against fixtures (npm test).
src/k8s/ — kubeconfig + pod/event/log access, and the V1Pod → PodContext collector.
src/ui/ — TreeView of unhealthy pods + a single-page failure report
webview that lists every unhealthy pod, its root cause, and fixes on one page
(click the status bar item, the report icon in the view title, or run
Pod Doctor: Open Failure Report). Clicking a single pod still opens its own
focused diagnosis in the same panel.
When no rule matches a failure, the engine returns an "unknown" diagnosis — that
is the seam where an optional, bring-your-own-key LLM explainer can plug in
later for the long tail, paid by the user only when they choose to use it.
Develop
npm install
npm test # run the rule engine against fixtures (no cluster needed)
npm run compile # typecheck + build to out/
# then press F5 in VS Code to launch the Extension Development Host
Try it against a real cluster (kind)
Requires Docker + kind + kubectl.
npm run cluster:up # create a kind cluster + deploy 5 deliberately-broken pods
npm run cluster:diagnose # run the real cluster→collect→engine path from the CLI
npm run cluster:down # tear the cluster down
The broken pods cover the full rule set: the original five
(test/cluster/broken-pods.yaml) plus init-container failure, bad command
(RunContainerError), readiness/liveness probe failures, a securityContext
violation, a volume-mount failure, and DeadlineExceeded
(test/cluster/broken-pods-extra.yaml). With the cluster up, you can also
press F5 and use the actual Pod Doctor sidebar UI against kind-pod-doctor.