Skip to content
| Marketplace
Sign in
Visual Studio Code>Data Science>RunForge — ML Training, ReproducibleNew to Visual Studio Code? Get it now.
RunForge — ML Training, Reproducible

RunForge — ML Training, Reproducible

mcp-tool-shop

|
295 installs
| (0) | Free
Deterministic, contract-driven ML training inside VS Code. Same dataset, same seed, same version — identical model, every time. Run, browse, inspect — all in-editor.
Installation
Launch VS Code Quick Open (Ctrl+P), paste the following command, and press enter.
Copied to clipboard
More Info

日本語 | 中文 | Español | Français | हिन्दी | Italiano | Português (BR)

RunForge Logo

CI status on main Coverage on main VS Code Marketplace version VS Code Marketplace installs VS Code Marketplace rating MIT License Landing page

Notice for v1.0.1 Marketplace users (2026-03-25 release): v1.0.1 shipped with five production-CRITICAL bugs that break the Train commands, run browsing, and observability views (root cause: a subprocess invocation regression and a path/shape mismatch between writer and readers). All five are fixed in v1.1.0, which also delivers the Phase 4 feature surface (cancel-in-progress, recovery, workspace trust). If you installed v1.0.1, please upgrade to v1.1.0. See the v1.0.1 release note and CHANGELOG.md for details.

Deterministic, contract-driven ML training inside VS Code. Same dataset, same seed, same version — identical model, every time.

Phase 3 (Capabilities & Interpretability) frozen at v0.3.6.0. Phase 4 (Lifecycle & Recovery) shipped in v1.1.0 — see the Phase 4 contract.

What's New in v1.1.0

  1. Cancel in-progress training (RunForge: Cancel Active Training) — cancel a running train via the Command Palette or VS Code's progress-notification cancel button. 5s graceful SIGTERM window, then SIGKILL. Cancelled runs land a .cancelled marker so recovery and the run picker can classify them correctly.
  2. Recover Index (RunForge: Recover Index) — walks .ml/runs/ and re-appends any run missing from .ml/outputs/index.json. Idempotent. Useful after a crashed write or a workspace move.
  3. Workspace trust guard — Python subprocess spawn now requires vscode.workspace.isTrusted. Untrusted workspaces get an actionable SafeError pointing at the Manage Workspace Trust UI.
  4. Per-epoch progress notifications — training shows live progress and exposes a cancel button via vscode.window.withProgress.
  5. Hardened CSV error messages — non-comma delimiters, non-UTF-8 encodings, all-NaN labels, single-column CSVs, and header-only CSVs each surface specific, actionable diagnostics instead of opaque pandas tracebacks.
  6. Custom ESLint rules enforce the architectural doctrines codified in docs/CONTRACTS.md (no canonical-value literal duplication, no shadow types in consumer modules).
  7. Doctrine documentation — docs/CONTRACTS.md now codifies the six architectural rules + seven operational patterns from five waves of structured audit. The patterns are non-negotiable for any cross-domain (TS / Python / observability) work.

Plus a v1.1.0 closes all five v1.0.1 CRITICAL regressions (F-COORD-003, F-COORD-004, F-COORD-008, F-COORD-010, F-COORD-011). See CHANGELOG.md for the full breakdown.


🛡️ The RunForge Guarantee

RunForge is opinionated software designed to replace "it works on my machine" with forensic certainty.

What We Guarantee

  1. Determinism: Every run is seeded. Re-running the same preset with the same seed on the same data yields the exact same model.
  2. Provenance: Every run.json record includes the Git Commit SHA, Python Interpreter path, and Extension Version used. You can trace any model back to the code that built it.
  3. Auditability: Artifacts (models, metrics, logs) are saved to disk in standard formats (JSON, joblib). No hidden databases, no cloud dependencies.

What This Is Not

  • Not a Magic AutoML Tool: We don't guess what you want. We run specific, tunable presets.
  • Not a Cloud Platform: We don't ship your data anywhere. Everything happens locally in your VS Code workspace.

For the full trust model, see docs/TRUST_MODEL.md.

Security & Data Scope

Data touched: workspace CSV files (read-only for training), .ml/ directory (run metadata, model artifacts, metrics JSON), Python subprocess stdout/stderr. Data NOT touched: no files outside the open workspace, no browser data, no OS credentials. Permissions required: filesystem read/write within workspace only, Python subprocess execution. No network egress — all operations are local. No telemetry is collected or sent.

Lifecycle of a Run

dataset.csv
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│  Training (run_training)                                    │
│                                                             │
│  1. Validate dataset (label column, numeric values)         │
│  2. Compute dataset fingerprint (SHA-256)                   │
│  3. Split 80/20 train/val (deterministic, stratified)       │
│  4. Fit pipeline (StandardScaler + Classifier)              │
│  5. Compute metrics                                         │
│  6. Extract interpretability (if supported)                 │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
.ml/runs/<run-id>/
    ├── run.json                              ← Metadata + pointers
    ├── metrics.json                          ← Phase 2 metrics (3 keys)
    ├── metrics.v1.json                       ← Detailed metrics by profile
    └── artifacts/
        ├── model.pkl                         ← Trained pipeline
        ├── feature_importance.v1.json        ← (RandomForest only)
        ├── linear_coefficients.v1.json       ← (Linear models only)
        └── interpretability.index.v1.json    ← Unified index

Installation

npm install
npm run compile

Commands

Command Description
RunForge: Train (Standard) Run training with std-train preset
RunForge: Train (High Quality) Run training with hq-train preset
RunForge: Open Runs View completed training runs
RunForge: Inspect Dataset Validate dataset before training (v0.2.2.1+)
RunForge: Open Latest Run Metadata View metadata for most recent run (v0.2.2.1+)
RunForge: Inspect Model Artifact View pipeline structure of model.pkl (v0.2.2.2+)
RunForge: Browse Runs Browse all runs with actions (summary, diagnostics, artifact) (v0.2.3+)
RunForge: View Latest Metrics View detailed metrics from metrics.v1.json (v0.3.3+)
RunForge: View Latest Feature Importance View feature importance for RandomForest models (v0.3.4+)
RunForge: View Latest Linear Coefficients View coefficients for linear models (v0.3.5+)
RunForge: View Latest Interpretability Index View unified index of all interpretability artifacts (v0.3.6+)
RunForge: Export Latest Run as Markdown Save a formatted markdown summary of the latest run (v0.4.3+)

Usage

  1. Set RUNFORGE_DATASET environment variable to your CSV path
  2. CSV must have a column named label
  3. Run training via Command Palette

Guarantees (v0.2.1+)

RunForge VS Code provides deterministic, contract-driven ML training. The guarantees below are intentional and enforced by tests.

Determinism

Given the same dataset, configuration, and RunForge version:

  • Train/validation splits are identical across runs
  • Generated artifacts are reproducible
  • Metrics outputs are stable

There is no randomness outside explicitly seeded behavior.

Label Handling

  • The label column is explicitly specified
  • The label is never inferred by column position
  • Misconfigured or missing labels fail early

Metrics Contract

Training outputs exactly three metrics:

{
  "accuracy": number,
  "num_samples": number,
  "num_features": number
}

No additional fields are added implicitly. Schema expansion requires a versioned contract change.

Model Artifacts

  • model.pkl is always a serialized sklearn.Pipeline
  • All preprocessing (e.g. scaling) is embedded
  • The artifact is self-contained and inference-ready

No external preprocessing steps are required.

Missing Data

  • Rows containing missing values are dropped deterministically
  • The number of dropped rows is logged
  • No silent imputation occurs

Source of Truth

  • All Python execution logic lives in python/ml_runner/
  • There is no duplicated or shadow implementation
  • Tests enforce parity between TypeScript and Python behavior

Stability Policy

  • Behavior at v0.2.1 is frozen
  • Breaking changes require an explicit major version bump
  • Silent behavior changes are considered bugs

Non-Goals (Intentional)

RunForge does not currently attempt to:

  • Auto-select models (user must choose explicitly)
  • Tune hyperparameters (defaults are fixed per preset)
  • Perform online or incremental training
  • Hide training behavior behind heuristics

Correctness and transparency take priority over automation.


Observability (v0.2.2.1+)

Phase 2.2.1 adds visibility into training runs without changing training behavior.

Run Metadata

Each training run produces a run.json with:

  • Run ID and timestamp
  • Dataset fingerprint (SHA-256)
  • Label column and feature count
  • Dropped rows count
  • Metrics snapshot
  • Artifact paths

Dataset Inspection

Inspect datasets before training:

python -m ml_runner inspect --dataset data.csv --label label

Returns column names, row count, feature count, and label validation.

Provenance Tracking

All runs are indexed in .ml/outputs/index.json for traceability:

  • Given a model.pkl, trace back to run metadata
  • Find all runs for a given dataset fingerprint
  • Append-only index (never reorders or deletes)

Artifact Introspection (v0.2.2.2+)

Phase 2.2.2 adds read-only inspection of trained artifacts.

Inspection is read-only and does not retrain or modify artifacts.

Pipeline Inspection

Inspect what's inside a model.pkl without retraining:

python -m ml_runner inspect-artifact --artifact model.pkl

Returns structured JSON with:

  • Pipeline steps (in order)
  • Step types and modules
  • Preprocessing detection

Example output:

{
  "schema_version": "0.2.2.2",
  "artifact_path": "model.pkl",
  "pipeline_steps": [
    {"name": "scaler", "type": "StandardScaler", "module": "sklearn.preprocessing._data"},
    {"name": "clf", "type": "LogisticRegression", "module": "sklearn.linear_model._logistic"}
  ],
  "has_preprocessing": true,
  "step_count": 2
}

Diagnostics

Structured diagnostics explain why a run behaved the way it did:

Code Description
MISSING_VALUES_DROPPED Rows dropped due to missing values
LABEL_NOT_FOUND Label column not present in dataset
LABEL_TYPE_INVALID Label column has invalid type
ZERO_ROWS Dataset has zero rows after processing
ZERO_FEATURES Dataset has no feature columns
LABEL_ONLY_DATASET Dataset contains only the label column

All diagnostics are machine-readable JSON (no log parsing needed).


Browse Runs (v0.2.3+)

Phase 2.3 adds a unified run browser with quick actions.

Using Browse Runs

  1. Open Command Palette (Ctrl+Shift+P)
  2. Run RunForge: Browse Runs
  3. Select a run from the list (newest first)
  4. Choose an action:
    • Open Run Summary — View run metadata as readable markdown
    • View Diagnostics — See what happened during the run
    • Inspect Model Artifact — View pipeline structure
    • Copy Dataset Fingerprint — Copy SHA-256 to clipboard

Synthesized Diagnostics

Diagnostics are derived from run.json fields:

Condition Diagnostic
dropped_rows_missing_values > 0 MISSING_VALUES_DROPPED

Full structured diagnostics emission is planned for future phases.


Model Selection (v0.3.1+)

Phase 3.1 adds explicit model selection while preserving all Phase 2 guarantees.

Supported Models

Model CLI Value Description
Logistic Regression logistic_regression Default, fast, interpretable
Random Forest random_forest Ensemble, handles non-linear patterns
Linear SVC linear_svc Support vector classifier, margin-based

Configuration

Set the model family in VS Code settings:

{
  "runforge.modelFamily": "random_forest"
}

Or use the Settings UI: Search for "RunForge Model Family" and select from the dropdown.

CLI Usage

python -m ml_runner train --preset std-train --out ./run --device cpu --model random_forest

The --model argument is optional. Default: logistic_regression.

Provenance

The selected model family is recorded in run.json:

{
  "model_family": "random_forest",
  "runforge_version": "0.3.1.0"
}

Backward Compatibility

  • All Phase 2 runs remain readable
  • Default behavior unchanged (logistic regression)
  • No migration required
  • Preprocessing remains fixed (StandardScaler for all models)

Hyperparameters & Training Profiles (v0.3.2+)

Phase 3.2 adds explicit hyperparameter control and training profiles.

Training Profiles

Named profiles provide pre-configured hyperparameters:

Profile Description Model Family
default No hyperparameter overrides (uses setting)
fast Reduced iterations for quick runs logistic_regression
thorough More trees/iterations for better quality random_forest

Configure in VS Code settings:

{
  "runforge.profile": "fast"
}

CLI Hyperparameters

Override individual hyperparameters via CLI:

python -m ml_runner train --preset std-train --out ./run --device cpu --param C=0.5 --param max_iter=200

Precedence Rules

When both profile and CLI params are set:

  1. CLI --param (highest priority)
  2. Profile-expanded parameters
  3. Model defaults (lowest priority)

Provenance

Hyperparameters and profiles are recorded in run.json:

{
  "model_family": "random_forest",
  "profile_name": "thorough",
  "profile_version": "1.0",
  "expanded_parameters_hash": "abc123...",
  "hyperparameters": [
    {"name": "n_estimators", "value": 200, "source": "profile"},
    {"name": "max_depth", "value": 5, "source": "cli"}
  ]
}

When no profile is used, profile fields are omitted entirely (not null).


Model-Aware Metrics (v0.3.3+)

Phase 3.3 adds detailed, model-aware metrics with capability-based profile selection.

Metrics Profiles

Metrics profiles are automatically selected based on model capabilities:

Profile Description Metrics
classification.base.v1 All classifiers accuracy, precision, recall, f1, confusion matrix
classification.proba.v1 Binary + predict_proba base + ROC-AUC, log loss
classification.multiclass.v1 3+ classes base + per-class precision/recall/f1

Profile Selection Logic

  • Binary classification + predict_proba → classification.proba.v1
  • Multiclass (3+ classes) → classification.multiclass.v1
  • Otherwise → classification.base.v1

Model Capabilities

Model predict_proba decision_function
LogisticRegression ✅ ✅
RandomForest ✅ ❌
LinearSVC ❌ ✅ (ROC-AUC only)

Metrics Artifact

Training now produces metrics.v1.json alongside metrics.json:

{
  "schema_version": "metrics.v1",
  "metrics_profile": "classification.proba.v1",
  "num_classes": 2,
  "accuracy": 0.95,
  "precision_macro": 0.94,
  "recall_macro": 0.93,
  "f1_macro": 0.94,
  "confusion_matrix": [[45, 5], [3, 47]],
  "roc_auc": 0.97,
  "log_loss": 0.15
}

Run Metadata

run.json now includes metrics_v1 pointer:

{
  "schema_version": "run.v0.3.6",
  "metrics_v1": {
    "schema_version": "metrics.v1",
    "metrics_profile": "classification.proba.v1",
    "artifact_path": "metrics.v1.json"
  },
  "artifacts": {
    "model_pkl": "artifacts/model.pkl",
    "metrics_v1_json": "metrics.v1.json"
  }
}

Backward Compatibility

  • metrics.json (Phase 2) remains unchanged
  • All existing tools continue to work
  • Profile fields in run.json appear together or not at all

Feature Importance (v0.3.4+)

Phase 3.4 adds read-only feature importance extraction for supported models.

Supported Models

Feature importance is only available for models with native importance signals:

Model Supported Importance Type
RandomForest ✅ Gini importance
LogisticRegression ❌ Not in v1
LinearSVC ❌ Not in v1

No approximations: If the model doesn't support native importance, no artifact is emitted.

Feature Importance Artifact

RandomForest runs produce artifacts/feature_importance.v1.json:

{
  "schema_version": "feature_importance.v1",
  "model_family": "random_forest",
  "importance_type": "gini_importance",
  "num_features": 10,
  "features_by_importance": [
    {"name": "feature_a", "importance": 0.35, "rank": 1},
    {"name": "feature_b", "importance": 0.25, "rank": 2}
  ],
  "features_by_original_order": [
    {"name": "feature_a", "importance": 0.35, "index": 0},
    {"name": "feature_b", "importance": 0.25, "index": 1}
  ],
  "top_k": ["feature_a", "feature_b"]
}

Run Metadata

run.json includes feature importance reference when available:

{
  "feature_importance_schema_version": "feature_importance.v1",
  "feature_importance_artifact": "artifacts/feature_importance.v1.json",
  "artifacts": {
    "model_pkl": "artifacts/model.pkl",
    "feature_importance_json": "artifacts/feature_importance.v1.json"
  }
}

When feature importance is not available, these fields are omitted entirely (not null).

Diagnostics

Unsupported models emit structured diagnostics:

Code Description
FEATURE_IMPORTANCE_UNSUPPORTED_MODEL Model doesn't support native feature importance
FEATURE_NAMES_UNAVAILABLE Feature names could not be resolved

Not Supported in v1

The following are explicitly out of scope for v1:

  • Coefficient-based importance for linear models
  • SHAP/LIME explanations
  • Permutation importance
  • Partial dependence plots

Supported Hyperparameters

Logistic Regression:

  • C (float, > 0): Regularization strength
  • max_iter (int, > 0): Maximum iterations
  • solver (str): Optimization solver
  • warm_start (bool): Reuse previous solution

Random Forest:

  • n_estimators (int, > 0): Number of trees
  • max_depth (int or None): Maximum tree depth
  • min_samples_split (int, >= 2): Min samples to split
  • min_samples_leaf (int, > 0): Min samples per leaf

Linear SVC:

  • C (float, > 0): Regularization strength
  • max_iter (int, > 0): Maximum iterations

Linear Coefficients (v0.3.5+)

Phase 3.5 adds read-only coefficient extraction for linear classifiers.

Supported Models

Linear coefficients are available for models with native coef_ attribute:

Model Supported Coefficient Type
LogisticRegression ✅ Log-odds coefficients
LinearSVC ✅ SVM coefficients
RandomForest ❌ Use Feature Importance instead

No approximations: If the model doesn't support native coefficients, no artifact is emitted.

Coefficient Space (IMPORTANT)

All coefficients are in STANDARDIZED feature space.

This means:

  • Coefficients correspond to features AFTER StandardScaler
  • Values represent influence per 1 standard deviation increase
  • No attempt is made to "invert" scaling back to raw feature units
  • Comparing coefficients across features is meaningful (same scale)
  • Comparing coefficients to raw feature values is NOT meaningful

Linear Coefficients Artifact

Linear model runs produce artifacts/linear_coefficients.v1.json:

{
  "schema_version": "linear_coefficients.v1",
  "model_family": "logistic_regression",
  "coefficient_space": "standardized",
  "num_features": 10,
  "num_classes": 2,
  "classes": [0, 1],
  "intercepts": [
    {"class": 1, "intercept": 0.5}
  ],
  "coefficients_by_class": [
    {
      "class": 1,
      "features": [
        {"name": "feature_a", "coefficient": 2.35, "abs_coefficient": 2.35, "rank": 1},
        {"name": "feature_b", "coefficient": -1.25, "abs_coefficient": 1.25, "rank": 2}
      ]
    }
  ],
  "top_k_by_class": [
    {"class": 1, "top_features": ["feature_a", "feature_b"]}
  ]
}

Multiclass Support

For multiclass classification (3+ classes), coefficients are grouped per class:

  • Each class has its own set of coefficients
  • Class labels are sorted deterministically
  • No aggregation across classes in v1

Run Metadata

run.json includes linear coefficients reference when available:

{
  "linear_coefficients_schema_version": "linear_coefficients.v1",
  "linear_coefficients_artifact": "artifacts/linear_coefficients.v1.json",
  "artifacts": {
    "model_pkl": "artifacts/model.pkl",
    "linear_coefficients_json": "artifacts/linear_coefficients.v1.json"
  }
}

When coefficients are not available, these fields are omitted entirely (not null).

Diagnostics

Unsupported models emit structured diagnostics:

Code Description
LINEAR_COEFFICIENTS_UNSUPPORTED_MODEL Model doesn't support coefficient extraction
COEFFICIENTS_MISSING_ON_ARTIFACT Classifier doesn't have coef_ attribute
FEATURE_NAMES_UNAVAILABLE Feature names could not be resolved

Feature Importance vs Linear Coefficients

Artifact Supported Models What It Shows
Feature Importance (v0.3.4) RandomForest Gini importance (tree-based)
Linear Coefficients (v0.3.5) LogisticRegression, LinearSVC Model coefficients

These are complementary:

  • Use Feature Importance for ensemble models
  • Use Linear Coefficients for interpretable linear models

Interpretation Guide

For LogisticRegression (binary):

  • Positive coefficient: Feature increase → Higher probability of positive class
  • Negative coefficient: Feature increase → Lower probability of positive class
  • Magnitude: Larger absolute value = Stronger influence

Example: coefficient = 2.0 means +1 std dev in this feature → +2.0 to log-odds


Interpretability Index (v0.3.6+)

Phase 3.6 adds a unified index artifact that links all interpretability outputs for a run.

Purpose

The interpretability index answers: "What interpretability outputs exist for this run, what versions are they, and where are they?"

No new computation - just linking and summarizing existing artifacts.

Index Artifact

Each run produces artifacts/interpretability.index.v1.json:

{
  "schema_version": "interpretability.index.v1",
  "run_id": "20240101-120000-abc12345",
  "runforge_version": "0.3.6.0",
  "created_at": "2024-01-01T12:00:00+00:00",
  "available_artifacts": {
    "metrics_v1": {
      "schema_version": "metrics.v1",
      "path": "metrics.v1.json",
      "summary": {
        "metrics_profile": "classification.proba.v1",
        "accuracy": 0.95
      }
    },
    "feature_importance_v1": {
      "schema_version": "feature_importance.v1",
      "path": "artifacts/feature_importance.v1.json",
      "summary": {
        "model_family": "random_forest",
        "top_k": ["feature_a", "feature_b", "feature_c"]
      }
    },
    "linear_coefficients_v1": {
      "schema_version": "linear_coefficients.v1",
      "path": "artifacts/linear_coefficients.v1.json",
      "summary": {
        "model_family": "logistic_regression",
        "num_classes": 2,
        "top_k_by_class": [{"class": 1, "top_features": ["feat_a", "feat_b"]}]
      }
    }
  }
}

Availability Rules

  • Absent artifacts are omitted from available_artifacts (not set to null or false)
  • The index only claims availability if the file actually exists
  • A minimal run (LogisticRegression) will have metrics_v1 and linear_coefficients_v1
  • A RandomForest run will have metrics_v1 and feature_importance_v1

Summary Content

Summaries include only reference data (no numeric values duplicated):

Artifact Summary Contains
metrics_v1 metrics_profile, accuracy (from run.json)
feature_importance_v1 model_family, top_k (names only, max 5)
linear_coefficients_v1 model_family, num_classes, top_k_by_class (names only)

VS Code Command

Use RunForge: View Latest Interpretability Index to see a formatted summary with quick links to open individual artifacts.


Getting Started

For a guided walkthrough, see docs/WALKTHROUGH.md.


Contract & Documentation

Core Documents

Document Purpose
docs/TRUST_MODEL.md How RunForge establishes trust
docs/WALKTHROUGH.md Guided 2-3 minute tour
CONTRACT.md Full behavioral contract
CONTRACT-PHASE-3.md Phase 3 expansion rules

Phase 2 (Frozen)

Document Scope
docs/PHASE-2.2.1-ACCEPTANCE.md Observability
docs/PHASE-2.2.2-ACCEPTANCE.md Introspection
docs/PHASE-2.3-ACCEPTANCE.md UX polish

Phase 3 (Frozen as of v0.3.6.0)

Document Scope
docs/PHASE-3.1-ACCEPTANCE.md Model selection
docs/PHASE-3.2-ACCEPTANCE.md Hyperparameters & profiles
docs/PHASE-3.3-ACCEPTANCE.md Model-aware metrics
docs/PHASE-3.4-ACCEPTANCE.md Feature importance
docs/PHASE-3.5-ACCEPTANCE.md Linear coefficients
docs/PHASE-3.6-ACCEPTANCE.md Interpretability index

Future

See docs/DEFERRED_UX_ENHANCEMENTS.md for planned improvements.


Phase Status

Phase Focus Status
Phase 2 Core training, observability Frozen
Phase 3 Model selection, interpretability Frozen (v0.3.6.0)
Phase 4 Lifecycle, recovery, doctrine Released (v1.1.0) — see CONTRACT-PHASE-4.md

All Phase 2, Phase 3, and Phase 4 contract surfaces are locked. Future work requires a Phase 5 contract.


License

MIT


Built by MCP Tool Shop

  • Contact us
  • Jobs
  • Privacy
  • Manage cookies
  • Terms of use
  • Trademarks
© 2026 Microsoft