HydraKernel
Run Jupyter notebook branches in parallel from a shared setup.
HydraKernel turns a notebook into a lightweight workflow engine. Instead of manually duplicating notebooks or launching multiple scripts, define common setup cells once and execute independent branches simultaneously using separate Python processes.
Perfect for scenario analysis, parameter sweeps, model runs, sensitivity studies, and any workflow where several independent computations share the same initialization code.
Why HydraKernel?
A common notebook workflow looks like this:
import pandas as pd
# load data
df = pd.read_csv("large_dataset.csv")
# preprocess
df = preprocess(df)
# build model inputs
inputs = build_inputs(df)
This is followed by several independent experiments:
# baseline
# central
# conservative
Normally, you would do one of the following:
- Run them sequentially
- Duplicate notebook cells
- Create separate scripts
- Build a custom workflow pipeline
HydraKernel automates this: you write the setup once, and each experiment runs as a parallel branch on top of it.
Features
Shared Setup Cells
Mark cells that should be included in every branch:
# hydra: setup
import pandas as pd
df = pd.read_csv("data.csv")
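The `# hydra:` markers are plain Python comments, so a cell's role can be read from its source text. The sketch below shows one way such directives could be recognized. It is not HydraKernel's actual parser: the `DIRECTIVE` pattern and `parse_directives` helper are hypothetical, and it assumes a directive may appear on any line of the cell.

```python
import re
from typing import Optional

# Hypothetical pattern for directives such as "# hydra: setup",
# "# hydra: cache" or "# hydra: branch baseline".
DIRECTIVE = re.compile(r"^\s*#\s*hydra:\s*(\w+)(?:\s+(\S+))?\s*$")


def parse_directives(source: str) -> dict:
    """Collect the hydra directives found in one cell's source text."""
    info: dict = {"setup": False, "cache": False, "branch": None}
    for line in source.splitlines():
        match = DIRECTIVE.match(line)
        if match is None:
            continue  # ordinary code or unrelated comment
        keyword, arg = match.group(1), match.group(2)
        if keyword in ("setup", "cache"):
            info[keyword] = True
        elif keyword == "branch" and arg:
            branch: Optional[str] = arg
            info["branch"] = branch
    return info
```

For example, `parse_directives('# hydra: branch central\nrun_model("central")')` returns a dict whose `"branch"` entry is `"central"`, while an unmarked cell returns all defaults.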
Parallel Branch Execution
Create independent branches:
# hydra: branch baseline
run_model("baseline")
# hydra: branch central
run_model("central")
# hydra: branch conservative
run_model("conservative")
For each branch, HydraKernel generates a temporary script containing the setup cells followed by that branch's cells, then runs all of the scripts simultaneously.
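The idea of "one temporary script per branch, run in parallel" can be sketched with the standard library alone. This is an illustration under assumptions, not the extension's implementation: `build_scripts` and `run_branches` are hypothetical names, cells are joined with blank lines, and each script is started with the current interpreter (`sys.executable`).

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Dict, List


def build_scripts(setup_cells: List[str], branches: Dict[str, List[str]]) -> Dict[str, str]:
    """Prepend the shared setup cells to each branch's own cells."""
    setup = "\n\n".join(setup_cells)
    return {name: setup + "\n\n" + "\n\n".join(cells) for name, cells in branches.items()}


def run_branches(scripts: Dict[str, str]) -> Dict[str, int]:
    """Write each script to a temporary file and run them all concurrently."""
    with tempfile.TemporaryDirectory() as tmp:
        procs = {}
        for name, code in scripts.items():
            path = Path(tmp) / f"{name}.py"
            path.write_text(code)
            # Popen returns immediately, so every branch starts before any finishes.
            procs[name] = subprocess.Popen([sys.executable, str(path)])
        # Block until each branch exits and collect its exit code.
        return {name: proc.wait() for name, proc in procs.items()}
```

Usage: `run_branches(build_scripts(["x = 5\ny = 10"], {"baseline": ['print("baseline", x + y)']}))` returns `{"baseline": 0}` once the branch finishes. Because each branch is its own process, a crash in one branch only changes that branch's exit code.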
Live Status Tracking
HydraKernel displays branch execution status:
Branch Status
-------------
🟢 baseline Running
⏳ central Queued
✅ conservative Done
❌ failed_case Failed
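Each status in the table can be derived from the state of a branch's process. The sketch below assumes that a branch with no process yet is "Queued" (for instance, waiting for a free worker) and that a non-zero exit code means "Failed". `branch_status` is a hypothetical helper, not part of HydraKernel.

```python
import subprocess
from typing import Optional


def branch_status(proc: Optional[subprocess.Popen]) -> str:
    """Map a branch's process to one of the status labels shown above."""
    if proc is None:
        return "⏳ Queued"  # assumption: not started yet
    code = proc.poll()  # None while the process is still running
    if code is None:
        return "🟢 Running"
    return "✅ Done" if code == 0 else "❌ Failed"
```

Polling with `Popen.poll()` never blocks, so a status panel can refresh all branches on a timer without waiting for any of them.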
Consolidated Output Logging
Output from every branch is streamed into a dedicated HydraKernel output panel, with each line prefixed by the branch name:
[baseline] starting...
[central] starting...
[baseline] complete
[baseline] finished with code 0
[central] complete
[central] finished with code 0
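A log in this format can be produced by reading each branch's stdout line by line and tagging it. The sketch below is an assumption-based illustration: `stream_branch` and `stream_all` are hypothetical, one reader thread is used per branch, stderr is merged into stdout, and the child interpreter is started with `-u` so its output is not buffered until exit.

```python
import subprocess
import sys
import threading
from typing import Callable, Dict, List


def stream_branch(name: str, args: List[str], emit: Callable[[str], None]) -> int:
    """Run one branch and forward each output line prefixed with its name."""
    proc = subprocess.Popen(
        [sys.executable, "-u", *args],  # -u: unbuffered child output
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # show errors in the same panel
        text=True,
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        emit(f"[{name}] {line.rstrip()}")
    code = proc.wait()
    emit(f"[{name}] finished with code {code}")
    return code


def stream_all(branches: Dict[str, List[str]], emit: Callable[[str], None]) -> None:
    """Stream several branches concurrently, one reader thread per branch."""
    lock = threading.Lock()

    def safe_emit(text: str) -> None:
        with lock:  # one writer at a time, so lines never mix mid-line
            emit(text)

    threads = [
        threading.Thread(target=stream_branch, args=(name, args, safe_emit))
        for name, args in branches.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Lines from different branches interleave in whatever order they arrive, which matches the mixed `[baseline]` and `[central]` lines in the log above.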
Setup Caching (Experimental)
Mark a setup cell with:
# hydra: setup
# hydra: cache
HydraKernel runs the setup once, serializes the Python objects it creates that can be serialized, and loads them into every branch.
This is useful when setup is expensive:
# hydra: setup
# hydra: cache
df = pd.read_parquet("50GB_dataset.parquet")
Here the dataset is loaded once rather than once per branch.
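The caching step can be pictured as "serialize the setup namespace once, deserialize it at the start of each branch". The sketch below is a hypothetical illustration, not HydraKernel's cache format: `save_setup_cache` and `load_setup_cache` are invented names, it uses cloudpickle when installed and falls back to the standard `pickle` module, and it assumes modules and underscore-prefixed names are skipped (so import statements would still need to run in each branch).

```python
import pickle
import types
from pathlib import Path

try:
    import cloudpickle as serializer  # handles lambdas and locally defined functions
except ImportError:  # fall back to the standard library
    serializer = pickle


def save_setup_cache(namespace: dict, path: Path) -> list:
    """Serialize what the setup produced; return the names that were skipped."""
    blobs, skipped = {}, []
    for name, value in namespace.items():
        if name.startswith("_") or isinstance(value, types.ModuleType):
            continue  # assumption: internals and modules are not cached
        try:
            blobs[name] = serializer.dumps(value)
        except Exception:
            skipped.append(name)  # e.g. locks, sockets, GPU handles
    path.write_bytes(pickle.dumps(blobs))
    return skipped


def load_setup_cache(path: Path) -> dict:
    """Rebuild the cached objects at the start of a branch."""
    blobs = pickle.loads(path.read_bytes())
    # cloudpickle output is read back with the standard pickle.loads.
    return {name: pickle.loads(blob) for name, blob in blobs.items()}
```

Each branch would then start from `load_setup_cache(path)` instead of re-running `pd.read_parquet(...)`, trading one expensive load for one deserialization per branch.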
Example
# hydra: setup
x = 5
y = 10
# hydra: branch baseline
print("baseline", x + y)
# hydra: branch central
print("central", x * y)
Output:
[baseline] baseline 15
[central] central 50
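To see where each line of this output comes from, the example can be traced in a single process: every branch gets the setup code followed by its own cell, executed in a fresh namespace. This is only an in-process stand-in (using `exec` and a new dict instead of a separate Python process), and `run_in_isolation` is a hypothetical helper.

```python
import contextlib
import io

setup = "x = 5\ny = 10"
branches = {
    "baseline": 'print("baseline", x + y)',
    "central": 'print("central", x * y)',
}


def run_in_isolation(name: str, code: str) -> list:
    """Run setup + one branch in a fresh namespace and prefix its output."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(setup + "\n" + code, {})  # new dict: no state shared between branches
    return [f"[{name}] {line}" for line in buffer.getvalue().splitlines()]


for name, code in branches.items():
    print("\n".join(run_in_isolation(name, code)))
# [baseline] baseline 15
# [central] central 50
```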
Requirements
- Visual Studio Code
- Jupyter extension for VS Code
- Python 3.9+
- Optional: cloudpickle, for setup caching
Install:
pip install cloudpickle
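To confirm that an interpreter meets these requirements, a quick check can be run with the Python environment the branches will use. `check_environment` is a hypothetical helper; it only inspects the running interpreter.

```python
import importlib.util
import sys


def check_environment() -> dict:
    """Report the Python version requirement and whether cloudpickle is importable."""
    return {
        "python_ok": sys.version_info >= (3, 9),
        "cloudpickle_available": importlib.util.find_spec("cloudpickle") is not None,
    }


print(check_environment())
```

If `cloudpickle_available` is `False`, setup caching is unavailable until `pip install cloudpickle` is run in that same environment.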
Usage
- Open a Jupyter notebook.
- Mark shared cells with:
# hydra: setup
- Mark branch cells with:
# hydra: branch <name>
- Open the Command Palette and run:
HydraKernel: Run Branches
- Watch branches execute in parallel.
Current Limitations
- Branches execute as separate Python processes, so they do not share in-memory state with each other or with the notebook kernel.
- Cached objects must be serializable with cloudpickle.
- Open file handles, sockets, GPU contexts, and some external resources cannot be cached.
- Branch output is currently displayed in the HydraKernel output panel rather than written back into the notebook cells.
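Before marking a setup cell with `# hydra: cache`, it can help to test whether its objects serialize at all. The hypothetical `is_cacheable` helper below uses the standard `pickle` module, which is stricter than cloudpickle: cloudpickle can also serialize lambdas and interactively defined functions, but it still rejects objects such as locks.

```python
import pickle
import tempfile


def is_cacheable(obj) -> bool:
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except Exception:  # TypeError, PicklingError, AttributeError, ...
        return False
    return True


print(is_cacheable({"rows": [1, 2, 3]}))  # True
with tempfile.TemporaryFile("w+") as handle:
    print(is_cacheable(handle))  # False: open file handles cannot be pickled
print(is_cacheable(lambda v: v + 1))  # False with pickle; cloudpickle accepts it
```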
Roadmap
Planned
- Stop running branches
- Branch progress bars
- Automatic interpreter detection
- Branch groups
- Run selected branches only
- Notebook output integration
- Distributed execution support
Future
- Remote cluster execution
- SLURM integration
- Parameter sweep generation
- Dependency graphs
- Branch result comparison tools
Release Notes
0.0.1
Initial release.
- Shared setup cells
- Parallel branch execution
- Live output streaming
- Status tracking
- Experimental setup caching
HydraKernel: for anyone who has ever wished they could stop restarting their notebooks.