The physics-grounded analytical simulator powering the Machine Learning Systems ecosystem.
Provides a unified "Single Source of Truth" (SSoT) for modeling systems from sub-watt microcontrollers to exaflop-scale global fleets.
mlsysim implements a "Progressive Lowering" architecture, separating high-level workloads from the physical infrastructure that executes them.
| Layer | Domain | Key Components |
|---|---|---|
| Layer A | Workload Representation (`mlsysim.models`) | FLOPs, parameters, and intensity. e.g., `Llama3_70B`, `ResNet50` |
| Layer B | Hardware Registry (`mlsysim.hardware`) | Concrete specs for real-world silicon. e.g., `H100`, `TPUv5p`, `Jetson` |
| Layer C | Infrastructure (`mlsysim.infra`) | Grid profiles and datacenter sustainability. e.g., PUE, Carbon Intensity, WUE |
| Layer D | Systems & Topology (`mlsysim.systems`) | Fleet configurations and network fabrics. e.g., Doorbell, AutoDrive scenarios |
| Layer E | Execution & Resolvers (`mlsysim.core.solver`) | The 3-tier math engine: Models, Solvers, and Optimizers (design-space search) |
mlsysim is a first-principles analytical calculator for ML systems. It provides a terminal UI for humans and a strict JSON API for CI/CD pipelines and AI agents.
Accuracy note: mlsysim predictions are typically within 2–5× of measured performance for well-characterized workloads. For production capacity planning, always validate with benchmarks. This tool formalizes the back-of-envelope math that senior engineers do intuitively — it is not a substitute for profiling or load testing.
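The back-of-envelope math referred to above can be sketched as a simple roofline bound. Everything below is illustrative: the function is a generic roofline estimate, and the peak numbers are ballpark assumptions, not values taken from the tool:

```python
def roofline_time_s(flops: float, bytes_moved: float,
                    peak_flops: float, peak_bw: float) -> float:
    """Lower-bound execution time: the slower of compute and memory traffic."""
    return max(flops / peak_flops, bytes_moved / peak_bw)

# Llama3-8B decode, batch 1: ~2 FLOPs per parameter per token, and every
# fp16 weight (~16 GB) streamed from HBM once per token.
t = roofline_time_s(flops=2 * 8e9, bytes_moved=16e9,
                    peak_flops=1e15,   # ~1 PFLOP/s dense fp16 (ballpark)
                    peak_bw=3.35e12)   # ~3.35 TB/s HBM (ballpark)
print(f"lower bound: {t * 1e3:.2f} ms/token")
```

The memory term dominates here (~4.8 ms of traffic vs ~0.02 ms of compute), which is exactly the kind of first-principles conclusion the tool formalizes.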
Discover built-in hardware, models, and infrastructure without reading source code:
```bash
mlsysim zoo hardware
mlsysim zoo models
```
Evaluate the physics of a workload on a specific hardware node instantly:

```bash
mlsysim eval Llama3_8B H100 --batch-size 32
```
Define your entire cluster and SLA constraints in a declarative mlsys.yaml file:
```yaml
# example_cluster.yaml
version: "1.0"
workload:
  name: "Llama3_70B"
  batch_size: 4096
hardware:
  name: "H100"
  nodes: 64
ops:
  region: "Quebec"
  duration_days: 14.0
constraints:
  assert:
    - metric: "performance.latency"
      max: 50.0
```

Then compile and evaluate the 3-lens scorecard (Feasibility, Performance, Macro):

```bash
mlsysim eval example_cluster.yaml
```
Every command supports strict, schema-validated JSON output. If an assert constraint is violated, the CLI returns a semantic Exit Code 3.
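In a CI pipeline, that contract can be wrapped in a small guard. This is a sketch: it assumes `mlsysim` is on `PATH`, and the helper names are hypothetical; only Exit Code 3 and the JSON output mode come from the documentation above:

```python
import json
import subprocess

EXIT_ASSERT_VIOLATED = 3  # documented semantic exit code for a failed assert

def run_eval(config_path: str) -> tuple[bool, dict]:
    """Return (constraints_passed, scorecard_dict) for a cluster config."""
    proc = subprocess.run(
        ["mlsysim", "--output", "json", "eval", config_path],
        capture_output=True, text=True,
    )
    if proc.returncode not in (0, EXIT_ASSERT_VIOLATED):
        raise RuntimeError(f"mlsysim failed: {proc.stderr}")
    return proc.returncode == 0, json.loads(proc.stdout)
```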
```bash
# Export the JSON Schema for your IDE or AI Agent
mlsysim schema > schema.json

# Run an evaluation in a CI pipeline
tco=$(mlsysim --output json eval example_cluster.yaml | jq .macro.metrics.tco_usd)
```

Use the Tier 3 Engineering Engine to automatically find the optimal configuration:
```bash
mlsysim optimize parallelism example_cluster.yaml
mlsysim optimize placement example_cluster.yaml --carbon-tax 150
```
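Conceptually, that Tier 3 search enumerates a design space and keeps the feasible point with the best objective. A brute-force sketch with a toy, entirely hypothetical latency model (the real optimizer is more sophisticated):

```python
from itertools import product

def search_parallelism(nodes: int, cost_model, latency_sla_ms: float):
    """Exhaustively score (tensor, pipeline, data) splits of `nodes` devices."""
    best = None
    for tp, pp in product([1, 2, 4, 8], repeat=2):
        if nodes % (tp * pp):
            continue  # the split must divide the fleet evenly
        dp = nodes // (tp * pp)
        latency = cost_model(tp, pp, dp)
        if latency <= latency_sla_ms and (best is None or latency < best[0]):
            best = (latency, {"tensor": tp, "pipeline": pp, "data": dp})
    return best

# Toy model: tensor parallelism cuts latency but adds communication overhead.
toy = lambda tp, pp, dp: 100 / tp + 2 * tp + 5 * pp
print(search_parallelism(64, toy, latency_sla_ms=50.0))
```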
Because this core powers a printed textbook, we enforce strict Invariant Verification. Every physical constant is traceable to a primary source (datasheet or paper), and dimensional integrity is enforced via `pint`.
MLSysim is an analytical hardware calculator, not a production deployment simulator. The 22 walls model physical and economic constraints that bound ML system performance. Several critical production concerns are deliberately out of scope:
| Concern | Why It Matters | Where to Learn More |
|---|---|---|
| Data drift / distribution shift | The #1 cause of production ML failures — model accuracy degrades silently as input distributions change | Sculley et al. (2015), "Hidden Technical Debt in ML Systems" |
| Model versioning & rollback | Production requires running multiple versions, A/B testing, and safe rollback | Huyen (2022), Designing Machine Learning Systems |
| Monitoring & observability | You cannot manage what you cannot measure — prediction distributions, latency percentiles, error rates | Google SRE Book (2016); Huyen (2022) |
| Feature store freshness | Stale features silently degrade real-time models (recommendations, fraud detection) | Uber Michelangelo (2017) |
| Software bugs & misconfigurations | Most outages are caused by software, not hardware | Barroso et al. (2018) |
| Human factors | Team velocity, on-call burden, and organizational alignment often dominate outcomes | Brooks (1975), The Mythical Man-Month |
Passing all 22 walls is necessary but not sufficient for a successful production deployment.
Students using this tool should understand that infrastructure physics (what mlsysim models) is one dimension of a multi-dimensional engineering challenge.
If you use mlsysim in your research or teaching, please cite:
```bibtex
@software{mlsysim2026,
  author      = {Janapa Reddi, Vijay},
  title       = {{MLSysim}: A Composable Analytical Framework for Machine Learning Systems},
  year        = {2026},
  url         = {https://mlsysbook.ai/mlsysim},
  version     = {0.1.0},
  institution = {Harvard University}
}
```

MLSysim is designed to be highly modular. Install only what you need:
```bash
# Core physics engine only (fastest, smallest footprint)
pip install mlsysim

# Install with the beautiful Terminal UI & YAML support
pip install "mlsysim[cli]"

# Install with dependencies for interactive labs (Marimo, Plotly)
pip install "mlsysim[labs]"
```

The framework is just as powerful inside a Python script or Jupyter Notebook. The `SystemEvaluator` provides a clean, unified entry point for full-stack analysis:
```python
import mlsysim

# 1. Define the scenario
model = mlsysim.Models.Language.Llama3_8B
hardware = mlsysim.Hardware.Cloud.H100

# 2. Run the evaluation
evaluation = mlsysim.SystemEvaluator.evaluate(
    scenario_name="Llama-3 8B on H100",
    model_obj=model,
    hardware_obj=hardware,
    batch_size=32,
    precision="fp16",
    efficiency=0.45,
)

# 3. View the beautifully formatted scorecard
print(evaluation.scorecard())
```

The `efficiency` parameter (0.0–1.0) captures the gap between peak hardware performance and what your software stack actually achieves. Use these guidelines:
| Scenario | Efficiency | Rationale |
|---|---|---|
| Training (Megatron-LM, large Transformer) | 0.40–0.55 | Well-optimized GEMM + FlashAttention |
| Training (PyTorch eager, small model) | 0.08–0.15 | Kernel launch overhead dominates |
| Inference decode, batch=1 | 0.01–0.05 | Memory-bound; compute nearly idle |
| Inference decode, batch=32+ | 0.15–0.35 | Batch amortizes weight loading |
| Inference prefill, long context | 0.30–0.50 | Compute-bound GEMM + attention |
| TinyML (TFLite Micro on ESP32) | 0.05–0.15 | Interpreter overhead, no tensor cores |
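These factors combine multiplicatively with peak throughput; a minimal sketch of the arithmetic (the H100 peak figure below is a ballpark assumption, not a value from the tool):

```python
def achieved_tflops(peak_tflops: float, efficiency: float) -> float:
    """Sustained throughput after software-stack losses."""
    assert 0.0 <= efficiency <= 1.0, "efficiency is a fraction of peak"
    return peak_tflops * efficiency

# ~990 TFLOP/s dense fp16 peak (ballpark) at the 0.45 training efficiency
# used in the SystemEvaluator example above.
print(f"{achieved_tflops(990, 0.45):.1f} TFLOP/s sustained")
```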