
IO_calc: Federated Learning I/O and Runtime Estimation Tool

This tool estimates training times and network transfer overhead for federated learning across distributed healthcare environments, with a focus on medical AI models suitable for hospital infrastructure. The analysis supports comparison of synchronous and asynchronous federated learning regimes under varying network and hardware constraints.

Purpose

This repository models how local compute times (epoch durations) and network synchronization times scale with model size, bandwidth, and hardware class. It highlights the practical thresholds at which synchronous federated learning remains feasible and where asynchronous approaches become necessary.

Key use case: Inform infrastructure and architecture decisions for OiX-TD federated deployments.

Assumptions

  • Epoch training times are estimated from published benchmarks and scaled where needed.

  • Sync times assume full weight transfer per epoch (2× model size, upload + download).

  • Latency floors applied: minimum epoch = 1 second; minimum sync = 50 ms (see the sketch after this list).

  • Network bandwidths evaluated: 10 Mbps, 100 Mbps, 1 Gbps, 10 Gbps.

  • Hardware classes include: consumer CPUs, datacenter CPUs, single GPUs, multi-GPU nodes.
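
These assumptions translate into a simple timing model. The sketch below is an illustrative approximation, not the actual code in federated_analysis.py; the function names, the default of 4 bytes per parameter, and the hardware scaling factor are assumptions introduced here for clarity.

# Minimal sketch of the timing model implied by the assumptions above.
MIN_EPOCH_S = 1.0    # latency floor: minimum epoch time (seconds)
MIN_SYNC_S = 0.05    # latency floor: minimum sync time (seconds)

def estimate_sync_time_s(n_params: float, bandwidth_mbps: float,
                         bytes_per_param: int = 4) -> float:
    """Per-epoch synchronization time: full weight upload + download."""
    transfer_bits = 2 * n_params * bytes_per_param * 8   # 2x model size, in bits
    return max(transfer_bits / (bandwidth_mbps * 1e6), MIN_SYNC_S)

def estimate_epoch_time_s(benchmark_epoch_s: float, hardware_scale: float = 1.0) -> float:
    """Scale a published benchmark epoch time to the target hardware class."""
    return max(benchmark_epoch_s * hardware_scale, MIN_EPOCH_S)

# Example: an 8M-parameter model over a 10 Mbps link -> roughly 51 s of sync per epoch
print(estimate_sync_time_s(8e6, 10))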

Quick Usage (Default Python Environment)

pip install -r requirements.txt
python federated_analysis.py --table full --format console

Generate CSV Output:

python federated_analysis.py --table concise --format csv

Generate Plots (Scatter and Heatmap):

python federated_analysis.py --table full --plot --plot-type both --save-plots

For systems using a virtual environment:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python federated_analysis.py --table full --format console

For systems using uv:

uv pip install -r requirements.txt
python federated_analysis.py --table full --format console

Output Files

  • federated_learning_analysis_full.csv – Full dataset

  • federated_learning_analysis_concise.csv – Bandwidth-focused summary

  • federated_learning_scatter.png – Scatter plot (Model Size vs Time)

  • federated_learning_heatmap.png – Heatmap (Training Time by Bandwidth)

Key Findings (Summary)

  • Medical AI models (8M–860M parameters) are practical for hospital federated learning.

  • Network sync overhead remains manageable (<5 minutes at 10 Mbps for most models).

  • Larger generative AI models (>8B parameters) become infeasible on typical hospital networks (see the worked example after this list).

  • Infrastructure assumptions align with realistic hospital CPU/GPU deployments.
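
To see why the largest models fall outside the feasible range, a back-of-the-envelope check using the sketch from the Assumptions section (again assuming 4 bytes per parameter):

# An 8B-parameter model moves 2 * 8e9 * 4 bytes = 64 GB (~512 Gbit) of weights per epoch.
print(estimate_sync_time_s(8e9, 10_000))   # 10 Gbps -> ~51 s of sync per epoch
print(estimate_sync_time_s(8e9, 1_000))    # 1 Gbps  -> ~512 s (~8.5 minutes)
print(estimate_sync_time_s(8e9, 10))       # 10 Mbps -> ~51,200 s (over 14 hours)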

For full numeric results, see the provided CSV files.
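
To inspect the results programmatically rather than in a spreadsheet, the CSVs can be loaded directly. The snippet below assumes pandas is installed; the column names depend on the version of the script that generated the files.

import pandas as pd

# Load the full result set and check its structure before filtering or plotting.
full = pd.read_csv("federated_learning_analysis_full.csv")
print(full.columns.tolist())
print(full.head())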
