
IO_calc: Federated Learning I/O and Runtime Estimation Tool

This tool estimates training times and network transfer overhead for federated learning across distributed healthcare environments, with a focus on medical AI models suitable for hospital infrastructure. The analysis supports comparison of synchronous and asynchronous federated learning regimes under varying network and hardware constraints.

Purpose

This repository models how local compute times (epoch durations) and network synchronization times scale with model size, bandwidth, and hardware class. It highlights the practical thresholds at which synchronous federated learning remains feasible and where asynchronous approaches become necessary.

Key use case: Inform infrastructure and architecture decisions for OiX-TD federated deployments.

Assumptions

  • Epoch training times are estimated from published benchmarks and scaled where needed.

  • Sync times assume full weight transfer per epoch (2× model size, upload + download).

  • Latency floors applied: minimum epoch = 1 second; minimum sync = 50 ms (see the sketch after this list).

  • Network bandwidths evaluated: 10 Mbps, 100 Mbps, 1 Gbps, 10 Gbps.

  • Hardware classes include: consumer CPUs, datacenter CPUs, single GPUs, multi-GPU nodes.
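
These assumptions translate into a simple timing model. The sketch below is an illustrative approximation, not the actual code in federated_analysis.py; the function names, the default of 4 bytes per parameter, and the hardware scaling factor are assumptions introduced here for clarity.

# Minimal sketch of the timing model implied by the assumptions above.
MIN_EPOCH_S = 1.0    # latency floor: minimum epoch time (seconds)
MIN_SYNC_S = 0.05    # latency floor: minimum sync time (seconds)

def estimate_sync_time_s(n_params: float, bandwidth_mbps: float,
                         bytes_per_param: int = 4) -> float:
    """Per-epoch synchronization time: full weight upload + download."""
    transfer_bits = 2 * n_params * bytes_per_param * 8   # 2x model size, in bits
    return max(transfer_bits / (bandwidth_mbps * 1e6), MIN_SYNC_S)

def estimate_epoch_time_s(benchmark_epoch_s: float, hardware_scale: float = 1.0) -> float:
    """Scale a published benchmark epoch time to the target hardware class."""
    return max(benchmark_epoch_s * hardware_scale, MIN_EPOCH_S)

# Example: an 8M-parameter model over a 10 Mbps link -> roughly 51 s of sync per epoch
print(estimate_sync_time_s(8e6, 10))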

Quick Usage (Default Python Environment)

pip install -r requirements.txt
python federated_analysis.py --table full --format console

Generate CSV Output:

python federated_analysis.py --table concise --format csv

Generate Plots (Scatter and Heatmap):

python federated_analysis.py --table full --plot --plot-type both --save-plots

For systems using a virtual environment:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python federated_analysis.py --table full --format console

For systems using uv:

uv pip install -r requirements.txt
python federated_analysis.py --table full --format console

Output Files

  • federated_learning_analysis_full.csv – Full dataset

  • federated_learning_analysis_concise.csv – Bandwidth-focused summary

  • federated_learning_scatter.png – Scatter plot (Model Size vs Time)

  • federated_learning_heatmap.png – Heatmap (Training Time by Bandwidth)

Key Findings (Summary)

  • Medical AI models (8M–860M parameters) are practical for hospital federated learning.

  • Network sync overhead remains manageable (<5 minutes at 10 Mbps for most models).

  • Larger generative AI models (>8B parameters) become infeasible on typical hospital networks (see the worked example after this list).

  • Infrastructure assumptions align with realistic hospital CPU/GPU deployments.
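
To see why the largest models fall outside the feasible range, a back-of-the-envelope check using the sketch from the Assumptions section (again assuming 4 bytes per parameter):

# An 8B-parameter model moves 2 * 8e9 * 4 bytes = 64 GB (~512 Gbit) of weights per epoch.
print(estimate_sync_time_s(8e9, 10_000))   # 10 Gbps -> ~51 s of sync per epoch
print(estimate_sync_time_s(8e9, 1_000))    # 1 Gbps  -> ~512 s (~8.5 minutes)
print(estimate_sync_time_s(8e9, 10))       # 10 Mbps -> ~51,200 s (over 14 hours)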

For full numeric results, see the provided CSV files.
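
To inspect the results programmatically rather than in a spreadsheet, the CSVs can be loaded directly. The snippet below assumes pandas is installed; the column names depend on the version of the script that generated the files.

import pandas as pd

# Load the full result set and check its structure before filtering or plotting.
full = pd.read_csv("federated_learning_analysis_full.csv")
print(full.columns.tolist())
print(full.head())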
