
Conversation


@Aishwarya-Tonpe commented Aug 28, 2025

Adds deterministic training and reproducible logging support to all PyTorch model benchmarks in SuperBench (BERT, GPT2, LLaMA, LSTM, CNN, Mixtral).

Deterministic mode: makes runs repeatable by fixing random seeds, turning off TF32, and forcing deterministic math operations (a sketch of this setup follows the option list below).
Log generation: saves key metrics such as per-step loss and activation statistics during training.
Log comparison: compares a new run against a previous reference log to check whether they match.
New command-line options:

--enable-determinism
--generate-log {boolean flag; when enabled, stores the collected metrics (loss and activation mean) in the results file}
--compare-log {path to the JSON log file against which the current run's results are compared}
--check-frequency
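
For context, disabling nondeterminism in PyTorch typically looks like the sketch below. This is a minimal illustration of the technique, with a hypothetical helper name and seed value; it is not the exact code in pytorch_base.py.

```python
import os
import random

import numpy as np
import torch


def enable_deterministic_training(seed=42):
    """Minimal sketch of a deterministic-training setup (names and seed are illustrative)."""
    # cuBLAS needs this workspace setting for deterministic GEMMs; it must be
    # set before the first CUDA call.
    os.environ.setdefault('CUBLAS_WORKSPACE_CONFIG', ':4096:8')

    # Fix every random number generator the benchmark touches.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # Turn off TF32 so matmuls and convolutions run in full fp32 precision.
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False

    # Prefer deterministic kernels and raise on ops with no deterministic path.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.use_deterministic_algorithms(True)
```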

Changes -

Updated pytorch_base.py to handle the deterministic settings, log generation, and log comparison.
Added a new example script: pytorch_deterministic_example.py (see the sketch after this list).
Added a test file, test_pytorch_determinism_all.py, to verify that the deterministic, logging, and comparison paths work as expected.
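
The example script presumably follows the existing SuperBench example pattern; the sketch below assumes that pattern (BenchmarkRegistry.create_benchmark_context / launch_benchmark) and that the new flags are passed through the parameters string. The model name and other parameters are chosen purely for illustration.

```python
# Hypothetical sketch in the spirit of pytorch_deterministic_example.py,
# following the usual SuperBench example layout; model name and parameters
# are illustrative, not taken from the PR.
from superbench.benchmarks import BenchmarkRegistry, Framework, Platform
from superbench.common.utils import logger

if __name__ == '__main__':
    context = BenchmarkRegistry.create_benchmark_context(
        'gpt2-small',
        platform=Platform.CUDA,
        parameters=(
            '--batch_size 1 --num_steps 64 --precision float32 '
            '--enable-determinism --generate-log --check-frequency 10'
        ),
        framework=Framework.PYTORCH,
    )
    benchmark = BenchmarkRegistry.launch_benchmark(context)
    if benchmark:
        logger.info(
            'benchmark: {}, return code: {}, result: {}'.format(
                benchmark.name, benchmark.return_code, benchmark.result
            )
        )
```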

Usage -

Run with --enable-determinism --generate-log to create a reference log.
Run again with --compare-log to check whether the new run matches the reference (see the comparison sketch after this list).
Keep all benchmark parameters identical between the two runs.
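
A rough mental model of the log generation and comparison is sketched below, assuming a flat JSON layout and hypothetical helper names; it is not the actual implementation in pytorch_base.py.

```python
import json


def save_reference_log(path, metrics):
    """Write per-step metrics (e.g. {'loss': [...], 'activation_mean': [...]}) as JSON."""
    with open(path, 'w') as f:
        json.dump(metrics, f, indent=2)


def compare_with_reference(path, current_metrics, rtol=0.0, atol=0.0):
    """Return True if the current run's metrics match the reference log.

    With full determinism the tolerances can stay at zero (strict check);
    small rtol/atol values allow a softer determinism check.
    """
    with open(path) as f:
        reference = json.load(f)

    for name, ref_values in reference.items():
        cur_values = current_metrics.get(name, [])
        if len(cur_values) != len(ref_values):
            return False
        for ref, cur in zip(ref_values, cur_values):
            if abs(ref - cur) > atol + rtol * abs(ref):
                return False
    return True
```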

- Add _enable_deterministic_training() method to set all necessary seeds
- Add --deterministic and --random_seed command line arguments
- Integrate deterministic training in _create_model() and _generate_dataset()
- Add comprehensive unit tests for deterministic functionality
- Tests validate parameter parsing, functionality, and regression scenarios
- All tests pass and integrate with existing SuperBench test suite
…pass check_frequency to _is_finished in train/infer; add test capturing checksum log; stabilize fp32 loss path and small-dims determinism tests
…oss BERT/GPT2/CNN/LSTM/Mixtral; per-step fp32 loss logging; checksum logs; tests updated to strict/soft determinism pattern; add strict determinism CI guidance
…rings; fix GPT-2 params; soft vs strict checks stabilized
…sum tests with BERT pattern, improve docstrings and skip logic.
…/CNN/BERT/Mixtral with periodic fingerprints, per-step loss capture, TF32 off, SDPA math kernel; add model_log_utils; update examples and tests, add env gating for cuBLAS.
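
The periodic fingerprints and per-step activation statistics mentioned in the commits above can be captured with a forward hook; the sketch below is one illustrative way to do it and is not necessarily how model_log_utils implements it.

```python
import torch


class ActivationFingerprint:
    """Record a module's output mean every `check_frequency` forward passes (illustrative)."""

    def __init__(self, module: torch.nn.Module, check_frequency: int = 10):
        self._check_frequency = check_frequency
        self._step = 0
        self.fingerprints = []
        module.register_forward_hook(self._hook)

    def _hook(self, module, inputs, output):
        # Assumes the hooked module returns a single tensor.
        if self._step % self._check_frequency == 0:
            # Reduce in fp32 so the fingerprint is stable across mixed-precision runs.
            self.fingerprints.append(output.detach().float().mean().item())
        self._step += 1
```

The collected values would then go into the same JSON results that --generate-log writes, so --compare-log can diff them step by step.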
@Aishwarya-Tonpe Aishwarya-Tonpe requested a review from a team as a code owner August 28, 2025 17:41
@Aishwarya-Tonpe
Author

@Aishwarya-Tonpe please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
    @microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
    @microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree company="Microsoft"

root and others added 30 commits December 8, 2025 22:05
…es not need to be set explicitly before running the benchmarks

Labels

benchmarks (SuperBench Benchmarks), model-benchmarks (Model Benchmark Test for SuperBench Benchmarks)
