Replies: 2 comments
-
I haven't used Autotuning before, but it looks like autotuning focuses on the configuration file, so it should support AutoTP. From your description, the problem is likely that `mp_size` is not compatible with `autotp_size`; @inkcherry can confirm whether this is true. For the second question, is it possible to set `autotp_size` to 1 during the profile stage? If the goal is to find the minimum memory usage, then ZeRO-3 without AutoTP should give a pretty good estimate.
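If all you need from that profiling stage is a rough minimum-memory number, DeepSpeed's ZeRO-3 memory estimator can be called offline, with AutoTP out of the picture entirely. A minimal sketch, assuming a Hugging Face model as a stand-in (the model name and GPU counts below are placeholders, not your actual setup):

```python
# Offline estimate of ZeRO-3 model-state memory needs, no AutoTP involved.
# The model and GPU counts are placeholders -- substitute your own.
from transformers import AutoModelForCausalLM
from deepspeed.runtime.zero.stage3 import (
    estimate_zero3_model_states_mem_needs_all_live,
)

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model

# Prints estimated per-GPU / per-node memory for params, grads, and optimizer
# states under ZeRO-3 with and without offloading.
estimate_zero3_model_states_mem_needs_all_live(
    model,
    num_gpus_per_node=8,  # adjust to your hardware
    num_nodes=1,
)
```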
-
Can I use the Autotuning feature while enabling AutoTP in the configuration file?
I previously attempted to use both AutoTP and autotuning together. It appeared to work when I set `"mp_size": 1` in the autotuning configuration. However, it failed when `mp_size` was set to match `autotp_size`. The error seemed related to a train batch size mismatch.

Even when it did run, the results were questionable. For instance, with the same `train_micro_batch_size_per_gpu`, I observed higher throughput with `"autotp_size": 2` than with no tensor parallelism at all, which seemed counterintuitive.

Additionally, it looks like the initial model profiling phase in autotuning uses ZeRO Stage 3 to estimate the minimum memory requirement. But AutoTP isn’t compatible with ZeRO Stage 3, right? I worked around this limitation by copying the profiling results from a configuration without TP, tricking autotuning into thinking the model had successfully completed profiling. However, this workaround isn’t ideal.
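For reference, the combination I was testing is roughly the one sketched below, written as a small script that dumps the DeepSpeed config. The nesting of the AutoTP key and my guess at the batch-size arithmetic are from memory, so please treat this as an illustration rather than a verified config:

```python
# Rough reconstruction of the config combination I was testing.
import json

world_size = 2   # GPUs I launched on
tp_size = 2      # AutoTP degree
micro_batch = 4
grad_accum = 2
# My guess at the check that fails: once TP is on, the data-parallel degree
# becomes world_size // tp_size, and train_batch_size must equal
# micro_batch * grad_accum * data_parallel_size.
dp_size = world_size // tp_size

ds_config = {
    "train_batch_size": micro_batch * grad_accum * dp_size,  # = 8 here
    "train_micro_batch_size_per_gpu": micro_batch,
    "gradient_accumulation_steps": grad_accum,
    # Placeholder stage; I avoided ZeRO-3 since AutoTP isn't compatible with it.
    "zero_optimization": {"stage": 1},
    # AutoTP degree -- exact key layout may differ across DeepSpeed versions.
    "tensor_parallel": {"autotp_size": tp_size},
    "autotuning": {
        "enabled": True,
        "mp_size": tp_size,  # the value I tried to match to autotp_size
    },
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```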