Replies: 2 comments
-
I haven't used Autotuning before, but it looks like autotuning focuses on the configuration file, so it should support AutoTP. From your description, the problem is likely that `mp_size` is not compatible with `autotp_size`; @inkcherry can confirm whether this is true. For the second question, is it possible to set `autotp_size` to 1 during the profile stage? If the goal is to find the minimum memory usage, then ZeRO-3 without AutoTP should give a pretty good estimate.
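If all you need from that profiling stage is a rough minimum-memory number, DeepSpeed's ZeRO-3 memory estimator can be called offline, with AutoTP out of the picture entirely. A minimal sketch, assuming a Hugging Face model as a stand-in (the model name and GPU counts below are placeholders, not your actual setup):

```python
# Offline estimate of ZeRO-3 model-state memory needs, no AutoTP involved.
# The model and GPU counts are placeholders -- substitute your own.
from transformers import AutoModelForCausalLM
from deepspeed.runtime.zero.stage3 import (
    estimate_zero3_model_states_mem_needs_all_live,
)

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model

# Prints estimated per-GPU / per-node memory for params, grads, and optimizer
# states under ZeRO-3 with and without offloading.
estimate_zero3_model_states_mem_needs_all_live(
    model,
    num_gpus_per_node=8,  # adjust to your hardware
    num_nodes=1,
)
```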
-
Can I use the Autotuning feature while enabling AutoTP in the configuration file?
I previously attempted to use both AutoTP and autotuning together. It appeared to work when I set `"mp_size": 1` in the autotuning configuration. However, it failed when `mp_size` was set to match `autotp_size`. The error seemed related to a train batch size mismatch.

Even when it did run, the results were questionable. For instance, with the same `train_micro_batch_size_per_gpu`, I observed higher throughput with `"autotp_size": 2` than with no tensor parallelism at all, which seemed counterintuitive.

Additionally, it looks like the initial model profiling phase in autotuning uses ZeRO Stage 3 to estimate the minimum memory requirement. But AutoTP isn’t compatible with ZeRO Stage 3, right? I worked around this limitation by copying the profiling results from a configuration without TP, tricking autotuning into thinking the model had successfully completed profiling. However, this workaround isn’t ideal.
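For reference, the combination I was testing is roughly the one sketched below, written as a small script that dumps the DeepSpeed config. The nesting of the AutoTP key and my guess at the batch-size arithmetic are from memory, so please treat this as an illustration rather than a verified config:

```python
# Rough reconstruction of the config combination I was testing.
import json

world_size = 2   # GPUs I launched on
tp_size = 2      # AutoTP degree
micro_batch = 4
grad_accum = 2
# My guess at the check that fails: once TP is on, the data-parallel degree
# becomes world_size // tp_size, and train_batch_size must equal
# micro_batch * grad_accum * data_parallel_size.
dp_size = world_size // tp_size

ds_config = {
    "train_batch_size": micro_batch * grad_accum * dp_size,  # = 8 here
    "train_micro_batch_size_per_gpu": micro_batch,
    "gradient_accumulation_steps": grad_accum,
    # Placeholder stage; I avoided ZeRO-3 since AutoTP isn't compatible with it.
    "zero_optimization": {"stage": 1},
    # AutoTP degree -- exact key layout may differ across DeepSpeed versions.
    "tensor_parallel": {"autotp_size": tp_size},
    "autotuning": {
        "enabled": True,
        "mp_size": tp_size,  # the value I tried to match to autotp_size
    },
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```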