Hi there,
I noticed there is an all_gather step in the `_post_step` function of stage3. Is all-gather used there instead of all-reduce because the gradients of the persistent parameters are already synchronized via reduce-scatter in the backward pass? https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/zero/stage3.py#L1707
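To make the question concrete, here is a minimal sketch of the identity I have in mind: a reduce-scatter followed by an all-gather yields the same result as a single all-reduce. It simulates the collectives on plain Python lists rather than calling DeepSpeed or `torch.distributed`. The helper names, the two-rank setup, and the assumption that the buffer length divides evenly by the number of ranks are all mine for illustration, and it does not claim to show what `_post_step` actually does.

```python
from typing import List

Buffer = List[float]


def all_reduce(per_rank: List[Buffer]) -> List[Buffer]:
    """Every rank ends up with the element-wise sum of all buffers."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]


def reduce_scatter(per_rank: List[Buffer]) -> List[Buffer]:
    """Rank r ends up with only the r-th shard of the element-wise sum.

    Assumes the buffer length is divisible by the number of ranks.
    """
    world = len(per_rank)
    shard = len(per_rank[0]) // world
    total = [sum(vals) for vals in zip(*per_rank)]
    return [total[r * shard:(r + 1) * shard] for r in range(world)]


def all_gather(shards: List[Buffer]) -> List[Buffer]:
    """Every rank ends up with the concatenation of all shards."""
    full = [x for s in shards for x in s]
    return [list(full) for _ in shards]


# Two simulated ranks, each holding a local buffer of four values.
grads = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
]

scattered = reduce_scatter(grads)      # each rank holds one summed shard
via_two_steps = all_gather(scattered)  # shards reassembled on every rank
via_all_reduce = all_reduce(grads)

print(via_two_steps == via_all_reduce)
```

If the reduction has already happened per shard, an all-gather is enough to give every rank the full buffer, which is why I suspect a second reduction is not needed.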