Relationship between OMP_NUM_THREADS and no. nodes/CPUs #3904
Unanswered
BramVanroy
asked this question in
Q&A
Replies: 1 comment
DeepSpeed typically launches one process per GPU, not one per node. Each process drives a single GPU and the CPU-side work associated with it. So in your example, with nodes that have 4 GPUs and 32 CPU cores, you would set OMP_NUM_THREADS=8 (32 / 4), matching the number of CPU cores available per GPU. Each GPU's process can then use its own 8 threads without competing with the other processes on the node.
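To make the arithmetic concrete, here is a minimal sketch that derives the value from the node layout. The helper names `threads_per_process` and `omp_env` are hypothetical and not part of DeepSpeed; the sketch only assumes one process per GPU and an even split of cores, as described above.

```python
import os


def threads_per_process(cpu_cores_per_node: int, gpus_per_node: int) -> int:
    """Split a node's CPU cores evenly across one process per GPU.

    Hypothetical helper: assumes one process per GPU and no cores
    reserved for other work on the node.
    """
    if cpu_cores_per_node < 1 or gpus_per_node < 1:
        raise ValueError("core and GPU counts must be positive")
    # Integer division; never go below one thread per process.
    return max(1, cpu_cores_per_node // gpus_per_node)


def omp_env(cpu_cores_per_node: int, gpus_per_node: int, base_env=None) -> dict:
    """Return a copy of the environment with OMP_NUM_THREADS set."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(
        threads_per_process(cpu_cores_per_node, gpus_per_node)
    )
    return env


if __name__ == "__main__":
    # The node from the question: 32 CPU cores, 4 GPUs.
    print(omp_env(32, 4, base_env={})["OMP_NUM_THREADS"])
```

The resulting environment can be passed to whatever starts the per-GPU processes, or the value can simply be exported in the job script before launching.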
Usually, I set OMP_NUM_THREADS corresponding to the number of CPUs I have available per GPU. So if I have a cluster with nodes with 4 GPUs and 32 CPU cores, I'd set OMP_NUM_THREADS=8 (32/4), with the assumption that every GPU is using a dedicated process, which can then make use of its 8 designated threads.

But does DeepSpeed work like this, too? Does DeepSpeed launch one process per GPU, or one process per node? In other words, in the example above, should OMP_NUM_THREADS be 8 (one process per GPU) or should it be the full 32 (one process per node)?