Relationship between OMP_NUM_THREADS and no. nodes/CPUs #3904

BramVanroy · 2023-07-07T23:51:47Z


Usually, I set OMP_NUM_THREADS to the number of CPU cores I have available per GPU. So if I have a cluster whose nodes each have 4 GPUs and 32 CPU cores, I'd set OMP_NUM_THREADS=8 (32/4), on the assumption that every GPU is driven by a dedicated process, which can then make use of its 8 designated threads.

But does DeepSpeed work like this, too? Does DeepSpeed launch one process per GPU, or one process per node? In other words, in the example above, should OMP_NUM_THREADS be 8 (one process per GPU) or should it be the full 32 (one process per node)?

mzamini92 · 2023-07-10T13:34:32Z


DeepSpeed typically launches one process per GPU, not one per node. Each process then handles one GPU and its associated computation. Therefore, in your example of a cluster whose nodes have 4 GPUs and 32 CPU cores, you would set OMP_NUM_THREADS=8, matching the number of CPU cores available per GPU.

Setting OMP_NUM_THREADS=8 means that each per-GPU process can use up to 8 OpenMP threads. With 4 such processes on a node, that adds up to 4 × 8 = 32 threads, one per available CPU core, so the processes do not oversubscribe the node.
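
As a rough illustration of the arithmetic above, here is a minimal Python sketch. The helper name `omp_threads_per_process` is made up for this example and is not part of DeepSpeed. It assumes one process per GPU and that the cores are split evenly between processes.

```python
import os


def omp_threads_per_process(cpu_cores_per_node: int, gpus_per_node: int) -> int:
    """Hypothetical helper: CPU cores available to each per-GPU process."""
    if cpu_cores_per_node < 1 or gpus_per_node < 1:
        raise ValueError("core and GPU counts must be positive")
    # Integer division so the processes never ask for more cores than exist;
    # fall back to 1 thread if there are more GPUs than cores.
    return max(1, cpu_cores_per_node // gpus_per_node)


# The example from this thread: 32 CPU cores and 4 GPUs per node.
threads = omp_threads_per_process(cpu_cores_per_node=32, gpus_per_node=4)

# Assumption: the variable has to be in the environment before the OpenMP
# runtime starts, so in practice it is usually exported in the shell or job
# script before the training processes are launched.
os.environ["OMP_NUM_THREADS"] = str(threads)
print(f"OMP_NUM_THREADS={threads}")
```

If the core count does not divide evenly by the GPU count, the integer division rounds down, which leaves a few cores idle instead of oversubscribing them.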
