Commit 6a684cb

xw285cornell authored and facebook-github-bot committed
Return if no data to allreduce (pytorch#3586)
Summary:
Pull Request resolved: pytorch#3586

X-link: facebookresearch/FBGEMM#669

When the input tensor is empty, return immediately. Otherwise num_thread will be 0 and the CUDA kernel launch will fail.

Reviewed By: feikou, jianyuh

Differential Revision: D68318641

fbshipit-source-id: ff1c0c401fc4884cef9ee71fdcccaa6d68e1bf80
1 parent b858408 commit 6a684cb

2 files changed, 5 additions and 1 deletion


fbgemm_gpu/experimental/gen_ai/src/comm/car.cu

Lines changed: 4 additions & 0 deletions
@@ -480,6 +480,10 @@ void one_shot_car_allreduce(
   TORCH_CHECK(y.numel() % 8 == 0);
   TORCH_CHECK(y.numel() < kMaxCAR);
   const auto N = y.numel();
+  if (N == 0) {
+    // no data to allreduce, return
+    return;
+  }
   if (z) {
     TORCH_CHECK(z->numel() == y.numel());
   }
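
The reason for the guard can be illustrated with a small host-side sketch. The diff does not show how `one_shot_car_allreduce` derives its launch dimensions, so `launch_config`, `one_shot_allreduce_sketch`, the 8-elements-per-thread ratio, and the 1024-thread cap below are all hypothetical; only the `N == 0` early return mirrors the actual change. The point is that any launch size derived from the element count collapses to zero for an empty tensor, and CUDA rejects a launch with a zero grid or block dimension as an invalid configuration.

```python
from typing import Tuple


def launch_config(numel: int, elems_per_thread: int = 8,
                  max_threads: int = 1024) -> Tuple[int, int]:
    """Hypothetical launch-size computation (not taken from car.cu).

    Assumes one thread per chunk of `elems_per_thread` elements and
    at most `max_threads` threads per block.
    """
    threads_needed = numel // elems_per_thread
    threads = min(threads_needed, max_threads)
    # Ceiling division; yields 0 blocks when there is no work.
    blocks = (threads_needed + max_threads - 1) // max_threads
    return blocks, threads


def one_shot_allreduce_sketch(numel: int) -> str:
    """Mirrors the early return added in the diff."""
    if numel == 0:
        # No data to allreduce: skip the launch entirely.
        return "skipped"
    blocks, threads = launch_config(numel)
    # Without the guard above, numel == 0 would reach this point
    # with blocks == threads == 0.
    assert blocks > 0 and threads > 0
    return f"launch<<<{blocks}, {threads}>>>"
```

Calling `launch_config(0)` directly shows the degenerate `(0, 0)` configuration that the early return avoids.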

fbgemm_gpu/experimental/gen_ai/test/comm/multi_gpu_car_test.py

Lines changed: 1 addition & 1 deletion
@@ -246,7 +246,7 @@ def _run_oneshot_car_stress_inner(path: str) -> None:
     torch.distributed.barrier()

     ITER = 1000
-    for idx, N in enumerate(np.logspace(4, 24, num=20, base=2).tolist()):
+    for idx, N in enumerate([0] + np.logspace(4, 24, num=20, base=2).tolist()):
         N = int(N)

         def round_up(a: int, b: int) -> int:
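
To see which sizes the updated loop covers, here is a dependency-free sketch that reproduces the sequence. It replaces `np.logspace(4, 24, num=20, base=2)` with an equivalent pure-Python computation (20 exponents evenly spaced from 4 to 24). The body of `round_up` is cut off in the diff, so the version below is an assumed, conventional implementation, and rounding to a multiple of 8 is likewise an assumption suggested by the `y.numel() % 8 == 0` check in `car.cu`. `stress_sizes` is a hypothetical helper, not part of the test file.

```python
from typing import List


def round_up(a: int, b: int) -> int:
    # Assumed body: the diff shows only the signature.
    return ((a + b - 1) // b) * b


def stress_sizes(num: int = 20, lo_exp: int = 4, hi_exp: int = 24) -> List[int]:
    """Element counts visited by the stress loop, with the new 0 prepended."""
    # Same spacing as np.logspace(lo_exp, hi_exp, num=num, base=2).
    exps = [lo_exp + i * (hi_exp - lo_exp) / (num - 1) for i in range(num)]
    raw = [0] + [int(2 ** e) for e in exps]
    # Assumed alignment, so each size would pass the numel % 8 == 0 check.
    return [round_up(n, 8) for n in raw]


sizes = stress_sizes()
print(len(sizes), sizes[0], sizes[1], sizes[-1])
```

The empty case comes first, so a regression in the zero-size path would surface on the first iteration rather than after the larger transfers.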
