Cannot run parallel inference with DDP #9687

@thomassajot

Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Validation, Detection

Bug

I am trying to run predictions in parallel across multiple GPUs to speed up inference on large datasets.
From what I gathered, the best way to do this with PyTorch is torch.nn.DataParallel.
However, the model is first created on cuda:0 and then copied to the requested GPUs. This overloads cuda:0; when it does not (because the batch size is small), a copy of the same model still ends up on each GPU. I then get the following exception:
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

See full error:

YOLOv5 🚀 v6.2-145-gf8b7463 Python-3.9.13 torch-1.12.1+cu102 CUDA:4 (NVIDIA GeForce RTX 2080 Ti, 11019MiB)

Fusing layers...
Model summary: 416 layers, 140038156 parameters, 0 gradients, 208.0 GFLOPs
Adding AutoShape...
Traceback (most recent call last):
  File "/mnt/remote/data/users/thomasssajot/yolov5/notebooks/generate_classification_results.py", line 152, in <module>
    main(device=2)
  File "/mnt/remote/data/users/thomasssajot/yolov5/notebooks/generate_classification_results.py", line 136, in main
    model = get_model(model_path).to(f'cuda:{device}')
  File "/home/thomassajot/miniconda3/envs/yolov5/lib/python3.9/site-packages/torch/nn/modules/module.py", line 927, in to
    return self._apply(convert)
  File "/mnt/remote/data/users/thomasssajot/yolov5/models/common.py", line 621, in _apply
    self = super()._apply(fn)
  File "/home/thomassajot/miniconda3/envs/yolov5/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/home/thomassajot/miniconda3/envs/yolov5/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/mnt/remote/data/users/thomasssajot/yolov5/models/yolo.py", line 155, in _apply
    self = super()._apply(fn)
  File "/home/thomassajot/miniconda3/envs/yolov5/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/home/thomassajot/miniconda3/envs/yolov5/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/home/thomassajot/miniconda3/envs/yolov5/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/home/thomassajot/miniconda3/envs/yolov5/lib/python3.9/site-packages/torch/nn/modules/module.py", line 602, in _apply
    param_applied = fn(param)
  File "/home/thomassajot/miniconda3/envs/yolov5/lib/python3.9/site-packages/torch/nn/modules/module.py", line 925, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
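
"invalid device ordinal" generally means the requested index is not lower than the number of GPUs the process can see. When CUDA_VISIBLE_DEVICES is set, the visible devices are renumbered from 0, so an index that exists on the machine can still be out of range inside the process. Whether that is what happened here is an assumption; the report does not confirm it. Below is a minimal sketch of a guard that fails with a readable message before .to() is called. check_ordinal is a hypothetical helper, not part of YOLOv5 or PyTorch.

```python
def check_ordinal(index: int, device_count: int) -> str:
    """Return 'cuda:<index>' if the index is in range, else raise a readable error."""
    if not 0 <= index < device_count:
        raise ValueError(
            f"cuda:{index} requested but only {device_count} device(s) are visible"
        )
    return f"cuda:{index}"

# Intended use (requires PyTorch; get_model is the function from this report):
#   device = check_ordinal(2, torch.cuda.device_count())
#   model = get_model().to(device)
```

Passing the device count as an argument keeps the check independent of PyTorch, so it can be exercised on a machine without GPUs.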

Environment

PyTorch version: 1.12.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~18.04) 9.4.0
Clang version: 13.0.1-++20220120110844+75e33f71c2da-1~exp1~20220120230854.66
CMake version: version 3.10.2
Libc version: glibc-2.27

Minimal Reproducible Example

import torch
from torch.utils.data import DataLoader
from tqdm import tqdm

def get_model():
    # Load the pretrained YOLOv5s model (wrapped in AutoShape) from torch.hub
    model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
    model.eval()
    return model

def get_image_files():
    # Repeat one image path to simulate a dataset of 64 files
    image = 'path/to/image.jpeg'
    return [image] * 64

def main():
    images = get_image_files()
    model = get_model()
    # Replicate the model across GPUs 0 and 1
    net = torch.nn.DataParallel(model, device_ids=[0, 1])

    loader = DataLoader(dataset=images[:64 * 4], batch_size=4, shuffle=False, num_workers=8)

    with torch.no_grad():
        for batch in tqdm(loader, ncols=140, desc='Predictions'):
            res = net(batch, size=1280)


if __name__ == "__main__":
    main()
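
For comparison, one commonly used alternative to DataParallel for pure inference is to run one process per GPU, each loading its own copy of the model and handling its own slice of the file list. The sketch below only shows the slicing step. shard is a hypothetical helper, and starting the worker processes and pinning each one to a GPU are left out.

```python
def shard(items, num_shards):
    """Split items into num_shards contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), num_shards)
    shards, start = [], 0
    for rank in range(num_shards):
        # The first `extra` shards take one additional item each
        size = base + (1 if rank < extra else 0)
        shards.append(items[start:start + size])
        start += size
    return shards

# Intended use: the worker with index `rank` would process shards[rank]
# with a model it loaded on its own GPU.
```

With the 64 paths from get_image_files() and two GPUs, each worker would receive 32 paths.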

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Labels

TODO (High priority items), bug (Something isn't working)
