Multi-GPU support #42

Closed
@soumith

Multi-GPU support has been implemented in cutorch (and, by extension, in all Torch CUDA libraries such as cunn and cudnn).

  • Switch the device on the fly with cutorch.setDevice(devID)
  • All CUDA calls are asynchronous, and can be synchronized with cutorch.synchronize()

Example usage for tensors:

require 'cutorch'
-- Let us do matrix addition for matrices sitting on two different GPUs
cutorch.setDevice(1)
matrix1 = torch.CudaTensor(10):fill(1)
print(matrix1) -- printing is a synchronous call, so you don't have to explicitly call cutorch.synchronize()
cutorch.setDevice(2)
matrix2 = torch.CudaTensor(10):fill(2)
print(matrix2)
matrix2:add(matrix1) -- matrix1 is seamlessly copied onto GPU2 and added to matrix2
print(matrix2)

If you want to do data-parallel training of neural nets (including convnets), your training loop can run like this:

For each mini-batch:

1. load data (preferably using multiple threads, for example using [threads-ffi](https://github.com/torch/threads-ffi))
2. loop over GPUs (the loop below is completely asynchronous, so the GPUs run in parallel)
  2.1. model[gpuX]:forward
  2.2. criterion[gpuX]:forward
  2.3. criterion[gpuX]:backward
  2.4. model[gpuX]:backward
3. cutorch.synchronize()
4. accumulate GPUx's gradParameters to GPU1's gradParameters
5. do SGD on GPU1
6. copy GPU1's updated parameters back to each GPUx
7. cutorch.synchronize() and print accuracy etc.

Loop back to 1 for next mini-batch
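
Steps 4 to 6 above can be sketched in plain Lua, with no GPU required. This is only an illustration of the bookkeeping, not cutorch code: ordinary tables stand in for each GPU's parameters and gradParameters, and the function names (`accumulateGradients`, `sgdStep`, `broadcastParameters`) and the learning rate are hypothetical. The gradients are simply summed, as the list says; whether to also divide by the number of GPUs depends on how your criterion averages over the batch.

```lua
-- Plain-Lua stand-in for steps 4-6 of the loop above.
-- Tables of numbers play the role of per-GPU tensors (assumption for illustration).

-- Step 4: add every replica's gradients into replica 1's gradient buffer.
local function accumulateGradients(grads)
  local master = grads[1]
  for gpu = 2, #grads do
    for i = 1, #master do
      master[i] = master[i] + grads[gpu][i]
    end
  end
  return master
end

-- Step 5: one plain SGD update on replica 1's parameters.
local function sgdStep(params, grad, learningRate)
  for i = 1, #params do
    params[i] = params[i] - learningRate * grad[i]
  end
end

-- Step 6: copy replica 1's parameters to every other replica.
local function broadcastParameters(params)
  for gpu = 2, #params do
    for i = 1, #params[1] do
      params[gpu][i] = params[1][i]
    end
  end
end

-- Two replicas with identical starting parameters but different gradients,
-- as would happen when each GPU processes a different slice of the mini-batch.
local params = { {1.0, 2.0}, {1.0, 2.0} }
local grads  = { {0.5, 1.0}, {1.5, 3.0} }

local summed = accumulateGradients(grads)
sgdStep(params[1], summed, 0.5)
broadcastParameters(params)
```

After the broadcast, every replica holds the same parameters again, which is what lets the next mini-batch start from a consistent model on all GPUs.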

Also, to train ConvNets using multiple GPUs, I recommend using CuDNN for the convolution layers, as I've tested that they are completely asynchronous (meaning that the processing runs in parallel on multiple GPUs).

Comments below describe the technical details of changes made. If you just want to use Multi-GPU, you can stop reading now.
