MultiGPU support has been implemented in cutorch (and, by extension, in all Torch CUDA libraries such as cunn, cudnn, etc.).
- Switch the active device on the fly with `cutorch.setDevice(devID)`
- All CUDA calls are asynchronous and can be synchronized with `cutorch.synchronize()`
Example usage for tensors:
```lua
-- Let us do matrix addition for matrices sitting on two different GPUs
cutorch.setDevice(1)
matrix1 = torch.CudaTensor(10):fill(1)
print(matrix1) -- printing is a synchronous call, so you don't have to explicitly call cutorch.synchronize()
cutorch.setDevice(2)
matrix2 = torch.CudaTensor(10):fill(2)
print(matrix2)
matrix2:add(matrix1) -- matrix1 is seamlessly copied onto GPU2 and added to matrix2
print(matrix2)
```
If you want to do data-parallel training of neural nets (including convnets), your training loop can run like this (a Lua sketch follows the list below):
For each mini-batch:
1. load data (preferably using multiple threads, for example using [threads-ffi](https://github.com/torch/threads-ffi))
2. loop over GPUs (the loop below is completely asynchronous, so it runs in parallel across GPUs)
2.1. model[gpuX]:forward
2.2. criterion[gpuX]:forward
2.3. criterion[gpuX]:backward
2.4. model[gpuX]:backward
3. cutorch.synchronize()
4. accumulate GPUx's gradParameters to GPU1's gradParameters
5. do SGD on GPU1
6. copy back GPU1's parameters to GPUx
7. cutorch.synchronize() and print accuracy etc.
Loop back to 1 for the next mini-batch.
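Here is a minimal sketch of that loop for two GPUs. The helpers `makeModel()`, `makeCriterion()`, and `loadMinibatch()`, as well as `nBatches` and `learningRate`, are hypothetical placeholders for your own network, loss, data loading, and hyperparameters:

```lua
require 'cutorch'
require 'cunn'

local nGPU = 2
local models, criterions, params, gradParams = {}, {}, {}, {}
for gpu = 1, nGPU do
   cutorch.setDevice(gpu)
   models[gpu] = makeModel():cuda()          -- one model replica per GPU (hypothetical constructor)
   criterions[gpu] = makeCriterion():cuda()
   params[gpu], gradParams[gpu] = models[gpu]:getParameters()
end

for batch = 1, nBatches do
   -- 1. load data; loadMinibatch is a hypothetical loader returning one shard per GPU
   local inputs, targets = loadMinibatch(batch, nGPU)

   -- 2. forward/backward on every GPU; these calls are asynchronous
   for gpu = 1, nGPU do
      cutorch.setDevice(gpu)
      models[gpu]:zeroGradParameters()
      local output = models[gpu]:forward(inputs[gpu])
      local err = criterions[gpu]:forward(output, targets[gpu])
      local gradOutput = criterions[gpu]:backward(output, targets[gpu])
      models[gpu]:backward(inputs[gpu], gradOutput)
   end

   -- 3. wait for all GPUs to finish
   cutorch.synchronize()

   -- 4. accumulate the other GPUs' gradParameters onto GPU1 (cross-GPU add copies implicitly)
   cutorch.setDevice(1)
   for gpu = 2, nGPU do
      gradParams[1]:add(gradParams[gpu])
   end

   -- 5. plain SGD step on GPU1: p <- p - learningRate * g
   params[1]:add(-learningRate, gradParams[1])

   -- 6. copy the updated parameters back to the other GPUs
   for gpu = 2, nGPU do
      params[gpu]:copy(params[1])
   end

   -- 7. make sure the copies are done before the next iteration
   cutorch.synchronize()
end
```

In practice you would probably replace the hand-rolled update in step 5 with optim.sgd on GPU1, but the structure of the loop stays the same.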
Also, to train ConvNets using multiple GPUs, I recommend using CuDNN for the convolution layers, as I've tested that they are completely asynchronous (meaning that the processing runs in parallel on multiple GPUs).
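As an illustration (not part of this change itself), a convolution layer built with the cudnn bindings looks just like its nn counterpart; the layer sizes below are arbitrary:

```lua
require 'cunn'
require 'cudnn'

-- a tiny convnet using cuDNN layers; constructor arguments mirror nn.SpatialConvolution
local net = nn.Sequential()
net:add(cudnn.SpatialConvolution(3, 16, 5, 5)) -- 3 input planes, 16 output planes, 5x5 kernel
net:add(cudnn.ReLU())
net:add(cudnn.SpatialMaxPooling(2, 2, 2, 2))   -- 2x2 pooling with stride 2
net:cuda()
```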
Comments below describe the technical details of changes made. If you just want to use Multi-GPU, you can stop reading now.