
Commit a9c478c

sryap authored and facebook-github-bot committed
Add TensorAccessor with memcheck (pytorch#1602)
Summary: Pull Request resolved: pytorch#1602

Illegal memory access is a common problem during GPU kernel execution. FBGEMM GPU relies on PyTorch's `C10_CUDA_KERNEL_LAUNCH_CHECK()` and the CUDA runtime to detect such problems and throw an error. However, there are a few known issues with this approach:

(1) `C10_CUDA_KERNEL_LAUNCH_CHECK()` detects errors on the host. Because GPU kernel execution is non-blocking and asynchronous, the error is caught on the host at a later point than where the problematic kernel was launched. This can make the stack trace inaccurate and debugging more difficult. Running the code with `CUDA_LAUNCH_BLOCKING=1` fixes this, but it can change the state of the execution and cause Heisenbugs.

(2) Not all illegal memory accesses are caught by the runtime, so the system may not always throw an error when one occurs.

(3) Even when the runtime does throw an error for an illegal memory access, it is difficult to pinpoint the specific kernel and memory buffer/address causing the problem.

For all of the above reasons, we attempt to catch illegal memory accesses and throw an error as early as possible, inside the kernel itself. We introduce the `FBGEMM_GPU_MEMCHECK` flag to enable memory checking at compile time. We copy PyTorch's `TensorAccessor.h` into FBGEMM GPU and extend it to check every memory access made through the `PackedTensorAccessor`. If an invalid memory access occurs, we throw an error using `CUDA_KERNEL_ASSERT`; the error message includes the names of the tensor and the kernel that caused the problem.

If `FBGEMM_GPU_MEMCHECK` is enabled, FBGEMM operators use `fbgemm::PackedTensorAccessor`; otherwise, they use `at::PackedTensorAccessor`. `FBGEMM_GPU_MEMCHECK` integration into FBGEMM ops will be done in subsequent diffs.

Differential Revision: D43421838

fbshipit-source-id: 3281ba97044e1f5fefcff3dc3b013fb19c265d38
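For illustration, here is a minimal sketch of the in-kernel check described above (not the code from this commit; the names `CheckedAccessor1D`, `data_`, `tensor_name_`, and `kernel_name_` are hypothetical). A device-side accessor validates each index and traps at the faulting access with `CUDA_KERNEL_ASSERT`, reporting the tensor and kernel involved:

#include <cstdint>
#include <c10/macros/Macros.h> // CUDA_KERNEL_ASSERT

// Sketch of a bounds-checked 1-D accessor in the spirit of this change.
// All member and type names here are illustrative.
template <typename T>
struct CheckedAccessor1D {
  T* data_;
  int64_t size_;
  const char* tensor_name_; // reported on failure
  const char* kernel_name_; // reported on failure

  __device__ T& operator[](int64_t i) const {
    if (i < 0 || i >= size_) {
      // Report the offending tensor/kernel, then trap inside the kernel
      // at the faulting access rather than waiting for a host-side check.
      printf(
          "invalid access to tensor %s in kernel %s: index %lld, size %lld\n",
          tensor_name_,
          kernel_name_,
          static_cast<long long>(i),
          static_cast<long long>(size_));
      CUDA_KERNEL_ASSERT(false);
    }
    return data_[i];
  }
};

Trapping at the access point yields an accurate failure location without resorting to `CUDA_LAUNCH_BLOCKING=1`.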
1 parent 9b4952f · commit a9c478c

File tree

2 files changed (+575, -1 lines)


fbgemm_gpu/CMakeLists.txt

Lines changed: 2 additions & 1 deletion
@@ -167,7 +167,8 @@ set(codegen_dependencies
   ${CMAKE_CURRENT_SOURCE_DIR}/include/fbgemm_gpu/quantize_ops_utils.h
   ${CMAKE_CURRENT_SOURCE_DIR}/include/fbgemm_gpu/split_embeddings_utils.cuh
   ${CMAKE_CURRENT_SOURCE_DIR}/include/fbgemm_gpu/sparse_ops_utils.h
-  ${CMAKE_CURRENT_SOURCE_DIR}/include/fbgemm_gpu/split_embeddings_cache_cuda.cuh)
+  ${CMAKE_CURRENT_SOURCE_DIR}/include/fbgemm_gpu/split_embeddings_cache_cuda.cuh
+  ${CMAKE_CURRENT_SOURCE_DIR}/include/fbgemm_gpu/fbgemm_tensor_accessor.h)

 if(USE_ROCM)
   message(STATUS "${PYTHON_EXECUTABLE}" "${CMAKE_CODEGEN_DIR}/embedding_backward_code_generator.py" "--opensource --is_rocm")
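The summary states that operators pick the accessor family at compile time based on `FBGEMM_GPU_MEMCHECK`. A plausible sketch of that switch (the alias `pta` is an assumption; the checked header path is the one added to the dependency list above):

#ifdef FBGEMM_GPU_MEMCHECK
#include "fbgemm_gpu/fbgemm_tensor_accessor.h"
namespace pta = fbgemm; // bounds-checked PackedTensorAccessor
#else
#include <ATen/core/TensorAccessor.h>
namespace pta = at; // stock PackedTensorAccessor
#endif

Kernel code written against `pta::PackedTensorAccessor` then compiles either way, with checking enabled only in memcheck builds.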
