
[CI][Test] Fix uvm_test.py when a machine has only 1 GPU #1549



shintaro-iwasaki
Contributor

The following skipIf does not work correctly when a machine has only one GPU: the test is not skipped even though it needs two devices.

@unittest.skipIf(*gpu_unavailable or torch.cuda.device_count() < 2)
def test_uvm_to_device(self, sizes: List[int], uvm_op) -> None:
  [...]

This patch corrects the condition.

Details

First, gpu_unavailable is defined as follows.

gpu_unavailable: Tuple[bool, str] = (
    not torch.cuda.is_available() or torch.cuda.device_count() == 0,
    "CUDA is not available or no GPUs detected",
)

So the skipIf is expanded as follows when there is only one CUDA device.

@unittest.skipIf(*gpu_unavailable or torch.cuda.device_count() < 2)
->
@unittest.skipIf(*((False, "CUDA is not ...") or True))

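A non-empty tuple is always truthy, so (False, "abc") or True evaluates to (False, "abc") in Python and the right-hand side is never considered. The * then unpacks that tuple:

@unittest.skipIf(*((False, "CUDA is not ...") or True))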
->
@unittest.skipIf(False, "CUDA is not ...")

The condition is False, so this unit test is not skipped. This UVM test seems to fail occasionally when the machine does not have two GPUs, which annoys FBGEMM developers.
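The pitfall described above can be reproduced without a GPU. The following is a minimal sketch, not the actual patch: CUDA_AVAILABLE and DEVICE_COUNT are hypothetical stand-ins for torch.cuda.is_available() and torch.cuda.device_count(), and the fewer_than_two_gpus tuple (including its message) is only one possible way to write a correct condition.

```python
import unittest
from typing import Tuple

# Hypothetical stand-ins for torch.cuda.is_available() and
# torch.cuda.device_count() on a single-GPU machine.
CUDA_AVAILABLE = True
DEVICE_COUNT = 1

gpu_unavailable: Tuple[bool, str] = (
    not CUDA_AVAILABLE or DEVICE_COUNT == 0,
    "CUDA is not available or no GPUs detected",
)

# Buggy form: a non-empty tuple is truthy, so `or` returns the tuple itself
# and the device-count check on the right is never evaluated.
buggy_args = gpu_unavailable or DEVICE_COUNT < 2

# One possible correction (an assumption, not necessarily what the patch does):
# fold the extra check into the boolean element before unpacking.
fewer_than_two_gpus: Tuple[bool, str] = (
    gpu_unavailable[0] or DEVICE_COUNT < 2,
    "Requires at least two GPUs",
)


class SkipConditionDemo(unittest.TestCase):
    @unittest.skipIf(*buggy_args)
    def test_buggy(self) -> None:
        pass  # runs on a single-GPU machine, which is the bug

    @unittest.skipIf(*fewer_than_two_gpus)
    def test_fixed(self) -> None:
        pass  # skipped on a single-GPU machine, as intended


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SkipConditionDemo)
result = unittest.TestResult()
suite.run(result)
skipped_names = [test.id().split(".")[-1] for test, _reason in result.skipped]
print(skipped_names)  # only test_fixed is skipped
```

Setting DEVICE_COUNT to 2 in this sketch makes both conditions False, so neither test is skipped.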

@shintaro-iwasaki shintaro-iwasaki added the bug Something isn't working label Jan 18, 2023
@netlify

netlify bot commented Jan 18, 2023

Deploy Preview for pytorch-fbgemm-docs canceled.

Name Link
🔨 Latest commit 138fc7b
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/63c86937e0dca700094a05e3

@shintaro-iwasaki shintaro-iwasaki force-pushed the siwasaki/pr/fix_uvm_test_error branch from 66d1b3c to 138fc7b Compare January 18, 2023 21:48
@shintaro-iwasaki
Contributor Author

Now we see fbgemm_gpu/test/uvm_test.py::UvmTest::test_uvm_to_device SKIPPED on a single-GPU machine.

@facebook-github-bot
Contributor

@shintaro-iwasaki has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@shintaro-iwasaki merged this pull request in 0237a8a.

Labels
bug Something isn't working cla signed Merged