Skip to content

Try to avoid random crash with disconnected.cpp test #1226

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

alvinhochun
Copy link
Contributor

There are some test cases which tests throwing an hresult_error from the completed handler. This relies on winrt::impl::invoke catching the exception and handling it using winrt::to_hresult. In some test cases, the completed handler may run on a worker thread, so the test tries to wait for the handler to be called using an event, but it does not wait until winrt::impl::invoke has finished handling the exception. An extremely rare race condition may happen:

  1. The completed handler on a worker thread sets the event.
  2. The test case, having finished waiting for the event on the main thread, leaves the test case.
  3. Another test case, custom_error, starts on the main thread and sets winrt_to_hresult_handler to a custom handler
  4. The completed handler, which is still running on the worker thread, throws the hresult_error exception.
  5. winrt::impl::invoke catches the exception, then tries to handle it with winrt::to_hresult.
  6. winrt::to_hresult notices the custom winrt_to_hresult_handler from the custom_error test and calls it, instead of going through the usual code flow that is expected by the original test case.
  7. The winrt_to_hresult_handler doesn't know how to handle hresult_error, so it ends up unhandled and terminates the program.

This is a real crash I have got when testing a mingw build with some tests excluded, which triggered this exact sequence of events. In theory this can also happen when testing with --order rand to randomize test order if you are very unlucky.

I cannot think of a way to really wait for the completed handler to finish, so I added a 500ms sleep at the end of the test to hopefully minimize the chance of this happening again.

There are some test cases which tests throwing an hresult_error from the
completed handler. This relies on `winrt::impl::invoke` catching the
exception and handling it using `winrt::to_hresult`. In some test cases,
the completed handler may run on a worker thread, so the test tries to
wait for the handler to be called using an event, but it does not wait
until `winrt::impl::invoke` has finished handling the exception. An
extremely rare race condition may happen:

1. The completed handler on a worker thread sets the event.
2. The test case, having finished waiting for the event on the main
   thread, leaves the test case.
3. Another test case, `custom_error`, starts on the main thread and sets
   `winrt_to_hresult_handler` to a custom handler
4. The completed handler, which is still running on the worker thread,
   throws the `hresult_error` exception.
5. `winrt::impl::invoke` catches the exception, then tries to handle it
   with `winrt::to_hresult`.
6. `winrt::to_hresult` notices the custom `winrt_to_hresult_handler`
   from the `custom_error` test and calls it, instead of going through
   the usual code flow that is expected by the original test case.
7. The `winrt_to_hresult_handler` doesn't know how to handle
   `hresult_error`, so it ends up unhandled and terminates the program.

This is a real crash I have got when testing a mingw build with some
tests excluded, which triggered this exact sequence of events. In theory
this can also happen when testing with `--order rand` to randomize test
order if you are very unlucky.

I cannot think of a way to really wait for the completed handler to
finish, so I added a 500ms sleep at the end of the test to hopefully
minimize the chance of this happening again.
@@ -105,6 +105,8 @@ TEST_CASE("disconnected,handler,3")
});

WaitForSingleObject(signal.get(), INFINITE);
// Give some time for to_hresult() to complete.
Sleep(500);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we build this buffer into the test that fiddles with winrt_to_hresult_handler rather than the test(s) that observe the fiddling? I think there's only one place where this is tested (custom_error.cpp).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think putting this in the disconnected test itself is better because it makes sure (or at least tries to) that any code related to this test running on the worker thread has really finished before leaving the test scope. In case it actually causes std::terminate, Catch2 can report that the abnormal termination happens during the correct test, instead of making it look like another test which happens to be running afterwards caused it. (This actually did happen to these tests before PR #1225.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.

@kennykerr
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

@kennykerr
Copy link
Collaborator

Not sure why Azure won't kick off an internal build on PRs anymore. Anyway, I'm pretty confident the GitHub validation is sound and the Azure build still runs on master, so I'll merge this.

@kennykerr kennykerr merged commit b79565c into microsoft:master Nov 8, 2022
@alvinhochun alvinhochun deleted the alvin/disconnected-test-hack branch December 11, 2022 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants