Try to avoid random crash with disconnected.cpp test #1226

alvinhochun · 2022-11-08T15:39:33Z

There are some test cases which tests throwing an hresult_error from the completed handler. This relies on winrt::impl::invoke catching the exception and handling it using winrt::to_hresult. In some test cases, the completed handler may run on a worker thread, so the test tries to wait for the handler to be called using an event, but it does not wait until winrt::impl::invoke has finished handling the exception. An extremely rare race condition may happen:

The completed handler on a worker thread sets the event.
The test case, having finished waiting for the event on the main thread, leaves the test case.
Another test case, custom_error, starts on the main thread and sets winrt_to_hresult_handler to a custom handler
The completed handler, which is still running on the worker thread, throws the hresult_error exception.
winrt::impl::invoke catches the exception, then tries to handle it with winrt::to_hresult.
winrt::to_hresult notices the custom winrt_to_hresult_handler from the custom_error test and calls it, instead of going through the usual code flow that is expected by the original test case.
The winrt_to_hresult_handler doesn't know how to handle hresult_error, so it ends up unhandled and terminates the program.

This is a real crash I have got when testing a mingw build with some tests excluded, which triggered this exact sequence of events. In theory this can also happen when testing with --order rand to randomize test order if you are very unlucky.

I cannot think of a way to really wait for the completed handler to finish, so I added a 500ms sleep at the end of the test to hopefully minimize the chance of this happening again.

There are some test cases which tests throwing an hresult_error from the completed handler. This relies on `winrt::impl::invoke` catching the exception and handling it using `winrt::to_hresult`. In some test cases, the completed handler may run on a worker thread, so the test tries to wait for the handler to be called using an event, but it does not wait until `winrt::impl::invoke` has finished handling the exception. An extremely rare race condition may happen: 1. The completed handler on a worker thread sets the event. 2. The test case, having finished waiting for the event on the main thread, leaves the test case. 3. Another test case, `custom_error`, starts on the main thread and sets `winrt_to_hresult_handler` to a custom handler 4. The completed handler, which is still running on the worker thread, throws the `hresult_error` exception. 5. `winrt::impl::invoke` catches the exception, then tries to handle it with `winrt::to_hresult`. 6. `winrt::to_hresult` notices the custom `winrt_to_hresult_handler` from the `custom_error` test and calls it, instead of going through the usual code flow that is expected by the original test case. 7. The `winrt_to_hresult_handler` doesn't know how to handle `hresult_error`, so it ends up unhandled and terminates the program. This is a real crash I have got when testing a mingw build with some tests excluded, which triggered this exact sequence of events. In theory this can also happen when testing with `--order rand` to randomize test order if you are very unlucky. I cannot think of a way to really wait for the completed handler to finish, so I added a 500ms sleep at the end of the test to hopefully minimize the chance of this happening again.

kennykerr · 2022-11-08T16:18:32Z

test/test/disconnected.cpp

@@ -105,6 +105,8 @@ TEST_CASE("disconnected,handler,3")
        });

    WaitForSingleObject(signal.get(), INFINITE);
+    // Give some time for to_hresult() to complete.
+    Sleep(500);


Shouldn't we build this buffer into the test that fiddles with winrt_to_hresult_handler rather than the test(s) that observe the fiddling? I think there's only one place where this is tested (custom_error.cpp).

I think putting this in the disconnected test itself is better because it makes sure (or at least tries to) that any code related to this test running on the worker thread has really finished before leaving the test scope. In case it actually causes std::terminate, Catch2 can report that the abnormal termination happens during the correct test, instead of making it look like another test which happens to be running afterwards caused it. (This actually did happen to these tests before PR #1225.)

Fair enough.

kennykerr · 2022-11-08T17:02:02Z

/azp run

azure-pipelines · 2022-11-08T17:02:09Z

No pipelines are associated with this pull request.

kennykerr · 2022-11-08T17:03:51Z

Not sure why Azure won't kick off an internal build on PRs anymore. Anyway, I'm pretty confident the GitHub validation is sound and the Azure build still runs on master, so I'll merge this.

kennykerr reviewed Nov 8, 2022

View reviewed changes

kennykerr approved these changes Nov 8, 2022

View reviewed changes

kennykerr merged commit b79565c into microsoft:master Nov 8, 2022

alvinhochun deleted the alvin/disconnected-test-hack branch December 11, 2022 16:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Try to avoid random crash with disconnected.cpp test #1226

Try to avoid random crash with disconnected.cpp test #1226

Uh oh!

alvinhochun commented Nov 8, 2022

Uh oh!

kennykerr Nov 8, 2022

Uh oh!

alvinhochun Nov 8, 2022

Uh oh!

kennykerr Nov 8, 2022

Uh oh!

kennykerr commented Nov 8, 2022

Uh oh!

azure-pipelines bot commented Nov 8, 2022

Uh oh!

kennykerr commented Nov 8, 2022

Uh oh!

Uh oh!

Try to avoid random crash with disconnected.cpp test #1226

Try to avoid random crash with disconnected.cpp test #1226

Uh oh!

Conversation

alvinhochun commented Nov 8, 2022

Uh oh!

kennykerr Nov 8, 2022

Choose a reason for hiding this comment

Uh oh!

alvinhochun Nov 8, 2022

Choose a reason for hiding this comment

Uh oh!

kennykerr Nov 8, 2022

Choose a reason for hiding this comment

Uh oh!

kennykerr commented Nov 8, 2022

Uh oh!

azure-pipelines bot commented Nov 8, 2022

Uh oh!

kennykerr commented Nov 8, 2022

Uh oh!

Uh oh!