Skip to content

CI: Refactor LLVM setup steps and use cache #1242

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 5, 2022

Conversation

alvinhochun
Copy link
Contributor

I extracted the LLVM setup steps into composite actions and enabled caching the LLVM toolchains. The clang-cl jobs are perhaps slightly faster now but not a significant difference.

@sylveon
Copy link
Contributor

sylveon commented Dec 4, 2022

Why are we not using the preinstalled LLVM that comes with VS (and its platform toolset)? That'll avoid the download altogether and the vs version would be what most people who want to use clang-cl end up using.

@alvinhochun
Copy link
Contributor Author

I tend to use the latest release available to try things so that's what I went with initially. I suppose it may make sense to add CI jobs to test the LLVM build that comes with VS in addition to the latest upstream release, but I wouldn't replace the current ones entirely.

@kennykerr kennykerr merged commit d86323d into microsoft:master Dec 5, 2022
@kennykerr
Copy link
Collaborator

These LLVM tests seems to fail pretty regularly. Any ideas?

https://github.com/microsoft/cppwinrt/actions/runs/3620984131/jobs/6104646572

@alvinhochun
Copy link
Contributor Author

Not really. I don't know if this is true, but to me it feels like this specific failure only started happening quite recently on the runner. I have got it a lot in my fork too. I tried to debug it locally but I can't get it to happen at all on my system with the llvm-mingw build.

I do manage to get it on my local GCC build (I made a few hacks to get it to build), roughly 1 out of 20 tries? It might help me figure this out.

@kennykerr
Copy link
Collaborator

@oldnewthing looks like that line // Wait indefinitely if a debugger is present, to make it easier to debug this test. comes from your async propagation work. Any idea why this might fail?

@oldnewthing
Copy link
Member

@kennykerr I don't see anything compiler-specific in that test. The test logs don't really say anything. I can't even find where it says which test failed!

@kennykerr
Copy link
Collaborator

kennykerr commented Dec 5, 2022

The log is kinda hard to browse online. 😒 Here's the relevant bit. Agreed it shouldn't be compiler-specific.

1: -------------------------------------------------------------------------------
1: async_propagate_cancel
1: -------------------------------------------------------------------------------
1: D:\a\cppwinrt\cppwinrt\test\test\async_propagate_cancel.cpp:1[35](https://github.com/microsoft/cppwinrt/actions/runs/3620984131/jobs/6104646572#step:8:36)
1: ...............................................................................
1: 
1: D:\a\cppwinrt\cppwinrt\test\test\async_propagate_cancel.cpp:123: FAILED:
1:   REQUIRE( WaitForSingleObject(completed.get(), IsDebuggerPresent() ? 0xffffffff : 1000) == ((((DWORD)0x00000000)) + 0) )
1: with expansion:
1:   258 (0x102) == 0

@kennykerr
Copy link
Collaborator

A re-run made the problem go away... 🤷‍♂️

@sylveon
Copy link
Contributor

sylveon commented Dec 5, 2022

It's often the same test. Did this happen before my change removing async hooks?

@kennykerr
Copy link
Collaborator

That's possible - I'm not sure.

@alvinhochun
Copy link
Contributor Author

From rerunning some old CI jobs in my fork, it seems the failure started happening after enabling ASan in the llvm-mingw Debug builds (#1232). But I do test this config locally since that PR and I haven't got this failure at all. So there seems to be other factors at play, or perhaps it is an odd race condition which happens to trigger very often on the GHA runner?

If the failure gets too annoying we can mark this job to continue-on-error...

@alvinhochun
Copy link
Contributor Author

alvinhochun commented Dec 6, 2022

Hmm, actually it seems to be also happening to release builds, but since there were no test for llvm-mingw release builds before #1232 I don't have any more data points.

@alvinhochun
Copy link
Contributor Author

I am pretty sure the test is failing at async_propagate_cancel in the call Check([] { return ActionAction(10); });. What seems to be happening (and this is only my speculation) is that if the ActionAction coroutine runs fast enough that async.Cancel(); is called after the coroutine has progressed a bit, the completion handler does not get called and the main thread is stuck at WaitForSingleObject.

In a change (alvinhochun@c34c31a) I added a delay before calling async.Cancel(); and it fails in both MSVC and Clang (MinGW): https://github.com/alvinhochun/cppwinrt/actions/runs/3630090254 (Note: Clang-cl does not run this test because it segfaults.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants