Description
I maintain a Python software project that installs itself into a conda
environment.
For the last several years, we have had a working test suite for WSL using the windows-2019 runner (last passing run and associated workflow file).
Due to the windows-2019 brownout, we switched over to windows-2022 in this PR: spinalcordtoolbox/spinalcordtoolbox#4907 (comment)
# before
Linux fv-az1346-332 4.4.0-17763-Microsoft #2268-Microsoft Thu Oct 07 16:36:00 PST 2021 x86_64 x86_64 x86_64 GNU/Linux
# after
Linux fv-az1212-522 4.4.0-20348-Microsoft #2849-Microsoft Thu Nov 01 17:32:00 PST 2024 x86_64 x86_64 x86_64 GNU/Linux
However, after merging this change, we immediately started noticing that runs would mysteriously hang / disconnect / cancel with a number of strange errors/annotations:
Received request to deprovision: The request was cancelled by the remote provider.
**Run Vampire/setup-wsl@v3**
Failed to restore: getCacheEntry failed: connect ETIMEDOUT 20.246.192.124:443
**Run Vampire/setup-wsl@v3**
Failed to save: reserveCache failed: connect ETIMEDOUT 20.246.192.124:443
The hosted runner encountered an error while running your job. (Error Type: Disconnect).
Failed to restore: getCacheEntry failed: Cache service responded with 503
Failed to save: reserveCache failed: Cache service responded with 503
Even after updating multiple settings (setup-wsl@v3 -> setup-wsl@v5, WSL2 -> WSL1, etc.), the step still hangs:
The hosted runner encountered an error while running your job. (Error Type: Disconnect).
Any advice on how to debug this issue would be greatly appreciated, given that there are no logs and no way to access the runner when it freezes.
Note: This may be a setup-wsl issue. However, a colleague of mine is also encountering similar "freezing runner" issues on macOS+arm, so I'm curious whether this could instead be a kernel-level issue or similar.
Platforms affected
- Azure DevOps
- GitHub Actions - Standard Runners
- GitHub Actions - Larger Runners
Runner images affected
- Ubuntu 22.04
- Ubuntu 24.04
- macOS 13
- macOS 13 Arm64
- macOS 14
- macOS 14 Arm64
- macOS 15
- macOS 15 Arm64
- Windows Server 2019
- Windows Server 2022
- Windows Server 2025
Image version and build link
Image: windows-2022
Version: 20250527.1.0
Included Software: https://github.com/actions/runner-images/blob/win22/20250527.1/images/windows/Windows2022-Readme.md
Image Release: https://github.com/actions/runner-images/releases/tag/win22%2F20250527.1
See example links above.
Is it regression?
Kind of? (windows-2019 worked fine)
Expected behavior
Either a clear failure (and a log to debug), or a passing run.
Actual behavior
The step appears to pass: it does not halt midway through execution, and it finishes, but it gets stuck at the end of the process.
Running with python -v shows that the teardown steps don't occur. However, this is hard to demonstrate, because once the run gets cancelled, the logs become inaccessible.
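One possible way to turn the silent hang into a clear failure with a log is to launch the hanging command through a small wrapper that enforces a deadline. The following is a minimal sketch, not something currently in the workflow: the `run_with_deadline` helper and the timeout values are hypothetical, and it only helps if the hang is in the process itself rather than in the runner VM.

```python
import subprocess
import sys


def run_with_deadline(cmd, timeout_s):
    """Run cmd and return (exit_code, output, timed_out).

    Hypothetical helper: if the child has not exited after timeout_s
    seconds, it is killed and whatever it printed so far is returned,
    so the step can fail with a log instead of hanging.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so ordering is preserved
        text=True,
    )
    try:
        output, _ = proc.communicate(timeout=timeout_s)
        return proc.returncode, output, False
    except subprocess.TimeoutExpired:
        proc.kill()
        # A second communicate() collects the output produced before the kill.
        output, _ = proc.communicate()
        return None, output, True


if __name__ == "__main__":
    # Example: a child that prints and then never finishes its "teardown".
    child = [sys.executable, "-c",
             "import time; print('work done', flush=True); time.sleep(60)"]
    code, out, timed_out = run_with_deadline(child, timeout_s=5)
    print(out, end="")
    if timed_out:
        sys.exit("command exceeded its deadline and was killed")
    sys.exit(code)
```

Note that `proc.kill()` only terminates the direct child. If the command spawns further processes (as a WSL launcher may), those could keep the output pipe open, so this sketch would likely need adapting for that case.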
Repro steps
See linked workflow file above.