[26.0] Add debug middleware and regression tests for blocked main event loop #22207
Merged
jmchilton merged 4 commits into galaxyproject:release_26.0 on Apr 21, 2026
jmchilton reviewed Mar 21, 2026
9003f8b to a0f3685
This testing is really worth something: encode/httpx#3707
99215a2 to 1b9e4a1
5b9c0aa to 3be16b0
Integrate aiocop (https://github.com/Feverup/aiocop), which uses sys.audit hooks to catch specific blocking syscalls (socket.connect, getaddrinfo, subprocess, open, etc.) from inside async tasks and report the exact call site. aiocop is pinned to the event-loop thread explicitly because Galaxy's test harness runs uvicorn in a non-main thread; activation happens on the ASGI lifespan startup event so the first request is monitored.
An AiocopMiddleware surfaces captured events per-request via an X-Aiocop-Violations header (count/max-severity/first) so the test interactor can fail requests on high-severity blocking I/O.
Integration tests spin up a fresh event loop on a new thread per test module via driver_util.uvicorn_serve, and aiocop has process-global state (audit hook registration, patch_audit_functions side effects, detect_slow_tasks's _detect_slow_tasks_configured guard) that must not be repeated. install_aiocop() is therefore split into a once-per-process block (audit patching, audit hook registration) and a per-Galaxy-instance re-arm (main-thread rebinding, callback clear+register, detect_slow_tasks reset) so each new Galaxy instance actually gets its loop monitored.
Enabled by default under GALAXY_TEST_AIOCOP=1 in run_tests.sh; set to 0 to disable.
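The general technique described above can be sketched with the standard library alone. This is not aiocop's API or implementation; every name below (install_monitor, WATCHED_EVENTS, violations, and so on) is hypothetical. It only illustrates two points from the commit message: an audit hook cannot be removed once registered, so registration happens once per process, while the loop-thread binding and the collected events are reset for each new server instance.

```python
import asyncio
import os
import sys
import tempfile
import threading

# Hypothetical names throughout; this is NOT aiocop's API.
WATCHED_EVENTS = {"open", "socket.getaddrinfo", "subprocess.Popen"}

_hook_installed = False   # process-global: audit hooks cannot be removed
_loop_thread_id = None    # re-bound for every new server instance
violations = []           # (event, first argument) pairs


def _audit_hook(event, args):
    # Must not raise: an exception here would abort the audited operation.
    if event not in WATCHED_EVENTS or threading.get_ident() != _loop_thread_id:
        return
    try:
        in_task = asyncio.current_task() is not None
    except RuntimeError:  # no running event loop on this thread
        return
    if in_task:
        violations.append((event, str(args[0]) if args else ""))


def install_monitor():
    """Call from inside the running loop, e.g. on ASGI lifespan startup."""
    global _hook_installed, _loop_thread_id
    if not _hook_installed:  # once-per-process block
        sys.addaudithook(_audit_hook)
        _hook_installed = True
    # Per-instance re-arm: bind to the current loop thread, drop old events.
    _loop_thread_id = threading.get_ident()
    violations.clear()


def _read(path):
    with open(path, "rb") as fh:
        return fh.read()


async def _scenario(on_loop_path, off_loop_path):
    install_monitor()
    _read(on_loop_path)                            # blocking open on the loop
    await asyncio.to_thread(_read, off_loop_path)  # same call, worker thread
    return list(violations)


def demo():
    """Return (paths, recorded events) for one on-loop and one off-loop read."""
    paths = []
    for _ in range(2):
        fd, path = tempfile.mkstemp()
        os.close(fd)
        paths.append(path)
    try:
        recorded = asyncio.run(_scenario(*paths))
    finally:
        for path in paths:
            os.remove(path)
    return paths, recorded
```

In this sketch the read performed directly inside the coroutine is recorded, while the same read dispatched to a worker thread is ignored because it does not run on the bound loop thread.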
TabularToolDataField.to_dict() performs os.path.isdir, glob, and os.path.getsize (called twice via get_fingerprint), which blocked the event loop in the async show_field handler and tripped aiocop.
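The commit message above does not say how the handler was changed. One common way to keep filesystem calls like these off the event loop is to run them in a worker thread with asyncio.to_thread. The sketch below assumes that approach; describe_directory and show_field_sketch are hypothetical names, not Galaxy code.

```python
import asyncio
import glob
import os
import tempfile


def describe_directory(path):
    """Blocking helper: touches the filesystem several times."""
    if not os.path.isdir(path):
        return {"is_dir": False, "files": []}
    entries = sorted(glob.glob(os.path.join(path, "*")))
    return {
        "is_dir": True,
        "files": [
            {"name": os.path.basename(entry), "size": os.path.getsize(entry)}
            for entry in entries
        ],
    }


async def show_field_sketch(path):
    # The blocking work runs in a worker thread, so the loop stays responsive.
    return await asyncio.to_thread(describe_directory, path)


def demo():
    """Build a temporary directory with one 5-byte file and describe it."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "example.loc"), "wb") as fh:
            fh.write(b"12345")
        return asyncio.run(show_field_sketch(tmp))
```

An alternative with a similar effect in many ASGI frameworks is to declare the handler as a plain synchronous function so the framework runs it in its own thread pool.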
Previously resolved fresh per request via lagom, which re-ran UrlHeadersConfigFactory.from_app_config and reopened the YAML config from disk on the async event loop for every /api/tools/fetch and /api/workflows/* call. All its peer managers with the same dependency shape are already singletons.
CitationsManager.__init__ constructs a DoiCache, which instantiates a beaker CacheManager that creates cache data/lock directories (os.makedirs) and opens lock files. Previously this ran on the event loop for every /api/tools/*/citations etc. request. All its peer managers constructed alongside it in _configure_toolbox are singletons.
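The difference between per-request construction and a singleton, as described in the last two commit messages, can be illustrated without any dependency-injection library. This sketch does not use lagom; ConfigBackedManager, resolve_per_request and resolve_singleton are hypothetical names, and functools.lru_cache stands in for singleton registration.

```python
import functools
import os
import tempfile


class ConfigBackedManager:
    """Hypothetical manager whose constructor does blocking disk I/O."""

    constructions = 0

    def __init__(self, config_path):
        type(self).constructions += 1
        with open(config_path, "rb") as fh:  # runs on every construction
            self.raw_config = fh.read()


def resolve_per_request(config_path):
    # A new instance, and a new disk read, for every request.
    return ConfigBackedManager(config_path)


@functools.lru_cache(maxsize=None)
def resolve_singleton(config_path):
    # Constructed once per path; later calls return the cached instance.
    return ConfigBackedManager(config_path)


def demo(requests=3):
    """Return (per-request constructions, singleton constructions, same object?)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        ConfigBackedManager.constructions = 0
        for _ in range(requests):
            resolve_per_request(path)
        per_request_count = ConfigBackedManager.constructions

        ConfigBackedManager.constructions = 0
        resolve_singleton.cache_clear()
        managers = [resolve_singleton(path) for _ in range(requests)]
        singleton_count = ConfigBackedManager.constructions
    finally:
        os.remove(path)
    return per_request_count, singleton_count, all(m is managers[0] for m in managers)
```

With a singleton, the constructor's disk I/O happens once, typically at startup, rather than inside each request handled on the event loop.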
3be16b0 to 90fa520
Test failures should all be unrelated; the last 3 commits fix actual bugs.
This PR was merged without a "kind/" label, please correct.
Add middleware that monitors the asyncio event loop for blocking calls:
stack traces (at ERROR level, sent to Sentry) when it's unresponsive
Server-Timing headers with event loop lag and nginx queue time
Add /api/debug/block and /api/debug/ok diagnostic endpoints to simulate and observe event loop blocking.
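A minimal sketch of how event loop lag can be measured and reported, assuming a heartbeat task that compares its expected wake-up time with the actual one. This is not the middleware from this PR; LoopLagMonitor, the "loop-lag" metric name and measure_lag are hypothetical. Only the `name;dur=<milliseconds>` shape of a Server-Timing entry follows the header's standard syntax.

```python
import asyncio
import time


class LoopLagMonitor:
    """Hypothetical heartbeat that records the worst observed loop lag."""

    def __init__(self, interval=0.01):
        self.interval = interval
        self.max_lag = 0.0
        self._task = None

    async def _run(self):
        loop = asyncio.get_running_loop()
        while True:
            expected = loop.time() + self.interval
            await asyncio.sleep(self.interval)
            # A late wake-up means something held the loop in the meantime.
            self.max_lag = max(self.max_lag, loop.time() - expected)

    def start(self):
        self._task = asyncio.get_running_loop().create_task(self._run())

    async def stop(self):
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass


def server_timing_value(lag_seconds):
    # Server-Timing durations are expressed in milliseconds.
    return f"loop-lag;dur={lag_seconds * 1000:.1f}"


async def _measure(block_seconds):
    monitor = LoopLagMonitor()
    monitor.start()
    await asyncio.sleep(0.03)      # let the heartbeat tick a few times
    if block_seconds:
        time.sleep(block_seconds)  # deliberately blocks the event loop
    await asyncio.sleep(0.03)      # give the heartbeat a chance to wake up
    await monitor.stop()
    return monitor.max_lag


def measure_lag(block_seconds):
    """Return the worst lag seen while optionally blocking the loop."""
    return asyncio.run(_measure(block_seconds))
```

Calling measure_lag with a blocking duration plays the role of a "block" endpoint and calling it with 0 plays the role of an "ok" endpoint: the first should report lag close to the blocking time, the second only scheduling jitter.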
Add test infrastructure to fail API tests on event loop blocking: