Description
In https://github.com/digitalfabrik/integreat-cms we're currently facing an issue with saving objects that contain links that should be checked. Users "frequently" run into response timeouts (after 1800s). At the same time, we see many long-running linkcheck queries in the database (>30s). This only happens when tests_running == False in listeners.py. The long-running queries disappear as soon as we set tests_running = True.
We concluded that we're facing some kind of threading issue. Apache2 mod_wsgi has a GIL deadlock detection, but it is not triggering, so we think there must be another cause. Our next suspicion is a database cursor that is potentially shared between the WSGI process and the linkcheck worker thread.
The stuck queries all look like this:
SELECT "linkcheck_link"."id", "linkcheck_link"."content_type_id", "linkcheck_link"."object_id", "linkcheck_link"."field", "linkcheck_link"."url_id", "linkcheck_link"."text", "linkcheck_link"."ignore" FROM "linkcheck_link" WHERE ("linkcheck_link"."content_type_id" = 9 AND "linkcheck_link"."object_id" = 175527)
The query seems to originate in https://github.com/DjangoAdminHackers/django-linkcheck/blob/master/linkcheck/listeners.py#L80
https://www.psycopg.org/psycopg3/docs/advanced/async.html states the following:
Cursor objects are not thread-safe, and are not designed to be used by several threads at the same time.
We have not yet been able to confirm that the linkcheck thread uses its own dedicated database cursor. Contention over a shared database cursor would fully explain the issues we're facing.
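One way to test the shared-cursor hypothesis is to record which connection object each thread ends up using. The following is only a minimal sketch of that idea, not code from django-linkcheck: it uses the standard library's sqlite3 and threading.local as stand-ins for psycopg and for per-thread connection handling, and get_connection and report are hypothetical helper names. In the real application, the equivalent check would presumably be to log threading.get_ident() and id(django.db.connection.connection) from both the request handler and the linkcheck worker thread.

```python
import sqlite3
import threading

# Per-thread storage: each thread sees its own attributes on this object.
_local = threading.local()


def get_connection():
    """Return the calling thread's connection, creating it on first use.

    Hypothetical helper, standing in for per-thread connection handling.
    """
    if not hasattr(_local, "conn"):
        _local.conn = sqlite3.connect(":memory:")
    return _local.conn


def report(results, label):
    """Run a trivial query and record the thread id and connection id used."""
    conn = get_connection()
    conn.execute("SELECT 1").fetchone()
    results[label] = (threading.get_ident(), id(conn))


results = {}

# Simulates the request thread (the WSGI side in the issue).
report(results, "request")

# Simulates the background worker thread (the linkcheck side in the issue).
worker = threading.Thread(target=report, args=(results, "worker"))
worker.start()
worker.join()

shared = results["request"][1] == results["worker"][1]
print("connection shared between threads:", shared)
```

If the same connection id shows up under two different thread ids in the real application, that would support the shared-cursor suspicion; two different ids would point towards another cause.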