Description
From time to time I get an error like this on CI:
```
FAILED test/test_base.py::TestBase::test_with_rw_remote_and_rw_repo - git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
cmdline: git ls-remote daemon_origin
stderr: 'fatal: unable to connect to localhost:
localhost[0: ::1]: errno=Connection refused
localhost[1: 127.0.0.1]: errno=Connection refused
'
```
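For context, the failing check is just a `git ls-remote` against the test's `git daemon` remote; through GitPython it surfaces roughly like the sketch below. The path and remote name here are placeholders for illustration, not the actual test setup:

```python
from git import Repo
from git.exc import GitCommandError

# Hypothetical path to a clone whose "daemon_origin" remote is served by
# `git daemon`; in the real test this is set up by the test fixtures.
repo = Repo("/path/to/local/clone")

try:
    # Equivalent to running `git ls-remote daemon_origin` on the command line.
    repo.git.ls_remote("daemon_origin")
except GitCommandError as err:
    # When git-daemon is not accepting connections, git exits with status 128
    # and "Connection refused" on stderr, as in the CI output above.
    print(err.status, err.stderr)
```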
I'm not sure why this happens, but I see it every few days or so, on days when I'm pushing lots of commits (more precisely: on days when I run `git push` many times, since CI only runs on the tip of the pushed branch). It happens in my fork, in PRs on this upstream repository (as in #1675, detailed above), and in pushes to this upstream repository (as in d1c1f31). Rerunning the CI check gets rid of it; that is, the failure doesn't recur. I don't think I've seen it locally, but I don't run the tests locally as often as on CI.
Searching reveals that something like this, perhaps the exact same thing, happened in #1119, so this may be known. I'm not sure whether it's known to still occur occasionally, or whether there is anything that can reasonably be done to make it happen less often or not at all. (If it's not worth keeping an issue open for this, I don't mind it simply being closed.)
Activity
EliahKagan commented on Sep 26, 2023
As a brief update that I hope to flesh out more at some point: I suspect this is actually related to the problem that `HIDE_WINDOWS_FREEZE_ERRORS` is about on native Windows systems. The same tests, or at least one of them, seem affected. I suspect what's happening is that `git-daemon` is occasionally unresponsive on any platform (though it seems to be a CI issue, as I tried running this test 10,000 times on my Ubuntu system today and everything was fine), but that on some Windows systems the connection takes much longer to time out. This hunch, which could very well be wrong, is based on a wispy recollection of other issues where, on some Windows systems, network connections to unresponsive servers block for an extended time. I don't remember the details. To be clear, this is not something that Windows users would expect to experience regularly; I believe it's something specific that I am just not fully recalling.

EliahKagan commented on Jan 26, 2024
EliahKagan commented on Jan 27, 2024
ruff-format #1865

EliahKagan commented on Aug 16, 2024
I think the way to proceed with this is to modify the one affected test so that, when it fails in this specific way, it retries several times; a rough sketch of what that could look like is below. Since, as noted above, this situation may already be a cause of extended blocking on Windows (where the test is disabled by default and not currently run on CI), retrying should probably only be done on non-Windows systems.
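A minimal sketch of that retry idea, assuming the failure is detected by inspecting the `GitCommandError`; the helper names and retry count here are made up for illustration, not proposed API:

```python
import os
import time

from git.exc import GitCommandError

MAX_ATTEMPTS = 3  # Illustrative; how many retries are reasonable is open for discussion.


def is_daemon_connection_refused(err: GitCommandError) -> bool:
    """Heuristic check for this specific transient failure seen on CI."""
    return err.status == 128 and "Connection refused" in str(err.stderr)


def run_with_daemon_retries(test_body):
    """Call test_body(), retrying on the 'connection refused' failure.

    Retries happen only on non-Windows systems, since on Windows this
    situation is suspected to block for a long time instead of failing fast.
    """
    attempts = MAX_ATTEMPTS if os.name != "nt" else 1
    for attempt in range(1, attempts + 1):
        try:
            return test_body()
        except GitCommandError as err:
            if attempt == attempts or not is_daemon_connection_refused(err):
                raise
            time.sleep(1)  # Give git-daemon a moment before trying again.
```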
This should achieve at least one of two things:
Byron commented on Aug 16, 2024
Sounds good to me, thank you!
EliahKagan commented on Jan 4, 2025
I have not yet done what I proposed in #1676 (comment). But when I saw the most recent occurrence of this in #1989, I was reminded of the git-daemon connection problem that had affected gitoxide, discussed in GitoxideLabs/gitoxide#1726, described in GitoxideLabs/gitoxide#1726 (comment), and fixed in GitoxideLabs/gitoxide#1731. Could this somehow also be related to the problem of connections that have not been properly closed?
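If lingering handles turn out to matter here as well, one thing the tests could do, as a purely illustrative sketch and not a claim about what the daemon fixture currently does, is make sure every `Repo` object pointing at the daemon remote is explicitly closed:

```python
from git import Repo

# Repo can be used as a context manager; on exit it calls close(), which
# clears GitPython's cached persistent git processes for that repository.
with Repo("/path/to/clone/with/daemon_origin") as repo:
    repo.git.ls_remote("daemon_origin")

# Equivalent cleanup for a long-lived Repo object, e.g. in tearDown():
# repo.close()
```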
Byron commented on Jan 5, 2025
That is a great point!
GitPython definitely does something similar, and it likely runs into the same problems. However, the issue seems to happen here more often than in gitoxide, which (fortunately) is back to rock-solid CI runs that seemingly never fail outside of systematic failures that would affect everyone.

GitPython also uses the Git binary much more, which might have additional side effects and latencies that aren't present in the gitoxide tests.