
Occasional "fatal: unable to connect to localhost" on CI #1676

Open
Description

EliahKagan (Member)

From time to time I get an error like this on CI:

FAILED test/test_base.py::TestBase::test_with_rw_remote_and_rw_repo - git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git ls-remote daemon_origin
  stderr: 'fatal: unable to connect to localhost:
localhost[0: ::1]: errno=Connection refused
localhost[1: 127.0.0.1]: errno=Connection refused
'

I'm not sure why this happens, but I see it every few days or so, on days when I'm pushing lots of commits (more precisely: on days when I run git push many times, since CI only runs on the tip of the pushed branch). It happens in my fork, in PRs on this upstream repository (as in #1675, detailed above), and in pushes to this upstream repository (as in d1c1f31). Rerunning the CI check gets rid of it; that is, it doesn't recur. I don't think I've seen this locally, but I don't run the tests locally as much as on CI.

Searching reveals that something like this, perhaps the exact same thing, happened in #1119, so this may be known. I'm not sure whether it's known to still occur occasionally, or whether there is anything that can reasonably be done to make it happen even less often or not at all. (If it's not worth having an issue open for this, then I don't mind this simply being closed.)

Activity

EliahKagan (Member, Author) commented on Sep 26, 2023

As a brief update that I hope to flesh out more at some point: I suspect this is actually related to the problem that HIDE_WINDOWS_FREEZE_ERRORS is about on native Windows systems. The same tests, or at least one of them, seem affected. I suspect what's happening is that git-daemon is occasionally unresponsive on any platform (though it seems to be a CI issue, as I tried running this test 10,000 times on my Ubuntu system today and everything was fine), but that on some Windows systems the connection takes much longer to time out. This hunch, which could very well be wrong, is based on a wispy recollection of other issues where network connections to unresponsive servers on some Windows systems block for an extended time. I don't remember the details. To be clear, this is not something Windows users would expect to experience regularly; I believe it's something specific that I am just not fully recalling.
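For concreteness, here is a minimal sketch (not part of the test suite, and assuming git-daemon's default port 9418) of how one could distinguish "nothing is listening" from "listening but unresponsive", which is the distinction the "Connection refused" output above points at:

```python
# Hypothetical probe, not test-suite code: check whether a local git-daemon
# accepts TCP connections, assuming the default git-daemon port 9418.
import socket


def daemon_reachable(host="localhost", port=9418, timeout=5.0):
    """Return True if something accepts a TCP connection on (host, port)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except ConnectionRefusedError:
        # Matches the "Connection refused" seen in the CI failure: nothing is
        # listening on the port at that moment.
        return False
    except socket.timeout:
        # A hung or unresponsive listener would surface here instead.
        return False


if __name__ == "__main__":
    print("git-daemon reachable:", daemon_reachable())
```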

EliahKagan (Author) commented on Jan 26, 2024

EliahKagan (Author) commented on Jan 27, 2024
EliahKagan (Member, Author) commented on Aug 16, 2024

I think the way to proceed with this is to modify the one affected test so that, when it fails in this specific way, it retries several times. Since, as noted above, this situation may already be a cause of extended blocking on Windows (where the test is disabled by default and not currently run on CI), retrying should probably only be done on non-Windows systems. A sketch of what such a retry could look like follows the list below.

This should achieve at least one of two things:

  • If the failures are random, the problem is effectively fixed, because failing on each try will be no more common than other unusual kinds of CI failures that tend not to persist (e.g. failing to check out the repository in the first place).
  • If the failures are not random, such that retrying often also fails, then we have learned something about the problem that may help in figuring out the cause.
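A minimal sketch of what such a retry could look like, assuming the flaky failure is recognized by matching the "unable to connect" text in the error; the function name, fixture wiring, and retry parameters are illustrative rather than the actual test code:

```python
# Hypothetical retry wrapper around the flaky ls-remote step; not the real
# test code, just an illustration of the retry-on-this-specific-failure idea.
import sys
import time

from git.exc import GitCommandError


def ls_remote_with_retries(repo, remote_name, attempts=5, delay=1.0):
    """Run ``git ls-remote`` against ``remote_name``, retrying only on the
    specific "unable to connect" failure and only on non-Windows systems."""
    for attempt in range(attempts):
        try:
            return repo.git.ls_remote(remote_name)
        except GitCommandError as err:
            transient = "unable to connect" in str(err)
            last_try = attempt == attempts - 1
            if sys.platform == "win32" or not transient or last_try:
                raise  # Not the flaky failure, or out of retries: fail as usual.
            time.sleep(delay)
```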
Byron (Member) commented on Aug 16, 2024

Sounds good to me, thank you!

EliahKagan (Member, Author) commented on Jan 4, 2025

I have not yet done #1676 (comment). But when I saw the most recent occurrence of this in #1989, I was reminded of the git-daemon connection problem that had affected gitoxide, discussed in GitoxideLabs/gitoxide#1726, described in GitoxideLabs/gitoxide#1726 (comment), and fixed in GitoxideLabs/gitoxide#1731. Could this somehow also be related to the problem of connections that have not been properly closed?
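If unclosed connections or lingering processes were a factor, one thing the fixture side could do is release everything deterministically. A hypothetical sketch of that kind of cleanup (not the actual fixture code; the daemon arguments, paths, and the daemon_origin remote wiring are illustrative):

```python
# Hypothetical cleanup pattern: make sure the git-daemon subprocess and any
# Repo handles are released deterministically, so stale processes and
# connections cannot leak into later test runs.
import subprocess

from git import Repo


def run_against_local_daemon(daemon_base_path, repo_path):
    daemon = subprocess.Popen(
        ["git", "daemon", "--verbose", "--export-all",
         "--enable=receive-pack", f"--base-path={daemon_base_path}",
         daemon_base_path]
    )
    try:
        # Repo supports the context-manager protocol and releases its
        # resources (e.g. cached cat-file processes) on exit.
        with Repo(repo_path) as repo:
            repo.git.ls_remote("daemon_origin")
    finally:
        daemon.terminate()  # don't leave the daemon (and its connections) behind
        daemon.wait()
```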

Byron (Member) commented on Jan 5, 2025

That is a great point!

GitPython definitely does something similar, and it likely runs into the same problems. However, the issue seems to happen here more often than in gitoxide, which (fortunately) is back to rock-solid CI runs that seemingly never fail outside of systematic failures that would affect everyone.

GitPython also uses the Git binary much more, which might have additional side-effects and latencies that aren't present in gitoxide tests.
