pytest-xdist server side timeout #1550

@limaoscarjuliet

NOTE: This is not about a timeout for the test code itself (pytest-timeout works well there); it is about the need for a timeout in pytest-xdist.

First, let me say a big thank you for pytest and pytest-xdist. We use them to run ~400 Docker containers on ~10 servers on AWS. It works wonders!

There are scenarios where pytest-xdist does not detect a remote session crash or disconnect and, as a result, waits for results forever.

Today's xdist code detects a session crash via EOF on the SSH session. When the network connection is torn down, the server marks the worker as dead and re-adds it. All good.

But... consider a scenario where the SSH is not torn down:

  1. Run N tests on multiple remote machines with pytest-xdist.
  2. Tests spawn a Python process on the remote machine via SSH (process #2).
  3. We run in boxed mode, so this process forks to run the actual test code (process #3).
  4. Process #2 gets killed or crashes.
  5. The SSH session stays up because process #3 inherited at least one of stdin/stdout/stderr from process #2 (standard SSH behavior).
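The effect in step 5 can be reproduced locally without SSH. The sketch below is only an illustration, not xdist code: it assumes a POSIX system and Python 3 (the report itself is on Python 2.7), and uses a plain pipe in place of the SSH channel. The names `CHILD_SRC` and `demo` are made up for this example. A child (standing in for process #2) starts a grandchild (process #3) that inherits its stdout, then the child is killed. The reader sees no EOF until the grandchild is gone too.

```python
import os
import select
import signal
import subprocess
import sys

# Source of the child ("process #2"). It starts a grandchild ("process #3")
# that inherits the child's stdout, reports the grandchild's PID, then idles.
CHILD_SRC = """
import subprocess, sys, time
g = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
print(g.pid, flush=True)
time.sleep(30)
"""


def demo(timeout=1.0):
    """Return (eof_while_grandchild_alive, eof_after_grandchild_dead)."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SRC],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,      # stands in for the SSH channel
        stderr=subprocess.DEVNULL,
        bufsize=0,                   # unbuffered, so select() sees the real pipe state
    )
    grandchild_pid = int(child.stdout.readline())

    # Simulate the OOM kill of process #2.
    child.kill()
    child.wait()

    fd = child.stdout.fileno()
    # The pipe becomes readable only at EOF here, because nobody writes to it.
    eof_while_alive = bool(select.select([fd], [], [], timeout)[0])

    # Once the grandchild dies, the last write end closes and EOF arrives.
    os.kill(grandchild_pid, signal.SIGKILL)
    readable = bool(select.select([fd], [], [], 5.0)[0])
    eof_after_dead = readable and os.read(fd, 1) == b""

    child.stdout.close()
    return eof_while_alive, eof_after_dead


if __name__ == "__main__":
    print(demo())
```

The first value shows why an EOF-based liveness check alone is not enough in this situation: the direct child is already dead, yet the channel still looks open.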

In this case, the server-side xdist thinks the session is still up and waits for the results for a really, really long time ;-)
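To make the request concrete, here is a rough sketch of what a server-side timeout could look like. It is not part of pytest-xdist and does not use its API: `HeartbeatMonitor`, `record`, and `stalled` are hypothetical names, and the idea is simply that the server notes when each worker last reported anything and treats workers that stay silent past a deadline as dead.

```python
import time


class HeartbeatMonitor:
    """Hypothetical helper: track when each worker last reported anything."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout    # seconds of silence tolerated per worker
        self.clock = clock        # injectable so the logic can be tested
        self.last_seen = {}       # worker id -> time of last message

    def record(self, worker_id):
        """Call whenever any message (result, log line, ...) arrives."""
        self.last_seen[worker_id] = self.clock()

    def stalled(self):
        """Return the ids of workers silent for longer than the timeout."""
        now = self.clock()
        return sorted(
            worker_id
            for worker_id, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

The server loop would call `record()` for every message it receives and check `stalled()` periodically. Any worker returned could then be handled the same way as one whose SSH session hit EOF. The timeout has to be longer than the slowest legitimate test, which is why it would need to be configurable.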

And yes, process #2 does not normally crash. In our case it was OOM-killed quite persistently. A single OOM kill among tens of thousands of tests is enough to ruin the entire batch.

Please let me know if I can provide more info on this issue.

[root@nsth-c10 nsth]# python --version
Python 2.7.10
[root@nsth-c10 nsth]# py.test --version
This is pytest version 2.8.0, imported from /usr/local/lib/python2.7/site-packages/pytest-2.8.0-py2.7.egg/pytest.pyc
setuptools registered plugins:
pytest-xdist-1.13.1 at /usr/local/lib/python2.7/site-packages/pytest_xdist-1.13.1-py2.7.egg/xdist/boxed.pyc
pytest-xdist-1.13.1 at /usr/local/lib/python2.7/site-packages/pytest_xdist-1.13.1-py2.7.egg/xdist/looponfail.pyc
pytest-xdist-1.13.1 at /usr/local/lib/python2.7/site-packages/pytest_xdist-1.13.1-py2.7.egg/xdist/plugin.pyc
[root@nsth-c10 nsth]#

Activity

RonnyPfannschmidt (Member) commented on May 5, 2016

Can you report this one on pytest-xdist instead?

limaoscarjuliet (Author) commented on May 5, 2016

RonnyPfannschmidt (Member) commented on May 6, 2016

thanks

          pytest-xdist server side timeout · Issue #1550 · pytest-dev/pytest