Closed
Description
Pytest output from the two failing tests:
[ 3680s] =================================== FAILURES ===================================
[ 3680s] _______________________ TestRank.test_pct_max_many_rows ________________________
[ 3680s]
[ 3680s] self = <pandas.tests.frame.test_rank.TestRank object at 0x54f6d7ec>
[ 3680s]
[ 3680s] @pytest.mark.single
[ 3680s] def test_pct_max_many_rows(self):
[ 3680s] # GH 18271
[ 3680s] df = DataFrame({'A': np.arange(2**24 + 1),
[ 3680s] 'B': np.arange(2**24 + 1, 0, -1)})
[ 3680s] > result = df.rank(pct=True).max()
[ 3680s]
[ 3680s] ../../BUILDROOT/python-pandas-0.24.1-13.1.i386/usr/lib/python2.7/site-packages/pandas/tests/frame/test_rank.py:317:
[ 3680s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[ 3680s] ../../BUILDROOT/python-pandas-0.24.1-13.1.i386/usr/lib/python2.7/site-packages/pandas/core/generic.py:8335: in rank
[ 3680s] return ranker(self)
[ 3680s] ../../BUILDROOT/python-pandas-0.24.1-13.1.i386/usr/lib/python2.7/site-packages/pandas/core/generic.py:8327: in ranker
[ 3680s] pct=pct)
[ 3680s] ../../BUILDROOT/python-pandas-0.24.1-13.1.i386/usr/lib/python2.7/site-packages/pandas/core/algorithms.py:861: in rank
[ 3680s] ascending=ascending, na_option=na_option, pct=pct)
[ 3680s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[ 3680s]
[ 3680s] > ranks = np.empty((n, k), dtype='f8')
[ 3680s] E MemoryError
[ 3680s]
[ 3680s] pandas/_libs/algos_rank_helper.pxi:806: MemoryError
[ 3680s] ____________________________ test_pct_max_many_rows ____________________________
[ 3680s]
[ 3680s] @pytest.mark.single
[ 3680s] def test_pct_max_many_rows():
[ 3680s] # GH 18271
[ 3680s] s = Series(np.arange(2**24 + 1))
[ 3680s] > result = s.rank(pct=True).max()
[ 3680s]
[ 3680s] ../../BUILDROOT/python-pandas-0.24.1-13.1.i386/usr/lib/python2.7/site-packages/pandas/tests/series/test_rank.py:505:
[ 3680s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[ 3680s] ../../BUILDROOT/python-pandas-0.24.1-13.1.i386/usr/lib/python2.7/site-packages/pandas/core/generic.py:8335: in rank
[ 3680s] return ranker(self)
[ 3680s] ../../BUILDROOT/python-pandas-0.24.1-13.1.i386/usr/lib/python2.7/site-packages/pandas/core/generic.py:8327: in ranker
[ 3680s] pct=pct)
[ 3680s] ../../BUILDROOT/python-pandas-0.24.1-13.1.i386/usr/lib/python2.7/site-packages/pandas/core/algorithms.py:857: in rank
[ 3680s] na_option=na_option, pct=pct)
[ 3680s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[ 3680s]
[ 3680s] > sorted_data = values.take(_as)
[ 3680s] E MemoryError
[ 3680s]
[ 3680s] pandas/_libs/algos_rank_helper.pxi:712: MemoryError
[ 3681s] =============================== warnings summary ===============================
Full build log:
pandas.txt
Activity
TomAugspurger commented on Feb 20, 2019
Can you look at the blame for that test to see when it was introduced? We could maybe catch a memory error there if it doesn't undo the purpose of the test.
scarabeusiv commented on Feb 20, 2019
Note that we were not executing the tests until now (which is a bit of a bummer), but it seems this was not an issue on the 0.23 release.
The following commit introduced the test:
4476962
WillAyd commented on Feb 20, 2019
Can you post the output of pd.show_versions()?
@jschendel
jschendel commented on Feb 21, 2019
How much memory does the machine you're running the tests on have? My initial suspicion is that the machine simply doesn't have enough memory to run those specific tests. We actually had a similar issue where these tests would intermittently fail on Travis. The cause was eventually determined to be a memory error when running the tests in distributed mode, due to another test that happened to be running concurrently pushing beyond the memory limit (hence these currently being marked as single).
The bug that's being tested here was a bit strange in that it only occurred when more than 2**24 rows were present, so the minimal unit test needed to have 2**24 + 1 rows, which is why these tests need as much memory as they do.
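For a sense of scale, here is a rough back-of-the-envelope calculation for the `ranks` buffer that fails in the DataFrame traceback above (`np.empty((n, k), dtype='f8')` with 2**24 + 1 rows and two columns). It counts only that single allocation; the input frame and any intermediate copies come on top of it, and those are not estimated here.

```python
# Size of the single float64 buffer requested by
# np.empty((n, k), dtype='f8') in the DataFrame test.
n_rows = 2**24 + 1   # row count used by the test (GH 18271)
n_cols = 2           # columns 'A' and 'B'
bytes_per_f8 = 8     # one float64 value

ranks_bytes = n_rows * n_cols * bytes_per_f8

print(ranks_bytes)            # 268435472
print(ranks_bytes // 2**20)   # 256 (whole MiB)
```

So this one buffer alone is a little over 256 MiB, and it has to be allocated as a single contiguous block.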
A couple of options come to mind here:
- Catch the MemoryError, as @TomAugspurger suggested
- Mark these tests with @pytest.mark.high_memory (looks like we currently only use this for one test) so that users on low-memory machines can optionally skip them.
Not really sure which would be preferable, or if my low-memory machine theory is even correct.
All that being said, it's certainly possible that there is a resource leak somewhere, or that memory isn't being used optimally, so investigations along those paths would certainly be welcome.
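As a minimal sketch of the first option, catching the MemoryError could look roughly like the decorator below. `skip_on_memory_error` is a hypothetical helper invented for illustration, not something that exists in pandas. It turns a MemoryError raised inside a test into a pytest skip, and it leaves every other outcome untouched.

```python
import functools

import pytest


def skip_on_memory_error(test_func):
    """Hypothetical helper: report a skip instead of a failure on MemoryError."""
    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        try:
            return test_func(*args, **kwargs)
        except MemoryError:
            # pytest.skip raises a special exception that pytest reports as "skipped".
            pytest.skip("not enough memory to run " + test_func.__name__)
    return wrapper
```

Applied to the tests above, it would be stacked on `test_pct_max_many_rows` alongside `@pytest.mark.single`. The trade-off is the one already raised in this thread: on a machine that always runs out of memory, the test is silently skipped and the original bug is no longer checked there.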
scarabeusiv commented on Feb 21, 2019
The machine building it seems to have 2 GB of RAM. I've now expanded it to 4 GB and will let you know how it goes within an hour, once I get the build results :)
scarabeusiv commented on Feb 21, 2019
Okay, I can confirm the MemoryError also happens with 4 GB of RAM allocated on the buildbot.
TomAugspurger commented on Feb 21, 2019
scarabeusiv commented on Feb 21, 2019
I've created the PR with the markings.