test_pct_max_many_rows fails on intel 32bit with memory error #25384

Closed
Description

@scarabeusiv
Contributor

Pytest output from the two failing tests:

[ 3680s] =================================== FAILURES ===================================
[ 3680s] _______________________ TestRank.test_pct_max_many_rows ________________________
[ 3680s] 
[ 3680s] self = <pandas.tests.frame.test_rank.TestRank object at 0x54f6d7ec>
[ 3680s] 
[ 3680s]     @pytest.mark.single
[ 3680s]     def test_pct_max_many_rows(self):
[ 3680s]         # GH 18271
[ 3680s]         df = DataFrame({'A': np.arange(2**24 + 1),
[ 3680s]                         'B': np.arange(2**24 + 1, 0, -1)})
[ 3680s] >       result = df.rank(pct=True).max()
[ 3680s] 
[ 3680s] ../../BUILDROOT/python-pandas-0.24.1-13.1.i386/usr/lib/python2.7/site-packages/pandas/tests/frame/test_rank.py:317: 
[ 3680s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[ 3680s] ../../BUILDROOT/python-pandas-0.24.1-13.1.i386/usr/lib/python2.7/site-packages/pandas/core/generic.py:8335: in rank
[ 3680s]     return ranker(self)
[ 3680s] ../../BUILDROOT/python-pandas-0.24.1-13.1.i386/usr/lib/python2.7/site-packages/pandas/core/generic.py:8327: in ranker
[ 3680s]     pct=pct)
[ 3680s] ../../BUILDROOT/python-pandas-0.24.1-13.1.i386/usr/lib/python2.7/site-packages/pandas/core/algorithms.py:861: in rank
[ 3680s]     ascending=ascending, na_option=na_option, pct=pct)
[ 3680s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[ 3680s] 
[ 3680s] >   ranks = np.empty((n, k), dtype='f8')
[ 3680s] E   MemoryError
[ 3680s] 
[ 3680s] pandas/_libs/algos_rank_helper.pxi:806: MemoryError
[ 3680s] ____________________________ test_pct_max_many_rows ____________________________
[ 3680s] 
[ 3680s]     @pytest.mark.single
[ 3680s]     def test_pct_max_many_rows():
[ 3680s]             # GH 18271
[ 3680s]             s = Series(np.arange(2**24 + 1))
[ 3680s] >           result = s.rank(pct=True).max()
[ 3680s] 
[ 3680s] ../../BUILDROOT/python-pandas-0.24.1-13.1.i386/usr/lib/python2.7/site-packages/pandas/tests/series/test_rank.py:505: 
[ 3680s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[ 3680s] ../../BUILDROOT/python-pandas-0.24.1-13.1.i386/usr/lib/python2.7/site-packages/pandas/core/generic.py:8335: in rank
[ 3680s]     return ranker(self)
[ 3680s] ../../BUILDROOT/python-pandas-0.24.1-13.1.i386/usr/lib/python2.7/site-packages/pandas/core/generic.py:8327: in ranker
[ 3680s]     pct=pct)
[ 3680s] ../../BUILDROOT/python-pandas-0.24.1-13.1.i386/usr/lib/python2.7/site-packages/pandas/core/algorithms.py:857: in rank
[ 3680s]     na_option=na_option, pct=pct)
[ 3680s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[ 3680s] 
[ 3680s] >   sorted_data = values.take(_as)
[ 3680s] E   MemoryError
[ 3680s] 
[ 3680s] pandas/_libs/algos_rank_helper.pxi:712: MemoryError
[ 3681s] =============================== warnings summary ===============================

Full build log:
pandas.txt
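
For a rough sense of scale, the allocation that fails in the first traceback, `np.empty((n, k), dtype='f8')`, can be sized by hand. The sketch below assumes `n * k` is the frame's 2**24 + 1 rows times its two columns; the helper name `float64_nbytes` is made up for illustration and is not part of pandas.

```python
FLOAT64_ITEMSIZE = 8  # bytes per 'f8' element


def float64_nbytes(n_rows, n_cols):
    """Bytes needed for a dense float64 array of this shape (hypothetical helper)."""
    return n_rows * n_cols * FLOAT64_ITEMSIZE


n_rows = 2**24 + 1  # rows used by test_pct_max_many_rows
n_cols = 2          # columns 'A' and 'B' in the DataFrame test

total = float64_nbytes(n_rows, n_cols)
print(total)                 # total bytes for the ranks buffer
print(total / 2**20, "MiB")  # same figure expressed in MiB
```

That is roughly 256 MiB in a single contiguous block, on top of the input data and any intermediate copies. A 32-bit process is bounded by its address space rather than by installed RAM, which may be relevant here, though nothing in the log confirms it.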

Activity

TomAugspurger (Contributor) commented on Feb 20, 2019

Can you look at the blame for that test to see when it was introduced? We could maybe catch a memory error there if it doesn't undo the purpose of the test.

added the Testing label (pandas testing functions or related to the test suite) on Feb 20, 2019

scarabeusiv (Contributor, Author) commented on Feb 20, 2019

Note that we were not executing the tests until now (which is a bit of a bummer), but it seems this was not an issue on the 0.23 release.

The following commit introduced the test:
4476962

WillAyd (Member) commented on Feb 20, 2019

Can you post the output of pd.show_versions()?

jschendel (Member) commented on Feb 21, 2019

How much memory does the machine you're running the tests on have? My initial suspicion is that the machine simply doesn't have enough memory to run those specific tests. We actually had a similar issue where these tests would intermittently crash on Travis. The cause was eventually determined to be a memory error when running the tests in distributed mode: another test that happened to be running concurrently pushed usage beyond the memory limit (hence these tests currently being marked as single).

The bug that's being tested here was a bit strange in that it only occurred when more than 2**24 rows were present, so the minimal unit test needed to have 2**24 + 1 rows, which is why these tests need as much memory as they do.
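
One property of 2**24 that could explain such a threshold: it is the largest value up to which a 32-bit float represents every integer exactly. Whether that is the actual root cause of the original bug is an assumption; the thread does not say. A minimal stdlib-only sketch (the helper `to_float32` is hypothetical) shows the precision boundary:

```python
import struct


def to_float32(value):
    """Round-trip a number through IEEE 754 single precision (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


threshold = 2**24  # same constant the tests use in np.arange(2**24 + 1)

# 2**24 itself survives the round trip unchanged.
print(to_float32(threshold) == threshold)

# 2**24 + 1 cannot be stored in float32 and is rounded to a neighboring value.
print(to_float32(threshold + 1) == threshold + 1)
```

If a row count were ever held in single precision, counts above 2**24 would be rounded, which is consistent with a bug that only shows up past that size.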

A couple options come to mind here:

  • Adjust the test to try catching MemoryError, as @TomAugspurger suggested
  • Additionally mark the test with @pytest.mark.high_memory (it looks like we currently use this for only one test) so that users on low-memory machines can optionally skip these tests.

Not really sure which would be preferable, or if my low memory machine theory is even correct.

All that being said, it's certainly possible that there is a resource leak somewhere, or that memory isn't being used optimally, so investigations along those paths would certainly be welcome.
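
The two options above could be combined roughly as follows. This is only a sketch, not the change that was made in pandas: `allocate_rank_buffer` is a hypothetical stand-in for the large allocation, the size is kept tiny so the snippet runs anywhere, and `high_memory` is a custom marker that has to be registered in the project's pytest configuration.

```python
import pytest


def allocate_rank_buffer(n_rows, n_cols):
    """Hypothetical stand-in for the float64 buffer that rank() allocates."""
    return bytearray(n_rows * n_cols * 8)


@pytest.mark.high_memory  # custom marker; register it in the pytest config
def test_large_allocation_or_skip(allocate=allocate_rank_buffer):
    try:
        # The real tests use 2**24 + 1 rows; a small size keeps this sketch cheap.
        buf = allocate(2**10, 2)
    except MemoryError:
        # Report a skip instead of a failure on machines that cannot allocate.
        pytest.skip("not enough memory to run this test")
    assert len(buf) == 2**10 * 2 * 8
```

With a registered marker, low-memory builds could deselect such tests with `pytest -m "not high_memory"`. Catching MemoryError has the drawback already raised in this thread: on a machine that always fails the allocation, the test silently stops checking anything.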

scarabeusiv (Contributor, Author) commented on Feb 21, 2019

The machine building it seems to have 2 GB of RAM. I've now expanded it to 4 GB and will let you know how it goes within an hour, once I get the build results :)

scarabeusiv (Contributor, Author) commented on Feb 21, 2019

Okay, I can confirm the MemoryError also happens with 4 GB of RAM allocated on the buildbot.

TomAugspurger (Contributor) commented on Feb 21, 2019

scarabeusiv (Contributor, Author) commented on Feb 21, 2019

I've created the PR with the markings.

added this to the 0.25.0 milestone on Feb 22, 2019


Metadata

Assignees

No one assigned

    Labels

    Needs Info: Clarification about behavior needed to assess issue
    Testing: pandas testing functions or related to the test suite

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

      Development

      Participants

      @WillAyd, @jreback, @scarabeusiv, @TomAugspurger, @jschendel
