Closed
Description
https://travis-ci.org/pandas-dev/pandas/jobs/267760269
Master and some PRs (strangely, not all) are failing on the same test, but only on Mac:
=================================== FAILURES ===================================
_____________________ TestCategoricalIndex.test_reindexing _____________________
[gw1] darwin -- Python 3.5.4 /Users/travis/miniconda3/envs/pandas/bin/python
self = <pandas.tests.indexes.test_category.TestCategoricalIndex object at 0x111d5e860>
    def test_reindexing(self):
        ci = self.create_index()
        oidx = Index(np.array(ci))
        for n in [1, 2, 5, len(ci)]:
            finder = oidx[np.random.randint(0, len(ci), size=n)]
            expected = oidx.get_indexer_non_unique(finder)[0]
            actual = ci.get_indexer(finder)
>           tm.assert_numpy_array_equal(expected, actual)
pandas/tests/indexes/test_category.py:389:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/util/testing.py:1174: in assert_numpy_array_equal
_raise(left, right, err_msg)
pandas/util/testing.py:1157: in _raise
.format(obj=obj), left.shape, right.shape)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
obj = 'numpy array', message = 'numpy array shapes are different', left = (14,)
right = (6,), diff = None
    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)
        msg = """{obj} are different
    {message}
    [left]: {left}
    [right]: {right}""".format(obj=obj, message=message, left=left, right=right)
        if diff is not None:
            msg += "\n[diff]: {diff}".format(diff=diff)
>       raise AssertionError(msg)
E AssertionError: numpy array are different
E
E numpy array shapes are different
E [left]: (14,)
E [right]: (6,)
pandas/util/testing.py:1105: AssertionError
MAINT: Clean up docs in pandas/errors/__init__.py
gfyoung commented on Aug 24, 2017
Marking as unreliable given that only some PRs are failing on this test.
gfyoung commented on Aug 24, 2017
I think the np.random.randint is the problem.
jorisvandenbossche commented on Aug 24, 2017
But whatever result of randint I can think of, oidx.get_indexer_non_unique(finder)[0] and ci.get_indexer(finder) should still behave the same.
gfyoung commented on Aug 24, 2017
I suppose, though I don't know how else you get such a difference of 14 and 6...
TomAugspurger commented on Aug 24, 2017
FYI, pytest-repeat is helpful for debugging these.
jorisvandenbossche commented on Aug 24, 2017
Indeed good tip :-)
So this happens when the random generation results in exactly the same index as the original one:
But I would say this is then rather a bug in get_indexer (and not in the test)?
(Still a bit strange that it turned up in so many builds now, and only on Mac, as for me this failed in 3 out of 1000 runs.)
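A rough check on that failure rate, as a sketch only. It assumes the index under test holds the six labels a, a, b, b, c, a (an assumption for illustration; the thread does not show what create_index returns) and that only the n = len(ci) iteration can redraw the whole index. The draw matches the original index when every position picks a label equal to the one already there:

```python
from collections import Counter
from fractions import Fraction

# Assumed labels of the index under test (not shown in this thread).
labels = list("aabbca")
counts = Counter(labels)
n = len(labels)

# Each position is drawn uniformly from the n labels; it matches the
# original label with probability (occurrences of that label) / n.
p_identity = Fraction(1)
for label in labels:
    p_identity *= Fraction(counts[label], n)

print(p_identity)                # 1/432
print(float(p_identity) * 1000)  # roughly 2.3 expected failures per 1000 runs
```

Under those assumptions the expected rate is about 2 to 3 failures per 1000 runs, which is in line with the 3 out of 1000 observed.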
jorisvandenbossche commented on Aug 24, 2017
So reproducible example:
Compared to normal indices, get_indexer is able to handle a non-unique index for Categoricals (for example ci.get_indexer(['a', 'b', 'c']) in [5]). However, inside get_indexer a 'fast path' was added for when the passed index equals the calling one. Therefore, in such a case, it does not repeat the positions of duplicated labels (as it does in [5] and [8]). This contrasts with how get_indexer_non_unique works ([7]).
The 'fast path' was added in the IntervalIndex PR, so this also seems to be a change compared to 0.19:
(but not sure what the actual consequences in user code could be)
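The mismatch can be modelled without pandas. The following is a toy sketch, not the pandas implementation: both function bodies are simplified stand-ins (for instance, they ignore the -1 that pandas returns for missing labels), and the labels a, a, b, b, c, a are an assumption chosen for illustration.

```python
def get_indexer_non_unique(index, target):
    """Toy model: for each target label, list every matching position."""
    return [i for label in target for i, value in enumerate(index) if value == label]


def get_indexer_with_fast_path(index, target):
    """Toy model of the shortcut: an identical target returns 0..len-1."""
    if list(target) == list(index):
        # Fast path: ignores that duplicated labels match several positions.
        return list(range(len(index)))
    return get_indexer_non_unique(index, target)


labels = list("aabbca")  # assumed index labels, for illustration only
expected = get_indexer_non_unique(labels, labels)
actual = get_indexer_with_fast_path(labels, labels)
print(len(expected), len(actual))  # 14 6
```

With these assumed labels the two lengths are 14 and 6, the same shapes as in the assertion error at the top of the issue.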
jorisvandenbossche commented on Aug 24, 2017
A possible fix: add to the fastpath a check for uniqueness of the index here:
pandas/pandas/core/indexes/category.py
Lines 486 to 491 in 96f92eb
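The idea of guarding the fast path can be sketched in plain Python. This is a hypothetical illustration of the proposed check and not the code at the lines referenced above; the function name get_indexer_guarded and the label values are assumptions.

```python
def get_indexer_guarded(index, target):
    """Toy model: take the shortcut only when the index has no duplicates."""
    index, target = list(index), list(target)
    is_unique = len(set(index)) == len(index)
    if is_unique and target == index:
        # Safe: every label maps to exactly one position.
        return list(range(len(index)))
    # General path: one entry per matching position, so duplicates repeat.
    return [i for label in target for i, value in enumerate(index) if value == label]


print(get_indexer_guarded("abc", "abc"))             # [0, 1, 2]
print(len(get_indexer_guarded("aabbca", "aabbca")))  # 14
```

With the uniqueness check in place, a non-unique index passed to itself goes through the general path, so the result has the same length as the non-unique lookup.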
BUG: Respect dups in reindexing CategoricalIndex