Closed
Description
Series[not-categorical] > CategoricalIndex is inconsistent with the reversed operation. Which one is canonical?
ser = pd.Series([1, 2, 3])
idx = pd.CategoricalIndex(['A', 'B', 'A'])
>>> ser > idx
0 False
1 False
2 False
>>> idx < ser
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pandas/core/indexes/category.py", line 752, in _evaluate_compare
return getattr(self.values, opname)(other)
File "pandas/core/arrays/categorical.py", line 56, in f
raise TypeError("Unordered Categoricals can only compare "
TypeError: Unordered Categoricals can only compare equality or not
>>> ser > idx.values
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pandas/core/ops.py", line 819, in wrapper
.format(op=op, typ=self.dtype))
TypeError: Cannot compare a Categorical for op <built-in function gt> with Series of dtype int64.
If you want to compare values, use 'series <op> np.asarray(other)'.
I'm guessing the right thing to do is to a) have Series[categorical].__op__ wrap CategoricalIndex.__op__, and b) have Series[non-categorical] dispatch to the reversed op for is_categorical_dtype(other). I want to confirm before making a PR.
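Point (b) can be illustrated without pandas. The following is a minimal sketch using two hypothetical toy classes (ToySeries and ToyCategorical are illustrative names, not pandas classes). It assumes only standard Python operator dispatch: when the left operand returns NotImplemented, Python falls back to the reflected method on the right operand, so both operand orders go through the same check.

```python
class ToyCategorical:
    """Hypothetical stand-in for an unordered categorical; not a pandas class."""

    def __init__(self, values):
        self.values = list(values)

    def _refuse(self, other):
        # Same message as in the traceback quoted above.
        raise TypeError("Unordered Categoricals can only compare equality or not")

    # Every ordering comparison goes through the single refusal path.
    __lt__ = __gt__ = __le__ = __ge__ = _refuse


class ToySeries:
    """Hypothetical stand-in for a non-categorical Series."""

    def __init__(self, values):
        self.values = list(values)

    def __gt__(self, other):
        if isinstance(other, ToyCategorical):
            # Defer: Python will try the reflected other.__lt__(self) next.
            return NotImplemented
        return [a > b for a, b in zip(self.values, other.values)]


def outcome(func):
    """Return the name of the exception raised by func, or 'ok' if none."""
    try:
        func()
    except Exception as exc:
        return type(exc).__name__
    return "ok"


ser = ToySeries([1, 2, 3])
cat = ToyCategorical(["A", "B", "A"])

forward = outcome(lambda: ser > cat)   # ToySeries defers to ToyCategorical.__lt__
backward = outcome(lambda: cat < ser)  # ToyCategorical.__lt__ is called directly
```

With this layout both `forward` and `backward` report the same exception type, which is the consistency the issue asks for. Whether raising is the canonical behavior is the open question in this thread; the sketch only shows the dispatch mechanism.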
Activity
jbrockmendel commented on Feb 3, 2018
A case derived from tests.categorical.test_operators.TestCategoricalOps.test_comparisons, where it has the comment
jreback commented on Feb 6, 2018
there are some issues about this already. see if you can find them.
jbrockmendel commented on Feb 6, 2018
Closest issues with categorical label are #18050, #8995, neither of which seems to differentiate between the behavior of Categorical.__op__ vs CategoricalIndex.__op__ vs Series[category].__op__.
@TomAugspurger any thoughts on the One True Implementation for these operations?
TomAugspurger commented on Feb 6, 2018
No strong thoughts, but I suppose a decent heuristic is "what would the result be if NumPy had a categorical dtype?" In that case, let's just do the same as Series[int].op(Index[int]) and Series[int].op(ndarray[int]), etc. So
Does that make sense?
jbrockmendel commented on Feb 6, 2018
I should clarify. The question isn't what the return type should be, but which operations are allowed at all.
The easy (i.e. pretty obviously wrong ATM) part is cases where x < y is allowed but y > x raises:
The less obvious part is what casting rules are allowed. ATM CategoricalIndex.__op__ is much more forgiving about input dtypes and will cast other before dispatching to Categorical.__op__.
The goal here is to have one implementation of each of these methods (ideally in Categorical) and then have the other classes wrap that instead of re-implementing and risking having the logic diverge.
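The "one implementation" goal might look roughly like the sketch below. All names here (BaseCat, CatIndex, CatSeries, _cmp) are hypothetical, written in plain Python, and do not reflect the actual pandas internals. The container owns the only comparison logic, and the index-like and series-like wrappers delegate to it, so the rules cannot diverge.

```python
import operator


class BaseCat:
    """Hypothetical container that owns the only comparison implementation."""

    def __init__(self, values, categories, ordered=False):
        self.values = list(values)
        self.categories = list(categories)
        self.ordered = ordered

    def _cmp(self, other, op):
        # Accept either a wrapper/container (has .values) or a plain list.
        other_values = getattr(other, "values", other)
        if op in (operator.eq, operator.ne):
            return [op(a, b) for a, b in zip(self.values, other_values)]
        if not self.ordered:
            raise TypeError("Unordered Categoricals can only compare equality or not")
        # Ordered comparison uses positions in `categories`, not the raw values.
        rank = {cat: i for i, cat in enumerate(self.categories)}
        return [op(rank[a], rank[b]) for a, b in zip(self.values, other_values)]

    def __eq__(self, other):
        return self._cmp(other, operator.eq)

    def __lt__(self, other):
        return self._cmp(other, operator.lt)

    def __gt__(self, other):
        return self._cmp(other, operator.gt)


class _Wrapper:
    """Shared delegation: no comparison logic lives in the wrappers."""

    def __init__(self, cat):
        self._cat = cat

    @property
    def values(self):
        return self._cat.values

    def __eq__(self, other):
        return self._cat == other

    def __lt__(self, other):
        return self._cat < other

    def __gt__(self, other):
        return self._cat > other


class CatIndex(_Wrapper):
    """Hypothetical index-like wrapper."""


class CatSeries(_Wrapper):
    """Hypothetical series-like wrapper."""


ordered = BaseCat(["lo", "hi", "lo"], categories=["lo", "hi"], ordered=True)
other = ["hi", "hi", "hi"]
results = [ordered < other, CatIndex(ordered) < other, CatSeries(ordered) < other]
```

All three entries of `results` come from the same `_cmp` call, so a change to the casting or ordering rules only has to be made in one place.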
jreback commented on Feb 10, 2018
Categorical is prob a bit strict about these checks ATM. Categorical and CI should behave exactly the same.
These (Series, Cat, CI) should all respect ordered (IOW they have to be the same) for comparison purposes.
@jschendel
jbrockmendel commented on Feb 20, 2018
It looks like Series.__cmp__(CategoricalIndex) effectively calls other = np.asarray(other) on the CategoricalIndex, which I expect isn't the desired behavior.
jbrockmendel commented on Mar 10, 2018
One more:
What should (DatetimeIndex|TimedeltaIndex|PeriodIndex).__(add|sub)__(Categorical|CategoricalIndex) do? I think right now they return NotImplemented, which seems reasonable, but they do it in a catch-all block instead of intentionally and I don't think the case is tested.
@jschendel are you the appropriate person to ping on this?
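One way to make that refusal intentional rather than incidental is sketched below with hypothetical toy classes (ToyDatetimeIndex and ToyCategoricalIndex are illustrative names, not pandas classes, and integer "stamps" stand in for real timestamps). The categorical operand is checked up front, NotImplemented is returned on purpose, and a small check pins down the resulting TypeError.

```python
class ToyCategoricalIndex:
    """Hypothetical stand-in; defines no arithmetic at all."""

    def __init__(self, values):
        self.values = list(values)


class ToyDatetimeIndex:
    """Hypothetical stand-in holding integer timestamps."""

    def __init__(self, stamps):
        self.stamps = list(stamps)

    def __add__(self, other):
        if isinstance(other, ToyCategoricalIndex):
            # Explicit, documented refusal instead of falling into a catch-all.
            return NotImplemented
        if isinstance(other, int):
            return ToyDatetimeIndex(s + other for s in self.stamps)
        # Anything else is also unsupported in this sketch.
        return NotImplemented


def check_add_categorical_raises():
    """Return True if adding a categorical operand raises TypeError."""
    dti = ToyDatetimeIndex([10, 20])
    ci = ToyCategoricalIndex(["A", "B"])
    try:
        dti + ci
    except TypeError:
        return True
    return False


shifted = ToyDatetimeIndex([1, 2]) + 5
```

Because ToyCategoricalIndex defines no `__radd__`, Python turns the NotImplemented into a TypeError, and `check_add_categorical_raises()` makes that behavior part of the tested contract.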
jschendel commented on Mar 10, 2018
I don't think I have any type of definitive say on this; mostly worked on categorical vs. categorical comparisons.
I agree that this probably shouldn't be supported. I suppose you could perform the operation on the underlying category values, but it seems like you'd run into ambiguous corner cases pretty quickly. For example, would you actually want to allow add/sub if the categories were ordered with a non-standard ordering, e.g. Timestamp('2017-01-01') > Timestamp('2018-01-01')? Not sure why anyone would ever do that, but it could conceivably happen.
Also, it doesn't look like add/sub at the scalar level is implemented for Timestamp|Timedelta|Period vs. Categorical|CategoricalIndex, which I would expect to be implemented if we were to support the same operations on the index equivalent.