Closed
Description
I am not sure if this is on purpose, but as far as I know, the `or` operation should be symmetric (I mean `a | b == b | a`); not in pandas:
```python
>>> import numpy as np
>>> import pandas as pd
>>> a = pd.Series([True])
>>> b = pd.Series([np.nan])
>>> a | b
0    True
dtype: bool
>>> b | a
0    False
dtype: bool
```
Is there any reason pandas treats `NaN` values asymmetrically here?
Also, this is a weird outcome:
```python
>>> (b == True) | (a == True)
0    True
dtype: bool
```
And this:
```python
>>> b.astype(bool)
0    True
dtype: bool
```
This is even internally inconsistent: earlier, in `b | a`, `NaN` was treated as `False`, but here it is `True`.
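The `b.astype(bool)` result follows ordinary truthiness rather than anything pandas-specific: a float is truthy whenever it is nonzero, and NaN is nonzero. A minimal plain-Python sketch of that rule, assuming `float("nan")` behaves like `np.nan` (both are the IEEE 754 NaN value) and using a hypothetical `cast_to_bool` helper as a stand-in for `Series.astype(bool)`:

```python
nan = float("nan")  # plain-Python stand-in for np.nan


def cast_to_bool(values):
    """Hypothetical stand-in for Series.astype(bool): apply bool() elementwise."""
    return [bool(x) for x in values]


# A float is truthy whenever it is nonzero; NaN is nonzero, so it becomes True.
print(bool(nan))                      # True
print(bool(0.0))                      # False
print(cast_to_bool([nan, 0.0, 1.5]))  # [True, False, True]
```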
Activity
jreback commented on Mar 3, 2014
This could be a bug, but your boolean comparison of `True == np.nan` is very odd in the first place. Should the result be `np.nan` because it's missing, or `False` because `True` is equivalent to `np.nan`? See the long and complicated tests here:
https://github.com/pydata/pandas/blob/master/pandas/tests/test_series.py#L3142
If you see a bug, pls submit a PR.
Will mark it as a bug, but come back to us with your thoughts.
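For `==` itself, plain Python floats (and numpy's elementwise comparisons) follow IEEE 754: any equality or ordering comparison involving NaN is `False`, and only `!=` is `True`. So `True == np.nan` already evaluates to `False` outside pandas, which is also why `(b == True) | (a == True)` in the report reduces to `False | True`. A small sketch, again with `float("nan")` standing in for `np.nan`:

```python
import math

nan = float("nan")  # plain-Python stand-in for np.nan

# IEEE 754: comparisons involving NaN are False, except !=.
print(True == nan)  # False
print(nan == nan)   # False
print(nan != nan)   # True

# An elementwise `x == True` therefore maps a missing value to False;
# detecting it needs an explicit check such as math.isnan.
values = [True, nan, False]
print([x == True for x in values])                               # [True, False, False]
print([isinstance(x, float) and math.isnan(x) for x in values])  # [False, True, False]
```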
behzadnouri commented on Mar 3, 2014
@jreback there are 3 issues above:
1. `a | b` is not equal to `b | a`;
2. `b.astype(bool)` converts `NaN` values to `True`; this is the worst of the 3 possible values, namely `NaN`, `False`, `True`, and it is also not consistent with item 3 below;
3. `b == True` is not consistent with item 2 above.
This is really bad, because the outcome of `|` should not depend on order, especially when `NaN` values are implicitly generated:
jreback commented on Mar 3, 2014
@behzadnouri ok... pls make a PR to fix! Keep in mind there may be some reasons buried for this; e.g. put in tests for the 'correct' behavior and see what breaks.
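One way to "put in tests for the 'correct' behavior" is a property check that every pair of inputs gives the same result in both orders. The sketch below is plain Python rather than a pandas test; `find_asymmetries`, `same` and `cast_first_or` are hypothetical names, and `cast_first_or` assumes "cast both sides with `bool()` first" semantics. Python's own `or` expression, which returns one of its operands, is included to show the check catching an order-dependent operation:

```python
import itertools
import math

nan = float("nan")  # plain-Python stand-in for np.nan
CASES = [True, False, nan]


def same(x, y):
    """Equality that also treats two NaNs as equal."""
    if isinstance(x, float) and isinstance(y, float) and math.isnan(x) and math.isnan(y):
        return True
    return x == y


def cast_first_or(x, y):
    """Hypothetical elementwise `|` that casts both operands with bool() first."""
    return bool(x) | bool(y)


def find_asymmetries(op, cases=CASES):
    """Return every (x, y) pair for which op(x, y) differs from op(y, x)."""
    return [
        (x, y)
        for x, y in itertools.product(cases, repeat=2)
        if not same(op(x, y), op(y, x))
    ]


print(find_asymmetries(cast_first_or))        # []
print(find_asymmetries(lambda x, y: x or y))  # [(True, nan), (nan, True)]
```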
`astype` actually should fail IMHO. It is really unclear what to do with `np.nan`; you would normally use `b.fillna(False).astype(bool)`.
jreback commented on Mar 3, 2014
@hayd pls chime in here when you can... IIRC you and I discussed these types of things at length.
hayd commented on Mar 3, 2014
Some discussion was here: #4953.
A lot of this is because we're fighting the "strange" truthiness of numpy's and Python's (!) nan, so things like `astype` we can't/shouldn't fix... kinda a "gotcha".
IMO the asymmetry is a bug, but only because we're forcing a boolean return value (the reason this happens is that in numpy, and in general, `a | b` is not symmetric). Now that I think about it, I do remember this coming up...
seth-p commented on Sep 28, 2014
My 2c: I would separate this out into two issues:
Issue 1: Any boolean operations should behave the same as if `.astype(bool)` were first invoked on the structure(s). So, for example, `a | b` should return the same as `a.astype(bool) | b.astype(bool)`, and `b == True` should return the same as `b.astype(bool) == True`. If this is the case, this should address the commutativity problem (as I'm assuming the boolean operations on boolean structures are commutative).
Issue 2: Should `.astype(bool)` convert a `NaN` to `True` or `False`? Currently it converts it to `True`, which is consistent with the fact that `bool(NaN)` is `True`, as defined by numpy. Personally, I would have defined `bool(NaN)` as `False`, and this may be a situation where we want the pandas behavior (of `.astype(bool)`) to differ from numpy's. But this should be considered an API change, not a bug fix. I also just raised this issue on https://groups.google.com/forum/#!topic/pydata/U7gQyI_gLMc.
At any rate, I view these as two separate (but obviously related) issues, which I think are being slightly conflated in this issue (and #8151). I would suggest first addressing Issue 1, which seems to me clearly to be a bug, while keeping the existing behavior of treating `NaN` as `True` (so in the first example above, `a | b`, `b | a`, and `b == True` would all return `Series([True])`); and then separately, after further discussion and reflection, consider changing the behavior of `.astype(bool)` (and related boolean operations) to convert `NaN` to `False`.
behzadnouri commented on Sep 28, 2014
@seth-p
As long as the logical operations are bound to return `dtype='bool'`, that is exactly what I am testing here.
That would not be achievable. For example, `2 != True` and `2 != False`, so you cannot type cast a series unless it is only zeros and ones.
I strongly discourage that, because:
-1- it is not only numpy but, as you also mentioned, and much more importantly, that is how Python works:
so it can easily break things as simple as `ts.map(bool) == ts.astype(bool)`.
-2- the gain of breaking that consistency, if any, is less than minimal, because if a user wants to deviate from Python's convention, that is more than easy: `ts.fillna(False)`, or `ts.fillna(0)`.
seth-p commented on Sep 28, 2014
D'oh! @behzadnouri, of course you're right about `==` and `!=`. I guess I should have said that any logical operations that expect booleans should produce the same results as for `.astype(bool)`.
Re: `.astype(bool)` converting `NaN` to `True`, fair enough. And you're right, of course, that what I'm suggesting is simply the same as `.fillna(False).astype(bool)`.
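Both points in this exchange can be checked in plain Python. `2` is truthy yet equals neither `True` nor `False`, so `==` cannot be defined "as if `astype(bool)` ran first"; and filling NaN with `False` before casting gives the `.fillna(False).astype(bool)` behavior, under which `|` no longer depends on operand order. This is a sketch on lists, not pandas; `fill_false` and `cast_to_bool` are hypothetical stand-ins for `.fillna(False)` and `.astype(bool)`:

```python
import math

nan = float("nan")  # plain-Python stand-in for np.nan

# 2 is truthy, but it compares unequal to both True (1) and False (0).
print(bool(2), 2 != True, 2 != False)  # True True True


def fill_false(values):
    """Hypothetical stand-in for .fillna(False): replace NaN with False."""
    return [False if isinstance(x, float) and math.isnan(x) else x for x in values]


def cast_to_bool(values):
    """Hypothetical stand-in for .astype(bool): apply bool() elementwise."""
    return [bool(x) for x in values]


a = cast_to_bool(fill_false([True]))
b = cast_to_bool(fill_false([nan]))

print(b)                              # [False]
print([x | y for x, y in zip(a, b)])  # [True]
print([x | y for x, y in zip(b, a)])  # [True]
```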
h-vetinari commented on Dec 6, 2017
I tried to find the most current open issue on this. The others are closed or merged, but this one is still open, even though it has been milestoned since 0.14?
I asked a question about this on Stack Exchange, and someone took the time to identify the apparent source of the issue (as well as a work-around): https://stackoverflow.com/a/47664356/2965879
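The mechanism that answer points to (spelled out in the next paragraph: only `other` gets its NaN replaced by `False`) can be modeled in a few lines of plain Python. This is a hypothetical sketch, not pandas' code: `or_fill_other_only` assumes a NaN left operand propagates through `|` and the result is then filled with `False`, while `or_fill_both` is the symmetric alternative that fills both sides first.

```python
import math

nan = float("nan")  # plain-Python stand-in for np.nan


def is_missing(x):
    return isinstance(x, float) and math.isnan(x)


def fill(x):
    """Replace NaN with False, otherwise cast with bool()."""
    return False if is_missing(x) else bool(x)


def or_fill_other_only(left, right):
    """Hypothetical model: only `other` (right) is filled; a NaN in `self`
    (left) is assumed to propagate and then be filled with False."""
    return [False if is_missing(x) else (bool(x) | fill(y)) for x, y in zip(left, right)]


def or_fill_both(left, right):
    """Hypothetical symmetric variant: fill both operands before `|`."""
    return [fill(x) | fill(y) for x, y in zip(left, right)]


a, b = [True], [nan]
print(or_fill_other_only(a, b), or_fill_other_only(b, a))  # [True] [False]
print(or_fill_both(a, b), or_fill_both(b, a))              # [True] [True]
```

Under these assumptions the first model reproduces the `True`/`False` pair from the original report, and filling both operands removes the order dependence.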
The main issue seems to be that in `pandas.core.ops.wrapper` (https://github.com/pandas-dev/pandas/blob/master/pandas/core/ops.py#L928), after aligning the series, only `other` is filled (transforming NaN to False), but `self` isn't, which introduces this symmetry break.
TomAugspurger commented on Dec 2, 2019
I don't expect the behavior of `np.nan` in boolean operations to change, but #29842 is implementing a new `BooleanArray` extension type that I think addresses most things. Let us know if there are additional problems in new issues.
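For reference, pandas documents its nullable `"boolean"` dtype (backed by `BooleanArray`) as using Kleene (three-valued) logic for `|`, `&` and `^`: `True | NA` is `True`, while `False | NA` stays `NA`. A plain-Python sketch of the `|` rule, with `None` standing in for `pd.NA` (an assumption of this sketch, not pandas' implementation):

```python
import itertools
from typing import Optional

Tri = Optional[bool]  # True, False, or None (missing), standing in for pd.NA


def kleene_or(x: Tri, y: Tri) -> Tri:
    """Three-valued OR: True dominates; otherwise a missing operand stays missing."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


print(kleene_or(True, None), kleene_or(None, True))    # True True
print(kleene_or(False, None), kleene_or(None, False))  # None None

# The result never depends on operand order.
values = [True, False, None]
assert all(kleene_or(x, y) == kleene_or(y, x) for x, y in itertools.product(values, repeat=2))
```

Because a missing value stays distinct from `False` under this rule, there is no need to decide whether NaN should count as `True` or `False` before applying `|`.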