Description
Code Sample, a copy-pastable example if possible
In [2]: s = pd.DataFrame(-1, index=[1, np.nan, 2],
   ...:                  columns=[3, np.nan, 1])
   ...:
In [3]: s + s  # good
Out[3]:
     3.0  NaN  1.0
1.0   -2   -2   -2
NaN   -2   -2   -2
2.0   -2   -2   -2
In [4]: s == s  # bad
Out[4]:
      3.0  NaN   1.0
1.0  True  NaN  True
NaN  True  NaN  True
2.0  True  NaN  True
Problem description
While it is true that np.nan != np.nan, pandas disregards this in indexes (indeed, s.loc[:, np.nan] works), so it should be coherent.
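A minimal sketch of the inconsistency described above, assuming a recent pandas and NumPy. It only illustrates that NaN labels match each other in index operations even though the scalar comparison does not; the variable names are illustrative and not part of the original report.

```python
import numpy as np
import pandas as pd

s = pd.DataFrame(-1, index=[1, np.nan, 2], columns=[3, np.nan, 1])

# As a scalar, NaN never compares equal to itself.
nan_is_unequal = np.nan != np.nan

# As labels, two separately built indexes containing NaN are considered equal.
labels_match = pd.Index([3, np.nan, 1]).equals(pd.Index([3, np.nan, 1]))

# Label-based lookup on NaN selects the NaN column.
nan_column = s.loc[:, np.nan]
```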
Expected Output
In [4]: s == s
Out[4]:
      3.0   NaN   1.0
1.0  True  True  True
NaN  True  True  True
2.0  True  True  True
Output of pd.show_versions()
INSTALLED VERSIONS
commit: b45325e
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8
pandas: 0.22.0.dev0+201.gb45325e28.dirty
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1
Activity
toobaz commented on Nov 23, 2017
By the way: it works with MultiIndex columns:
toobaz commented on Nov 23, 2017
I spoke too soon:
jreback commented on Nov 24, 2017
hmm this should work, we already use array_equivalent to compare (inside Index.equals).
toobaz commented on Nov 24, 2017
More general question: is it a desired feature or a limitation that equality works only on objects with similarly ordered index/columns? E.g. compare Series(index=[1,2]) + Series(index=[2,1]) (works) with Series(index=[1,2]) == Series(index=[2,1]) (ValueError): the latter could in principle get an indexer, find out the index actually contains the same elements, and hence compare values (clearly at a cost, which however could be easily avoided in those cases in which the index is indeed the same - that is, the change wouldn't hinder performance for current correct use).
jreback commented on Nov 24, 2017
see long discussion here: #1134
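For illustration, the contrast raised in the question above can be sketched as follows, assuming a recent pandas. The data values are made up for the example, since the original snippet only specified the indexes.

```python
import pandas as pd

a = pd.Series([10, 20], index=[1, 2])
b = pd.Series([20, 10], index=[2, 1])

# Arithmetic aligns on labels before operating.
total = a + b

# Comparison does not align: differently ordered labels raise ValueError.
try:
    a == b
    comparison_raised = False
except ValueError:
    comparison_raised = True
```

Here `total` holds 20 at label 1 and 40 at label 2, while the comparison is rejected because the two indexes are not identical.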
toobaz commented on Nov 24, 2017
Interesting, but my understanding is that it does not consider the specific issue of having the same labels but in a different order. I understand the reason not to support comparison between different indexes is to avoid NaNs (or dropping elements/rows). What I suggest instead is just to check if labels are equal after sorting.
jreback commented on Nov 25, 2017
how could different orderings be considered equal?
toobaz commented on Nov 25, 2017
My idea would be something like
jreback commented on Nov 25, 2017
i am asking why you think this is a good idea to ignore ordering in an ordered array
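To make the proposal under discussion concrete, here is one possible sketch of an order-insensitive comparison. It is not the code from the thread and not pandas behaviour: `eq_ignoring_order` is a hypothetical helper, and the choice to reorder the right operand to the left operand's index is an assumption of this example.

```python
import pandas as pd

def eq_ignoring_order(left, right):
    """Hypothetical helper: compare two Series whose unique indexes hold
    the same labels, possibly in a different order."""
    # Fast path: identical indexes need no extra work.
    if left.index.equals(right.index):
        return left == right
    same_labels = (
        left.index.is_unique
        and right.index.is_unique
        and len(left.index) == len(right.index)
        and left.index.isin(right.index).all()
    )
    if not same_labels:
        raise ValueError("indexes do not contain the same unique labels")
    # Reorder the right operand to the left operand's label order.
    return left == right.reindex(left.index)

a = pd.Series([10, 20], index=[1, 2])
b = pd.Series([20, 10], index=[2, 1])
result = eq_ignoring_order(a, b)
```

The result takes the label order of the left operand, which is one answer to the ordering objection but still a design choice that the thread leaves open.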
14 remaining items
jorisvandenbossche commented on Jan 16, 2018
I am not sure this was the reason. Because if comparison operations aligned, you would 1) align, introducing NaNs in the values, and 2) compare, and where there are NaNs you just get False (just as you would now get with already aligned objects that contain NaNs).
So even if comparisons do alignment you can still get a normally functioning boolean result.
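The two steps described above can be sketched by aligning explicitly and then comparing, assuming a recent pandas. The Series and their values are invented for the example.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([2, 3, 4], index=["b", "c", "d"])

# Step 1: outer alignment introduces NaN where a label is missing on one side.
left, right = s1.align(s2)

# Step 2: comparing the aligned values gives False wherever a NaN is involved.
result = left == right
```

Labels "a" and "d" exist on only one side, so they compare as False, while "b" and "c" compare as True.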
I think the reasons not to let comparisons align were 1) to make Series behaviour consistent with DataFrame (but of course, we could also have changed the DataFrame behaviour to align as well) and 2) that people liked the error as a sanity check (often, when doing a comparison, you want to use it for boolean indexing, and then alignment might give unexpected results). One example use case that Wes gave: s1[1:] == s2[:1].
toobaz commented on Jan 16, 2018
Good point: comparison of NaNs is well defined.
Exactly
True. My idea of introducing NaNs would have provided this sanity check... but it's just too inconsistent. And while I would rather not have this sanity check, changing it now would be too disruptive.
I still think we could allow the order of the indexes not to matter when both indexes are unique and contain the same elements.
BUG: Fix initialization of DataFrame from dict with NaN as key
TST: removed workaround for pandas-dev#18455
BUG: Fix initialization of DataFrame from dict with NaN as key (#18600)