
BUG: NamedTuples do not match tuples in pandas.Index #57922

Closed
@Apteryx0

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

$ ipython
Python 3.11.8 (main, Mar 19 2024, 17:46:15) [GCC 11.4.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.22.2 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pandas

In [2]: import collections

In [3]: MyNamedTuple = collections.namedtuple("MyNamedTuple", "id sub_id")

In [4]: first = MyNamedTuple('identity','1234')

In [5]: idx = pandas.Index([('identity','1234')])

In [6]: idx
Out[6]: 
MultiIndex([('identity', '1234')],
           )

In [7]: idx2 = idx.to_flat_index()

In [8]: idx2
Out[8]: Index([('identity', '1234')], dtype='object')

In [9]: first in idx
Out[9]: True

In [10]: first in idx2
Out[10]: False

In [11]: first in idx2.to_list()
Out[11]: True

In [12]: first == idx2[0]
Out[12]: True

In [13]: pandas.__version__
Out[13]: '2.2.1'

In [14]: idx.get_loc(first)
Out[14]: 0

In [15]: idx2.get_loc(first)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/.pyenv/versions/3.11.8/envs/test-venv/lib/python3.11/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: MyNamedTuple(id='identity', sub_id='1234')

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[15], line 1
----> 1 idx2.get_loc(first)

File ~/.pyenv/versions/3.11.8/envs/test-venv/lib/python3.11/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3807     if isinstance(casted_key, slice) or (
   3808         isinstance(casted_key, abc.Iterable)
   3809         and any(isinstance(x, slice) for x in casted_key)
   3810     ):
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.
   3817     self._check_indexing_error(key)

KeyError: MyNamedTuple(id='identity', sub_id='1234')

In [16]:

Issue Description

After upgrading from pandas 1.2.5 to pandas 1.3.5, I found that I could no longer select DataFrame columns whose labels are tuples by passing a NamedTuple with the same values; the lookup raises a KeyError. Using the latest pandas, I reduced the problem to pandas.Index.get_loc on a flat Index of tuples (as produced by to_flat_index()). The same lookup works when the Index is left as a MultiIndex.

Note: in my runs the lookup has succeeded in about 25% of attempts, so if it works for you, please try again.
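A possible workaround until this is fixed, sketched under the assumption that the mismatch only affects tuple subclasses: convert the NamedTuple to a plain tuple before the lookup, since the flattened Index stores plain tuples. The helper name as_plain_tuple is hypothetical and not part of pandas; it recurses so that NamedTuples nested inside a key are converted as well.

```python
from collections import namedtuple
from typing import Any


def as_plain_tuple(key: Any) -> Any:
    """Hypothetical helper (not a pandas API).

    Recursively converts tuple subclasses such as NamedTuples into plain
    tuples; any non-tuple value is returned unchanged.
    """
    if isinstance(key, tuple):
        return tuple(as_plain_tuple(item) for item in key)
    return key


MyNamedTuple = namedtuple("MyNamedTuple", "id sub_id")
first = MyNamedTuple("identity", "1234")

plain = as_plain_tuple(first)
print(type(plain).__name__, plain)  # tuple ('identity', '1234')
```

With the objects from the example above, idx2.get_loc(as_plain_tuple(first)) would then be expected to return 0, because it looks up the same plain tuple that the flat Index stores.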

Expected Behavior

A NamedTuple should match an equal plain tuple in an Index, just as it does elsewhere in Python. For example, first in idx2.to_list() returns True.
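For reference, the standard-library behavior this expectation relies on can be checked without pandas. The snippet below reuses the MyNamedTuple definition from the example and shows that a NamedTuple compares and hashes equal to the corresponding plain tuple, so dicts and sets treat the two as the same key.

```python
from collections import namedtuple

MyNamedTuple = namedtuple("MyNamedTuple", "id sub_id")
first = MyNamedTuple("identity", "1234")
plain = ("identity", "1234")

# NamedTuple subclasses tuple and inherits its __eq__ and __hash__.
print(first == plain)              # True
print(hash(first) == hash(plain))  # True

# Hash-based containers therefore treat the two as the same key.
lookup = {plain: 0}
print(lookup[first])               # 0
print(first in {plain})            # True
```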

Installed Versions

In [16]: pandas.show_versions()
/home/russellm/.pyenv/versions/3.11.8/envs/test-venv/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit : bdc79c1
python : 3.11.8.final.0
python-bits : 64
OS : Linux
OS-release : 6.5.0-21-generic
Version : #21~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 9 13:32:52 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.1
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 65.5.0
pip : 24.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.22.2
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

In [17]:

Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves)