Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
$ ipython
Python 3.11.8 (main, Mar 19 2024, 17:46:15) [GCC 11.4.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.22.2 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pandas
In [2]: import collections
In [3]: MyNamedTuple = collections.namedtuple("MyNamedTuple", "id sub_id")
In [4]: first = MyNamedTuple('identity','1234')
In [5]: idx = pandas.Index([('identity','1234')])
In [6]: idx
Out[6]:
MultiIndex([('identity', '1234')],
)
In [7]: idx2 = idx.to_flat_index()
In [8]: idx2
Out[8]: Index([('identity', '1234')], dtype='object')
In [9]: first in idx
Out[9]: True
In [10]: first in idx2
Out[10]: False
In [11]: first in idx2.to_list()
Out[11]: True
In [12]: first == idx2[0]
Out[12]: True
In [13]: pandas.__version__
Out[13]: '2.2.1'
In [14]: idx.get_loc(first)
Out[14]: 0
In [15]: idx2.get_loc(first)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/.pyenv/versions/3.11.8/envs/test-venv/lib/python3.11/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
3804 try:
-> 3805 return self._engine.get_loc(casted_key)
3806 except KeyError as err:
File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()
File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: MyNamedTuple(id='identity', sub_id='1234')
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[15], line 1
----> 1 idx2.get_loc(first)
File ~/.pyenv/versions/3.11.8/envs/test-venv/lib/python3.11/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
3807 if isinstance(casted_key, slice) or (
3808 isinstance(casted_key, abc.Iterable)
3809 and any(isinstance(x, slice) for x in casted_key)
3810 ):
3811 raise InvalidIndexError(key)
-> 3812 raise KeyError(key) from err
3813 except TypeError:
3814 # If we have a listlike key, _check_indexing_error will raise
3815 # InvalidIndexError. Otherwise we fall through and re-raise
3816 # the TypeError.
3817 self._check_indexing_error(key)
KeyError: MyNamedTuple(id='identity', sub_id='1234')
Issue Description
After upgrading from pandas 1.2.5 to pandas 1.3.5, I could no longer look up DataFrame columns whose labels are tuples by passing a NamedTuple; the lookup raises KeyError. I then installed the latest pandas (2.2.1) and reduced the issue to pandas.Index.get_loc on a flat Index of tuples. The same lookup works when I leave the Index as a MultiIndex.
Note: the code succeeds in roughly 25% of my runs, so if it works for you, please try again.
Expected Behavior
A NamedTuple should match a regular tuple holding the same values, as it does elsewhere in Python. For example, first in idx2.to_list() returns True in the session above.
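To make the expectation concrete, here is a minimal standard-library sketch (no pandas involved) of how a namedtuple and a plain tuple compare in ordinary Python containers. The variable names `plain`, `as_plain`, and the result flags are illustrative and not part of the original session.

```python
import collections

MyNamedTuple = collections.namedtuple("MyNamedTuple", "id sub_id")

first = MyNamedTuple("identity", "1234")
plain = ("identity", "1234")

# A namedtuple is a tuple subclass; it inherits == and hash() from tuple.
equal = first == plain
same_hash = hash(first) == hash(plain)

# Containers built on == and hash() treat the two as the same key.
in_list = first in [plain]
in_set = first in {plain}
dict_value = {plain: "found"}[first]

# tuple() drops the subclass and returns an exact plain tuple.
as_plain = tuple(first)
is_exact_tuple = type(as_plain) is tuple

print(equal, same_hash, in_list, in_set, dict_value, is_exact_tuple)
```

Passing `tuple(first)` instead of `first` to `idx2.get_loc` might serve as a stopgap, but that is an assumption and was not tested as part of this report.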
Installed Versions
In [16]: pandas.show_versions()
/home/russellm/.pyenv/versions/3.11.8/envs/test-venv/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
INSTALLED VERSIONS
commit : bdc79c1
python : 3.11.8.final.0
python-bits : 64
OS : Linux
OS-release : 6.5.0-21-generic
Version : #21~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 9 13:32:52 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.1
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 65.5.0
pip : 24.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.22.2
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None