Addressing multiindex raises TypeError if indices that are rightmost are not present #20951

@MasterAir

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

A = [100, 100, 200, 200, 300, 300]
B = [10, 10, 20, 21, 31, 33]
C = np.random.randint(0, 99, 6)

test_df = pd.DataFrame({'A': A, 'B': B, 'C': C})
test_df = test_df.set_index(['A', 'B'])

print(test_df)
try:
    print(test_df.loc[(100,10)])
except KeyError:
    print('test_df.loc[(100,10)] raises a KeyError')

try:
    print(test_df.loc[(0,1)])
except KeyError:
    print('test_df.loc[(0,1)] raises a KeyError')

try:
    print(test_df.loc[(100,1)])
except KeyError as e:
    print(e)
    print('test_df.loc[(100,1)] raises a KeyError')
except TypeError as e:
    print(e)
    print('test_df.loc[(100,1)] raises a TypeError')

Problem description

If the value is not present in the index, I believe that a KeyError should be raised consistently, so that you can write code like:

try:
    df.loc[key]  # key is a tuple such as (100, 1)
except KeyError:
    pass  # do something if the value isn't present

Expected Output

If df.loc[tuple] does not have a match in the multiindex, a KeyError should be raised.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Activity

jreback commented on May 4, 2018

@jreback
Contributor

cc @toobaz this closed by your recent MI warning?

toobaz commented on May 4, 2018

@toobaz
Member

> cc @toobaz this closed by your recent MI warning?

Unfortunately not - my PR should have only affected list(-like)s of keys, not single keys. This is rather related to #19110 and #17024 (and possibly more). Basically, since 100 is found in the index (partial indexing), 1 is looked up in the columns rather than in the second level of the index. That behavior is fine (or at least, it is too late to break it), except that if 1 is not found in the columns either, we should re-raise the original exception.
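The partial-indexing fallback described above can be reproduced with a short sketch. It reuses the index layout from the report but fixes the values of C (an assumption made here so the output is deterministic). Which exception the second lookup raises depends on the pandas version, so the sketch catches both types rather than asserting one.

```python
import pandas as pd

# Same index layout as the report; C is fixed here instead of random.
test_df = pd.DataFrame(
    {
        "A": [100, 100, 200, 200, 300, 300],
        "B": [10, 10, 20, 21, 31, 33],
        "C": [11, 12, 13, 14, 15, 16],
    }
).set_index(["A", "B"])

# "C" is not a label in level B, so 100 selects the rows (partial indexing)
# and "C" is then looked up in the columns.
partial = test_df.loc[(100, "C")]
print(partial)

# 1 is neither a label in level B nor a column name, so both lookups fail.
# The exception type differs between pandas versions, so report it.
try:
    test_df.loc[(100, 1)]
except (KeyError, TypeError) as err:
    error_name = type(err).__name__
    print(error_name, err)
```

The first lookup shows why the fallback exists at all: a two-element tuple passed to .loc can mean either a full MultiIndex key or a (row, column) pair.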

Label added on May 8, 2018: Indexing (related to indexing on series/frames, not to indexes themselves)

MasterAir commented on May 9, 2018

@MasterAir
Author

I'm happy to have a go fixing this behaviour - not sure how successful I'll be or when I'll have time. Is the expected behaviour that I've specified in the issue report correct?

Should missing keys always raise a KeyError?

toobaz commented on May 9, 2018

@toobaz
Member

> I'm happy to have a go fixing this behaviour - not sure how successful I'll be or when I'll have time.

The basic idea (real code is more complicated) is to replace something like

try:
    # look for tuple in index
except:
    try:
        # look for first element in index, second element in columns
    except Exception as exc:
        raise exc

with something like:

try:
    # look for tuple in index
except Exception as exc:
    try:
        # look for first element in index, second element in columns
    except:
        raise exc
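As a runnable illustration of the second pattern, here is a minimal sketch in plain Python. It is not pandas code: the `lookup` function and the two dictionaries are hypothetical stand-ins for "the index" and "index plus columns", chosen only to show that the caller sees the error from the first attempt when the fallback also fails.

```python
def lookup(rows_by_tuple, rows_by_first, key):
    """Hypothetical two-step lookup; a stand-in, not pandas internals."""
    try:
        # Step 1: look for the whole tuple in the "index".
        return rows_by_tuple[key]
    except Exception as exc:
        try:
            # Step 2: first element in the "index", second in the "columns".
            first, column = key
            return rows_by_first[first][column]
        except Exception:
            # The fallback failed too: report the original failure.
            raise exc


rows_by_tuple = {(100, 10): {"C": 7}}
rows_by_first = {100: {"C": [7]}}

print(lookup(rows_by_tuple, rows_by_first, (100, 10)))   # full key matches
print(lookup(rows_by_tuple, rows_by_first, (100, "C")))  # fallback matches

try:
    lookup(rows_by_tuple, rows_by_first, (100, 1))
except KeyError as err:
    # The key carried by the error is the full tuple from step 1,
    # not the bare 1 that the fallback failed on.
    raised_key = err.args[0]
    print("KeyError for", raised_key)
```

Binding the outer exception to a name and re-raising it from the inner handler is what keeps the fallback's own error from masking the one the caller cares about.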

> Should missing keys always raise a KeyError?

pd.Series(index=list('abc')).loc[1] raises TypeError, and (although I don't like it) that is unrelated to the present issue. But yes, print(test_df.loc[(100,1)]) above should result in a KeyError.

toobaz commented on Jun 20, 2018

@toobaz
Member

> The basic idea (real code is more complicated) is to replace something like

It's actually not so simple, because we still want to raise the current error when the index is not a MultiIndex (and maybe even when it is, but the key is not a plausible MultiIndex key/indexer).

phofl commented on Nov 10, 2020

@phofl
Member

This works now; the last statement raises:

Traceback (most recent call last):
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexes/base.py", line 3028, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2020.2/scratches/scratch_4.py", line 69, in <module>
    print(test_df.loc[(100,1)])
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 881, in __getitem__
    return self._getitem_tuple(key)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1052, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 823, in _getitem_lowerdim
    return getattr(section, self.name)[new_key]
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 881, in __getitem__
    return self._getitem_tuple(key)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1052, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 799, in _getitem_lowerdim
    section = self._getitem_axis(key, axis=i)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1116, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexing.py", line 1065, in _get_label
    return self.obj.xs(label, axis=axis)
  File "/home/developer/PycharmProjects/pandas/pandas/core/generic.py", line 3652, in xs
    return self[key]
  File "/home/developer/PycharmProjects/pandas/pandas/core/frame.py", line 2968, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/developer/PycharmProjects/pandas/pandas/core/indexes/base.py", line 3030, in get_loc
    raise KeyError(key) from err
KeyError: 1

Process finished with exit code 1

lklamt commented on Feb 15, 2021

@lklamt

After doing some testing, I agree: the concrete error above seems to be solved in pandas version 1.2.2. Can the issue be closed?

added this to the 1.3 milestone on May 28, 2021
Metadata

Assignees: No one assigned

Labels: Indexing, MultiIndex, Needs Tests, good first issue

Participants: @jreback, @toobaz, @MasterAir, @gfyoung, @mroeschke