DataFrame with PeriodIndex causes KeyError on get_value #15268

@devanl

Code Sample, a copy-pastable example if possible

from pandas import to_datetime, period_range, DataFrame
import pandas as pd

print(pd.__version__)

start_of_time = to_datetime('2016-10-17 01:16:39.133000')
end_of_time = to_datetime('2017-01-04 23:58:37.905000')
avs_date_range = period_range(start_of_time, end_of_time, freq='D')

bins = DataFrame(dict(foo=[0] * len(avs_date_range), bar=[0] * len(avs_date_range)),
                 index=avs_date_range)

current = range(10)

for idx, bin in bins.iterrows():
    for i in range(6):
        bins.set_value(idx, 'foo', bin['foo'] + 1)

    f_count = bins.get_value(idx, 'foo')
    bins.set_value(idx, 'bar', len(current) - f_count)

print(bins)

Problem description

If I comment out the index argument, this works as expected. With the PeriodIndex set, it raises a KeyError.

Output of pd.show_versions()

0.19.1
Traceback (most recent call last):
  File "pandas\index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas\index.c:4289)
  File "pandas\src\hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:8534)
TypeError: an integer is required

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm 5.0.1\helpers\pydev\pydevd.py", line 2403, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files (x86)\JetBrains\PyCharm 5.0.1\helpers\pydev\pydevd.py", line 1794, in run
    launch(file, globals, locals) # execute the script
  File "C:\Program Files (x86)\JetBrains\PyCharm 5.0.1\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/dlippman/src/ETDataView/test.py", line 19, in <module>
    f_count = bins.get_value(idx, 'foo')
  File "C:\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 1900, in get_value
    return engine.get_value(series.get_values(), index)
  File "pandas\index.pyx", line 105, in pandas.index.IndexEngine.get_value (pandas\index.c:3567)
  File "pandas\index.pyx", line 113, in pandas.index.IndexEngine.get_value (pandas\index.c:3250)
  File "pandas\index.pyx", line 163, in pandas.index.IndexEngine.get_loc (pandas\index.c:4373)
KeyError: Period('2016-10-17', 'D')

Activity

jreback (Contributor) commented on Jan 30, 2017

.set_value is a fairly raw low-level non-public interface. Use .loc.

Further, what you are doing is quite non-performant; iterating over the rows is not recommended.

In [15]: bins
Out[15]: 
            bar  foo
2016-10-17    0    0
2016-10-18    0    0
2016-10-19    0    0
2016-10-20    0    0
2016-10-21    0    0
...         ...  ...
2016-12-31    0    0
2017-01-01    0    0
2017-01-02    0    0
2017-01-03    0    0
2017-01-04    0    0

[80 rows x 2 columns]

In [16]: bins.loc[bins.index[-1], 'foo'] = 1

In [17]: bins
Out[17]: 
            bar  foo
2016-10-17    0    0
2016-10-18    0    0
2016-10-19    0    0
2016-10-20    0    0
2016-10-21    0    0
...         ...  ...
2016-12-31    0    0
2017-01-01    0    0
2017-01-02    0    0
2017-01-03    0    0
2017-01-04    0    1

[80 rows x 2 columns]
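
As a rough illustration of avoiding row iteration, the loop from the report can be written as two whole-column assignments. This is only a sketch under an assumption: that the intent was to add 6 to foo in every row and then set bar to len(current) minus foo. Whether the original loop actually accumulates to 6 depends on whether the row yielded by iterrows() is a copy or a view, so the numbers here reflect the assumed intent, not a verified result of the original code.

```python
import pandas as pd

start_of_time = pd.to_datetime('2016-10-17 01:16:39.133000')
end_of_time = pd.to_datetime('2017-01-04 23:58:37.905000')
avs_date_range = pd.period_range(start_of_time, end_of_time, freq='D')

# Same frame as in the report: two zero-filled columns on a daily PeriodIndex.
bins = pd.DataFrame({'foo': 0, 'bar': 0}, index=avs_date_range)
current = range(10)

# Assumed intent: six increments of foo per row, done for all rows at once.
bins['foo'] = bins['foo'] + 6

# bar is derived from foo column-wise; no per-row get_value/set_value needed.
bins['bar'] = len(current) - bins['foo']

print(bins.head())
```

Column-wise arithmetic like this never looks up individual Period labels, so it sidesteps the failing get_value call entirely.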

Labels added on Jan 30, 2017: Indexing (related to indexing on series/frames, not to indexes themselves), Period (Period data type)
Added to the won't fix milestone on Jan 30, 2017

jorisvandenbossche (Member) commented on Jan 30, 2017

It is get_value that raises the error, not set_value.

I agree that you don't need to use this method in the current example, but still, it is a public, documented method that in this case completely fails to do what its documentation says it should. According to the docstring it takes row and column labels, which fails:

In [155]: bins.get_value(bins.index[0], bins.columns[0])
...
KeyError: Period('2016-10-17', 'D')

Shouldn't we just fix this? Or update the documentation to discourage its usage? (or both)
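
For readers who only need label-based scalar access, here is a minimal sketch using the public .at and .loc accessors with a Period label. It assumes a reasonably recent pandas; it has not been checked against 0.19.1, and it is a workaround rather than a fix for get_value itself (get_value and set_value were later deprecated in pandas 0.21 and removed in 1.0).

```python
import pandas as pd

idx = pd.period_range('2016-10-17', '2017-01-04', freq='D')
bins = pd.DataFrame({'foo': 0, 'bar': 0}, index=idx)

first = bins.index[0]           # a Period label, Period('2016-10-17', 'D')

# Scalar set and get by row and column label.
bins.at[first, 'foo'] = 5
value = bins.at[first, 'foo']

# .loc accepts the same labels and returns the same scalar.
same = bins.loc[first, 'foo']

print(value, same)
```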

devanl (Author) commented on Jan 30, 2017

This is probably not the place, but could you please explain a better way to populate each of the columns based on conditional analysis of external time-stamped data that is counted for each of the periods in the PeriodIndex?

jorisvandenbossche (Member) commented on Jan 30, 2017

@devanl it would be better to ask on StackOverflow (and be sure to give a reproducible example there with a clear expected result, as this is currently not fully clear to me)
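
One possible reading of the question, sketched with made-up data: each external record has a timestamp and a boolean condition, and the goal is to count, per daily period, how many records satisfy the condition (foo) and how many do not (bar). The events frame, its timestamp and flag columns, and the five-day range are all hypothetical; the actual data and condition were not given in this thread.

```python
import pandas as pd

periods = pd.period_range('2016-10-17', '2016-10-21', freq='D')

# Hypothetical external time-stamped data with a precomputed condition.
events = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2016-10-17 01:16:39',
        '2016-10-17 09:00:00',
        '2016-10-18 12:30:00',
        '2016-10-20 23:59:59',
    ]),
    'flag': [True, True, False, True],
})

# Map every timestamp to the daily period it falls in.
day = events['timestamp'].dt.to_period('D')

# Per period: number of True flags (sum) and number of records (count).
counts = events.groupby(day)['flag'].agg(['sum', 'count'])

bins = pd.DataFrame({
    'foo': counts['sum'],
    'bar': counts['count'] - counts['sum'],
})

# Periods with no events get zero counts.
bins = bins.reindex(periods, fill_value=0).astype(int)

print(bins)
```

Grouping once and reindexing onto the full period range avoids both the row loop and any per-cell get_value/set_value calls.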

jreback (Contributor) commented on Jan 30, 2017

These are effectively internal methods and should actually be deprecated. I thought we did this quite a while back.

modified the milestone: won't fix on Jul 6, 2018
Labels: Indexing (related to indexing on series/frames, not to indexes themselves), Period (Period data type), Usage Question
Participants: @jreback, @jorisvandenbossche, @TomAugspurger, @devanl