Description
Research
I have searched the [pandas] tag on StackOverflow for similar questions.
I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
Question about pandas
pandas 2.2 raises a FutureWarning for this code:
import pandas as pd
df = pd.DataFrame.from_dict({"something": {pd.Period("2022", "Y-DEC"): 2.5}})
# FutureWarning: Resampling with a PeriodIndex is deprecated.
# Cast index to DatetimeIndex before resampling instead.
print(df.resample("M").ffill())
#          something
# 2022-01        2.5
# 2022-02        2.5
# 2022-03        2.5
# 2022-04        2.5
# 2022-05        2.5
# 2022-06        2.5
# 2022-07        2.5
# 2022-08        2.5
# 2022-09        2.5
# 2022-10        2.5
# 2022-11        2.5
# 2022-12        2.5
Converting the index to a DatetimeIndex first does not give the same result; only a single row comes back:
df.index = df.index.to_timestamp()
print(df.resample("M").ffill())
#             something
# 2022-01-31        2.5
I have PeriodIndexes all over the place and need to resample them a lot, filling gaps with ffill.
How can I do this with pandas 2.2?
Activity
MarcoGorelli commented on Jan 23, 2024
This comes from #55968, and here's the relevant issue: #53481.
I'd suggest creating a DatetimeIndex and using ffill.
matteo-zanoni commented on Feb 26, 2024
I just hit this too.
For upsampling, the PeriodIndex gave a different result than the DatetimeIndex; in particular, PeriodIndex would create a row for each new period within the old period:
This would create a series including ALL hours of 2024-01-02.
If, instead, we first convert to DatetimeIndex:
The output will only contain 1 hour of 2024-01-02.
Even if the series is constructed directly with a DatetimeIndex (even one containing the frequency information), the result is the same:
Will there be no way to obtain the old behaviour of PeriodIndex in future versions? IMO upsampling is quite common, and the way PeriodIndex implemented it is more useful. It would be a shame to lose it.
andreas-wolf commented on Feb 27, 2024
I agree. Now you need to reindex manually, while with PeriodIndex this was a one-liner.
Furthermore, resampling with a DatetimeIndex seems to change the data type (a bug?). Here is some sample code:
I would also opt for keeping PeriodIndex resampling.
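The sample code referred to above is not preserved, so the exact case is unknown. One plausible mechanism, consistent with the later remark in this thread that reindexing introduces NaN and float64, is sketched here with made-up data: reindexing integer data onto a finer grid and then calling ffill upcasts to float64, while filling during the reindex keeps int64. Whether this is what the commenter actually hit is an assumption.

```python
import pandas as pd

# Integer data on a daily DatetimeIndex with a gap (made-up example data).
s = pd.Series([1, 2], index=pd.DatetimeIndex(["2024-01-01", "2024-01-03"]))
target = pd.date_range("2024-01-01", "2024-01-03", freq="D")

# Reindex first, fill afterwards: the missing day becomes NaN, which forces
# float64, and ffill keeps that dtype.
two_step = s.reindex(target).ffill()

# Fill while reindexing: no NaN is ever created, so int64 survives.
one_step = s.reindex(target, method="ffill")

print(two_step.dtype, one_step.dtype)
```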
ChadFulton commented on Feb 27, 2024
Related to my comments in #56588, I think that this is another example where Period is being deprecated too fast without a clear replacement in mind.
jbrockmendel commented on Mar 28, 2024
Are all the relevant cases about upsampling and never downsampling? A big part of the motivation for deprecating was that PeriodIndexResampler._downsample is deeply broken in a way that didn't seem worth fixing. Potentially we could just deprecate downsampling and not upsampling?
andreas-wolf commented on Mar 28, 2024
The upsampling example is just the case where it's most obvious what will be missing once PeriodIndex resampling no longer works.
If downsampling no longer worked, I would have to convert the index, downsample, and convert the index back again. That does not sound very compelling.
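A sketch of that round trip for downsampling, assuming made-up monthly data summed into quarters: convert the PeriodIndex to timestamps, resample with the "QS" (quarter start) alias, and convert back with to_period("Q"). The groupby on index.asfreq("Q") at the end is an alternative we added that stays with Periods throughout; unlike resample, it does not create rows for quarters that have no data.

```python
import pandas as pd

# Made-up monthly data with a PeriodIndex.
monthly = pd.Series(
    range(1, 7),
    index=pd.period_range("2022-01", periods=6, freq="M"),
    dtype="float64",
)

# Round trip: periods -> timestamps -> resample -> periods.
quarterly = (
    monthly.set_axis(monthly.index.to_timestamp(how="start"))
    .resample("QS")  # quarter-start bins (Jan/Apr/Jul/Oct)
    .sum()
)
quarterly.index = quarterly.index.to_period("Q")

# Alternative without leaving Periods: group by the coarser period.
quarterly_gb = monthly.groupby(monthly.index.asfreq("Q")).sum()

print(quarterly)
print(quarterly_gb)
```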
PeriodIndex resampling (up and down) is very convenient when you have to combine data sources in days, months, quarters and years. I can't remember a project where I did not use period resampling. The convenience was always an argument for using pandas instead of other libraries like polars, where you have to handle all the conversions yourself.
From my point of view, PeriodIndex was always one of the great things about pandas.
I have very limited experience with pandas internals, so I don't understand how downsampling can be so deeply broken that it's not worth fixing, when "just" converting to a DatetimeIndex would fix it. Couldn't the DatetimeIndex logic be used internally to fix it?
MarcosYanase commented on Apr 3, 2024
I agree with @andreas-wolf.
My projects have a lot of DataFrames using PeriodIndex with different frequencies, and resampling is a very useful tool for calculations.
Keeping it at least for upsampling would be excellent, but for downsampling a workaround (like Andreas posted) is not trivial.
What are the bugs related to PeriodIndexResampler._downsample?
MarcosYanase commented on Aug 27, 2024
Any news about this issue?
The deprecation was due to #53481, correct, @jbrockmendel?
https://github.com/pandas-dev/pandas/blob/c375533d670a7114c36ebb114c01ec7d57b92753/pandas/core/resample.py#L1800C1-L1813C33
I naively changed the lines "return self.asfreq()" to "return super()._downsample(how, **kwargs)", delegating the responsibility to DatetimeIndexResampler._downsample.
It works for the example in #53481 and for other tests I'm running (using other downsampling methods), giving outputs similar to DatetimeIndexResampler._downsample. But I'm sure it's not that simple.
Could you please give more examples where PeriodIndexResampler._downsample continues to be broken?
jbrockmendel commented on Aug 27, 2024
I don't have the bandwidth to give you a thorough answer. What I can tell you is that there are no plans to enforce this deprecation in 3.0.
MarcosYanase commented on Aug 30, 2024
Is there an option to not deprecate resample with PeriodIndex?
If so, what is the process? Do we fix the issues first and then remove the FutureWarning, or do both simultaneously?
I think it's possible to fix the example in #53481 by doing what I wrote above:
About #58021 (comment):
pandas/pandas/tests/resample/test_base.py, line 229 in e4956ab
-- First error: ValueError("for Period, please use 'M' instead of 'ME'")
-- After making that change, it will still fail, but we could fix it by modifying
pandas/pandas/core/resample.py, line 1643 in e4956ab
-- I think this test (https://github.com/pandas-dev/pandas/blob/e4956ab403846387a435cd7b3a8f36828c23c0c7/pandas/tests/resample/test_period_index.py#L684C1-L696C49) should not work. Even after correcting it to something like:
we get
which I think is expected (after the reindex there are many NaNs, which changes the dtype to float64, and it stays float64 after ffill)
And, using the fix that I'm suggesting, we will have:
because the test at https://github.com/pandas-dev/pandas/blob/e4956ab403846387a435cd7b3a8f36828c23c0c7/pandas/tests/resample/test_period_index.py#L287C1-L294C1 expects the wrong behavior pointed out by #53481.
I think it's OK to expect the same values for sum, mean, etc., but how could we expect the same values/attributes for methods like std, var, ohlc, nunique, size, count, etc.? I didn't find a similar test for datetime_index.