Open
Description
Research
I have searched the [pandas] tag on StackOverflow for similar questions.
I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
https://stackoverflow.com/questions/70436514/timestamp-binning-mechanics-when-resampling
Question about pandas
>>> df = pd.DataFrame(index=pd.date_range(start='2021-04-21 01:00:00', end='2021-04-28 01:00', freq='1d'), data=[1]*8)
>>> df.resample(rule='7d', origin='2021-04-29 00:00:00', closed='right', label='right').sum()
0
2021-04-22 2
2021-04-29 6
>>> df.resample(rule='168h', origin='2021-04-29 00:00:00', closed='right', label='right').sum()
0
2021-04-22 1
2021-04-29 7
- Why does this happen?
- Should this happen?
Using pandas 1.3.5
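As a cross-check, here is a minimal pure-Python sketch of fixed-width, right-closed, right-labelled binning anchored at the same origin. It is not pandas' implementation, and `right_closed_label` is a hypothetical helper written only for this illustration. Under these assumptions it reproduces the `168h` output above (1 and 7), which suggests the `7d` result comes from something other than plain fixed-width arithmetic.

```python
from collections import Counter
from datetime import datetime, timedelta


def right_closed_label(ts, origin, width):
    """Return the right edge of the bin (edge - width, edge] containing ts.

    Bin edges are assumed to sit at origin + k * width for integer k.
    """
    # ceil((ts - origin) / width), computed with integer floor division
    k = -((origin - ts) // width)
    return origin + k * width


origin = datetime(2021, 4, 29)
width = timedelta(hours=168)

# Same index as in the question: 8 daily timestamps starting 2021-04-21 01:00
timestamps = [datetime(2021, 4, 21, 1) + timedelta(days=i) for i in range(8)]

# Every value is 1, so summing per bin is the same as counting per bin
counts = Counter(right_closed_label(ts, origin, width) for ts in timestamps)

for label in sorted(counts):
    print(label.date(), counts[label])
```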
Activity
ms7463 commented on Dec 23, 2021
Looks like the issue happens here:
pandas/pandas/core/resample.py
Lines 1656 to 1668 in 5a22750
The 7d closed=right resample hits this condition, and this code branch readjusts the bin edges (starting at line 1666). If you run in debug mode and skip over this adjustment, the 7d resample gives the same result as the 168h resample in your example.
Looks like this logic or similar has been there for a long time (but potentially meant to deal with Monthly/Weekly frequencies, rather than N*Daily frequencies?), maybe @mroeschke or @jorisvandenbossche can comment on it, since it seems they touched this section of the code most recently.
mroeschke commented on Dec 23, 2021
This looks like a bug because resample has special logic for redefining 'D' in the presence of a DST transition (#41943), which seems to negatively affect cases where no timezone is defined, as in the example above.
This will hopefully be fixed in 2.0, where we want to remove this special casing: #44823
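For context on why a daily rule might be treated differently from a fixed number of hours, the sketch below shows that a calendar day spanning a DST transition is not 24 hours long. It uses only the standard library with hard-coded UTC offsets (an assumption: US Eastern time around the 2021-03-14 spring-forward) and says nothing about how pandas actually implements the special casing.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for US Eastern time around 2021-03-14
EST = timezone(timedelta(hours=-5))  # before the spring-forward transition
EDT = timezone(timedelta(hours=-4))  # after the spring-forward transition

# Midnight to midnight on the wall clock, across the transition
start = datetime(2021, 3, 14, 0, 0, tzinfo=EST)
end = datetime(2021, 3, 15, 0, 0, tzinfo=EDT)

# Subtracting aware datetimes with different tzinfo compares them in UTC
elapsed = end - start
print(elapsed)

# One calendar day here differs from a fixed 24-hour span
print(elapsed == timedelta(hours=24))
```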
mkp-gebensleben commented on Jan 3, 2022
@mroeschke
I'm getting the same results when using a timezone.
Title changed from "QST: Why is resampling with rule='7d' different than resampling with rule='168h'?" to "BUG: Why is resampling with rule='7d' different than resampling with rule='168h'?"