BUG: Why is resampling with rule='7d' different than resampling with rule='168h'? #44996

Open
@mkp-gebensleben

Description

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.
  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/questions/70436514/timestamp-binning-mechanics-when-resampling

Question about pandas

>>> df = pd.DataFrame(index=pd.date_range(start='2021-04-21 01:00:00', end='2021-04-28 01:00', freq='1d'), data=[1]*8)
>>> df.resample(rule='7d', origin='2021-04-29 00:00:00', closed='right', label='right').sum()
            0
2021-04-22  2
2021-04-29  6
>>> df.resample(rule='168h', origin='2021-04-29 00:00:00', closed='right', label='right').sum()
            0
2021-04-22  1
2021-04-29  7
  1. Why does this happen?
  2. Should this happen?

Using pandas 1.3.5
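
For reference, the two candidate results can be reproduced by hand, without calling resample. This is a minimal sketch, assuming that with origin='2021-04-29 00:00:00' and a 7-day bin width the right-closed bin edges fall at midnight on 2021-04-15, 2021-04-22 and 2021-04-29. The helper `count_right_closed` is hypothetical, not a pandas function.

```python
import numpy as np
import pandas as pd

# Same index as in the question: eight daily timestamps at 01:00.
idx = pd.date_range(start="2021-04-21 01:00:00", end="2021-04-28 01:00", freq="1D")

# Assumed bin edges: 7-day steps that line up with the origin 2021-04-29.
edges = pd.date_range("2021-04-15", periods=3, freq="7D")


def count_right_closed(index, bin_edges):
    """Count timestamps per right-closed bin (bin_edges[i-1], bin_edges[i]].

    Assumes every timestamp lies after the first edge and not after the last.
    """
    # Number of edges strictly below each timestamp = position of its right edge.
    pos = (index.values[:, None] > bin_edges.values[None, :]).sum(axis=1)
    counts = np.bincount(pos, minlength=len(bin_edges))[1:]
    return pd.Series(counts, index=bin_edges[1:])


print(count_right_closed(idx, edges).tolist())  # [1, 7]
```

Under that assumption, only 2021-04-21 01:00 falls at or before the 2021-04-22 00:00 edge, which matches the '168h' output.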

Activity

ms7463 (Contributor) commented on Dec 23, 2021


Looks like the issue happens here:

pandas/pandas/core/resample.py

Lines 1656 to 1668 in 5a22750

    if self.freq != "D" and is_superperiod(self.freq, "D"):
        if self.closed == "right":
            # GH 21459, GH 9119: Adjust the bins relative to the wall time
            bin_edges = binner.tz_localize(None)
            bin_edges = bin_edges + timedelta(1) - Nano(1)
            bin_edges = bin_edges.tz_localize(binner.tz).asi8
        else:
            bin_edges = binner.asi8

        # intraday values on last day
        if bin_edges[-2] > ax_values.max():
            bin_edges = bin_edges[:-1]
            binner = binner[:-1]

The 7d closed=right resample hits this condition, and this code branch readjusts the bin edges (starting at line 1666). If you run in debug mode and skip over this adjustment, you get the same results in your example.
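
The effect of that adjustment can be illustrated outside of pandas internals. This is a rough sketch, not the library's code path: it assumes the unadjusted edges are midnight on 2021-04-15, 2021-04-22 and 2021-04-29, mirrors `bin_edges + timedelta(1) - Nano(1)` with public `pd.Timedelta` arithmetic, and uses a hypothetical helper `bin_counts` to count right-closed bins.

```python
import numpy as np
import pandas as pd

idx = pd.date_range(start="2021-04-21 01:00:00", end="2021-04-28 01:00", freq="1D")

# Assumed unadjusted edges (what a plain 7-day grid anchored at the origin gives).
edges = pd.date_range("2021-04-15", periods=3, freq="7D")

# Mirror of `bin_edges + timedelta(1) - Nano(1)`: push each edge to the
# last nanosecond of its calendar day.
adjusted = edges + pd.Timedelta(days=1) - pd.Timedelta(1, unit="ns")


def bin_counts(index, bin_edges):
    """Counts per right-closed bin; assumes all timestamps lie within the edges."""
    pos = (index.values[:, None] > bin_edges.values[None, :]).sum(axis=1)
    return np.bincount(pos, minlength=len(bin_edges))[1:].tolist()


print(bin_counts(idx, edges))     # [1, 7], as reported for rule='168h'
print(bin_counts(idx, adjusted))  # [2, 6], as reported for rule='7d'
```

With the shifted edges, 2021-04-22 01:00 lands before 2021-04-22 23:59:59.999999999 and so moves into the first bin, which accounts for the 2/6 split.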

This logic, or something similar, seems to have been there for a long time, but it was potentially meant to deal with Monthly/Weekly frequencies rather than N*Daily frequencies. Maybe @mroeschke or @jorisvandenbossche can comment on it, since it seems they touched this section of the code most recently.

mroeschke (Member) commented on Dec 23, 2021


This looks like a bug: resample has special logic for redefining 'D' in the presence of a DST transition (#41943), and that logic seems to negatively affect the example above, where no timezone is defined.

This will hopefully be fixed in 2.0 where we want to remove this special casing: #44823

mkp-gebensleben (Author) commented on Jan 3, 2022


@mroeschke

> which seems to negatively impact when there is no timezone defined in the example above

I'm getting the same results when using a timezone.

>>> df = pd.DataFrame(index=pd.date_range(start='2021-04-21 01:00:00', end='2021-04-28 01:00', freq='1d', tz=0), data=[1]*8)
>>> df.resample(rule='7d', origin='2021-04-29 00:00:00+00:00', closed='right', label='right').sum()
                           0
2021-04-22 00:00:00+00:00  2
2021-04-29 00:00:00+00:00  6
>>> df.resample(rule='168h', origin='2021-04-29 00:00:00+00:00', closed='right', label='right').sum()
                           0
2021-04-22 00:00:00+00:00  1
2021-04-29 00:00:00+00:00  7
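
One possible reading of this result is that the quoted branch depends on the frequency and on closed='right', not on whether a timezone is set. The sketch below replays the three quoted steps on timezone-aware edges using only public pandas calls. The edge values are an assumption, and `binner` here is a stand-in for the internal variable of the same name.

```python
import pandas as pd

# Assumed unadjusted edges, now timezone-aware (UTC).
binner = pd.date_range("2021-04-15", periods=3, freq="7D", tz="UTC")

# The same three steps as the quoted branch, written with public API.
wall = binner.tz_localize(None)  # drop the timezone, keep the wall time
wall = wall + pd.Timedelta(days=1) - pd.Timedelta(1, unit="ns")  # end of that day
adjusted = wall.tz_localize(binner.tz)  # re-attach the timezone

# Every edge moves by the same amount, just as in the timezone-naive case.
shift = adjusted - binner
print(shift.unique())
```

For UTC the round trip through wall time changes nothing else, so each edge is shifted by one day minus one nanosecond, the same shift as without a timezone.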
mkp-gebensleben changed the title from "QST: Why is resampling with rule='7d' different than resampling with rule='168h'?" to "BUG: Why is resampling with rule='7d' different than resampling with rule='168h'?" on Jan 3, 2022