
Separate NaT values for Timedelta ("NaTD") and Period? #24983

Open

Description

@shoyer
Member

Separate scalar missing values for Timedelta, Timestamp and Period would go a long way towards achieving predictable types in pandas. As noted in #19124, it is impossible to make some operations consistent with the current state of affairs, in which the single pd.NaT scalar stands in for all three. Most recently this came up in #24957.
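For reference, a quick illustration of the current behaviour (assuming a recent pandas and NumPy; details may differ between versions): the missing Timestamp, Timedelta and Period values are all the same `pd.NaT` object, which is an instance of none of those classes, whereas NumPy's NaT scalars keep their dtype.

```python
import numpy as np
import pandas as pd

# Missing Timestamp, Timedelta and Period values collapse to one singleton.
ts_missing = pd.Timestamp("NaT")
td_missing = pd.Timedelta("NaT")
per_missing = pd.Period("NaT", freq="D")
print(ts_missing is td_missing is per_missing is pd.NaT)  # True

# The singleton is not a Timedelta, so the "kind" of missing value is lost.
print(isinstance(td_missing, pd.Timedelta))  # False

# NumPy NaT scalars, by contrast, still carry their dtype.
np_td_nat = np.timedelta64("NaT", "ns")
np_dt_nat = np.datetime64("NaT", "ns")
print(np_td_nat.dtype, np_dt_nat.dtype)  # timedelta64[ns] datetime64[ns]
```

A separate "NaTD" would presumably behave like `np_td_nat` here: still missing, but still recognisably a timedelta.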

This is listed in the pandas2 tracker (wesm/pandas2#74), but I think it might even be achievable for pandas 1.x? The only backwards compatibility issue would be for code that explicitly checks object identity against the pd.NaT scalar, which is a bit of an anti-pattern.
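To illustrate the anti-pattern, a small sketch contrasting an identity check with `pd.isna` (the two helper names are hypothetical). The `pd.isna` version would likely keep working with type-specific NaT objects, assuming they are recognised by `pd.isna` the way NumPy's NaT scalars already are.

```python
import numpy as np
import pandas as pd

def is_missing_identity(value):
    # Fragile: only recognises the pd.NaT singleton itself.
    return value is pd.NaT

def is_missing_portable(value):
    # pd.isna treats pd.NaT, None, float NaN and NumPy NaT scalars as missing.
    return bool(pd.isna(value))

values = [pd.NaT, np.timedelta64("NaT", "ns"), None, pd.Timedelta("1h")]
print([is_missing_identity(v) for v in values])  # [True, False, False, False]
print([is_missing_portable(v) for v in values])  # [True, True, True, False]
```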

Activity


jbrockmendel commented on Jan 28, 2019

Member

xref #24645

added the Missing-data and Timedelta labels on Feb 7, 2019

burnpanck commented on Aug 16, 2022

Contributor

We ran into this in our production code, which does some timedelta gymnastics: an innocent-looking piece of code iterates over the rows of a pandas DataFrame and, among other things, applies np.maximum to a timedelta from that row and a constant lower bound. This fails with a UFuncTypeError only when that row happens to contain a NaT, even though the code would otherwise handle NaTs fine. We would therefore consider this a bug rather than an enhancement, since it breaks invariants that one would expect from the type system.
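One possible way to sidestep this until the scalars are split, sketched under the assumption that the values live in a timedelta64 column: apply the lower bound column-wise, where the dtype survives for missing entries, instead of row-wise, where the missing entry comes back as the untyped `pd.NaT`. The column name `delay` and the bound are hypothetical, and this does not reproduce the UFuncTypeError itself.

```python
import pandas as pd

# Hypothetical data: a timedelta column with one missing entry.
df = pd.DataFrame({"delay": pd.to_timedelta(["5min", None, "-2min"])})
lower = pd.Timedelta(0)

# Row-wise access boxes each element; the missing one is the pd.NaT singleton,
# so downstream code can no longer tell that it was a timedelta.
kinds = [type(row["delay"]).__name__ for _, row in df.iterrows()]
print(kinds)  # ['Timedelta', 'NaTType', 'Timedelta']

# Column-wise: keep values that are missing or already >= lower, replace the rest.
# NaT compares False with >=, so the isna() term is what preserves it.
col = df["delay"]
clipped = col.where(col.isna() | (col >= lower), lower)
print(clipped.dtype)  # timedelta64[ns]
```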

While searching for this issue, I also came across #46171. If I understand correctly, that PR made the type annotations less accurate for the convenience of users. I believe that inconvenience was actually an alarm signal from the type checkers pointing to the underlying issue, and the PR simply swept it under the rug. Without that PR, the type checker might have prevented us from running into this in production.
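To make the typing argument concrete, a minimal sketch of a scalar helper annotated with `Union[pd.Timedelta, NaTType]`. The helper `clip_lower` is hypothetical and the annotation is our own choice, not a description of what #46171 changed; it also assumes `NaTType` can be imported from the private `pandas._libs` module.

```python
from typing import Union

import pandas as pd
from pandas._libs import NaTType  # private path; recent pandas also exposes pandas.api.typing.NaTType

# Hypothetical helper: the union in the signature forces callers (and a type
# checker) to handle the missing case before doing timedelta arithmetic.
def clip_lower(
    value: Union[pd.Timedelta, NaTType], lower: pd.Timedelta
) -> Union[pd.Timedelta, NaTType]:
    # pd.NaT is not a pd.Timedelta instance, so this check narrows the union.
    if not isinstance(value, pd.Timedelta):
        return value  # propagate the missing value unchanged
    return max(value, lower)

print(clip_lower(pd.Timedelta("-1h"), pd.Timedelta(0)))  # 0 days 00:00:00
print(clip_lower(pd.NaT, pd.Timedelta(0)))               # NaT
```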


Metadata

Assignees: No one assigned

Labels: Enhancement, Missing-data, Needs Discussion, Period, Timedelta

Type: No type

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests

Participants: @shoyer, @burnpanck, @jbrockmendel, @gfyoung, @mroeschke
