Separate scalar missing values for Timedelta, Timestamp and Period scalars would go a long way towards achieving predictable types in pandas. As noted in #19124, it is impossible to make some operations consistent with the current state of affairs. Most recently this came up in #24957.
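To illustrate the ambiguity described above, here is a minimal sketch of how the shared scalar behaves today. The variable names are only illustrative.

```python
import pandas as pd

# Constructing a "missing" Timestamp or Timedelta yields the very same object.
ts_missing = pd.Timestamp("NaT")
td_missing = pd.Timedelta("NaT")
print(ts_missing is td_missing)   # True: both are the pd.NaT singleton
print(type(ts_missing).__name__)  # NaTType, neither Timestamp nor Timedelta

# Timestamp - Timestamp normally gives a Timedelta, but with a missing
# operand the result is again pd.NaT, so the "timedelta-ness" is lost.
diff = pd.Timestamp("2020-01-01") - pd.NaT
print(diff is pd.NaT)             # True
```

Because the result carries no information about whether it stands for a missing datetime, timedelta or period, downstream code cannot dispatch on its type.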
This is listed in the pandas2 tracker (wesm/pandas2#74), but I think it might even be achievable for pandas 1.x? There would only be backwards compatibility issues if people are explicitly checking object identity against the pd.NaT scalar, which is a bit of an anti-pattern.
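As a rough sketch of why identity checks are fragile, the snippet below contrasts `value is pd.NaT` with `pd.isna(value)`. The helper name `is_missing` is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

def is_missing(value) -> bool:
    """Hypothetical helper: detect a missing scalar without relying on identity."""
    return bool(pd.isna(value))

# Identity only matches the pd.NaT singleton itself ...
print(pd.NaT is pd.NaT)                  # True
print(np.timedelta64("NaT") is pd.NaT)   # False: a NumPy NaT is a different object

# ... whereas pd.isna recognises both.
print(is_missing(pd.NaT))                 # True
print(is_missing(np.timedelta64("NaT")))  # True
print(is_missing(pd.Timedelta("1s")))     # False
```

Code written against `pd.isna` would presumably keep working if separate missing-value scalars were introduced, while `is pd.NaT` checks would not.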
jbrockmendel commented on Jan 28, 2019
xref #24645
burnpanck commented on Aug 16, 2022
We did run into this in production code of ours, where we do some timedelta gymnastics: an innocent-looking piece of code iterates over the rows of a pandas DataFrame and, among other things, applies np.maximum to a timedelta within that row and a constant lower bound. This fails with a UFuncTypeError only in the case where we happen to have a NaT in that row, even though that piece of code would otherwise work fine with NaTs. Thus, in our view, we'd consider this an Issue rather than an Enhancement, as it breaks invariants that one would expect from the type system.
While searching for this issue, I also came across #46171. If I understand correctly, in that PR the type annotations were made less accurate for the convenience of users. I believe that this inconvenience was actually an alarm signal from the type checkers pointing to the underlying issue, and the PR simply swept that alarm signal under the rug. Without that PR, the type checker might have prevented us from running into this in production.
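For readers hitting the same problem, one possible workaround is to handle the missing case explicitly before comparing. This is only a sketch: `clamp_timedelta` is a hypothetical helper, it assumes that propagating NaT is the desired behaviour, and it uses Python's built-in `max` rather than `np.maximum`.

```python
import pandas as pd

def clamp_timedelta(value, lower):
    """Hypothetical helper: return max(value, lower), propagating missing values."""
    # pd.NaT is not a Timedelta, so branch on it before comparing.
    if pd.isna(value):
        return pd.NaT
    return max(pd.Timedelta(value), pd.Timedelta(lower))

lower_bound = pd.Timedelta("5s")
print(clamp_timedelta(pd.Timedelta("1s"), lower_bound))   # 0 days 00:00:05
print(clamp_timedelta(pd.Timedelta("10s"), lower_bound))  # 0 days 00:00:10
print(clamp_timedelta(pd.NaT, lower_bound))               # NaT
```

The explicit `pd.isna` branch is exactly the kind of special-casing that a typed missing-value scalar would make unnecessary.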
Timestamp and Timedelta. #58475