
API: allow casting datetime64 to int64? #45034

Open
@jorisvandenbossche

Description

@jorisvandenbossche
Member

The original deprecation happened in #38544

This comment is from #22384 (comment), moving it here to a separate issue.

But, on the specific datetime -> integer deprecation:

  • I find it a bit strange to deprecate/disallow it for astype, but then point people to view instead. There are use cases where you need the integers (e.g. if you want to do some custom rounding, or need to feed them to a system that requires Unix time as integers, ...), and personally I would rather have users go to astype than view (because astype is the more standard method for this, and if we go with copy-on-write, view becomes a somewhat strange method ...).
    In addition, using view will actually error for non-equal bitwidths (astype currently does as well, but that's something we can change, while for view it is inherent to the method). And view can also silently overflow when converting to uint64, while for astype we could check for that. In general, I see view as an advanced method you should only use if you really know what you are doing (and one you generally don't need in pandas, I think).
  • There is no ambiguity around what the expected result would be IMO (for naive datetimes / timedelta)
  • The other way around (integer -> datetime / timedelta) is not deprecated
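For tz-naive values the expected integers are unambiguous: the count of nanoseconds since the Unix epoch (1970-01-01). The sketch below uses only the standard library and hypothetical helper names (`to_epoch_ns`, `reinterpret_as_uint64`, `checked_to_uint64`); it contrasts a bit-level reinterpretation, which is what a view to uint64 amounts to, with a checked conversion that refuses values a uint64 cannot hold.

```python
import struct
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def to_epoch_ns(dt: datetime) -> int:
    """Nanoseconds since the Unix epoch for a naive datetime (hypothetical helper)."""
    # timedelta // timedelta yields an exact integer count of microseconds.
    return (dt - EPOCH) // timedelta(microseconds=1) * 1000


def reinterpret_as_uint64(value: int) -> int:
    """Reuse the 8 bytes of a signed int64 as an unsigned uint64, as a view does."""
    return struct.unpack("<Q", struct.pack("<q", value))[0]


def checked_to_uint64(value: int) -> int:
    """A cast that validates the range instead of silently wrapping (hypothetical)."""
    if not 0 <= value < 2**64:
        raise OverflowError(f"{value} does not fit in uint64")
    return value


after = to_epoch_ns(datetime(2012, 1, 1))
before = to_epoch_ns(datetime(1969, 12, 31))  # one day before the epoch: negative

print(after)                          # 1325376000000000000
print(reinterpret_as_uint64(before))  # 18446657673709551616, silently wrapped
try:
    checked_to_uint64(before)
except OverflowError as err:
    print("checked cast:", err)
```

The wrapped value is a valid uint64 that corresponds to no meaningful timestamp, which is the silent-overflow risk of view; a checked astype could raise instead.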

There is then some follow-up discussion in the issue below #22384 (comment)

I would personally propose to keep allowing astype() for datetime64 -> int64, and not steer users to view() for this.

cc @jbrockmendel

Activity

jbrockmendel (Member) commented on Dec 23, 2021

@jorisvandenbossche can you add to the OP the 2-3 relevant responses from that thread.

jreback (Contributor) commented on Dec 24, 2021

ok I reread the discussion a bit.

  • dt -> int casting is deprecated, but I agree that .view (though common in numpy) is not common in pandas, and we should undeprecate here and allow this type of casting (note that we deprecated this in 1.3, so it's a change again).
  • We actually need to finalize the casting rules before we start deprecating things.

jbrockmendel (Member) commented on Dec 24, 2021

> The other way around (integer -> datetime / timedelta) is not deprecated

This point I find compelling. IIRC deprecating in that direction was too invasive to be feasible.

A thought that didn't come up on the old thread: what happens if/when we have non-nano? Does dt64second.astype(int64) also do a .view(int64), or does it do some division?

Implementation questions, some from the old thread:

  1. Do we raise on dt64.astype(int64) when NaTs are present? (analogous to what we do for float->int with nans)
  2. Can we at least only allow dt64.astype(int64), i.e. not allow dt64.astype(int32) or dt64.astype(uint64) (which ATM we ignore and just cast to int64)
  3. Do we allow dt64.astype(float)?
  4. If we're pretending that dt64.astype(int64) is semantically meaningful, do we do the same for dt64tz or Period? Heck even Categorical?

jorisvandenbossche (Member, Author) commented on Dec 25, 2021

> A thought that didn't come up on the old thread: what happens if/when we have non-nano? Does dt64second.astype(int64) also do a .view(int64), or does it do some division?

I would think that it will return the underlying integers (no calculation), so the exact integers you get depend on the resolution you have. That's also one of the reasons we can't simply change the default resolution. But that's an issue anyhow, regardless of whether people use astype or view for this conversion.
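To make the resolution point concrete: if astype returns the stored integers without any arithmetic, the same instant yields different integers at different resolutions. A standard-library sketch follows; the unit table mirrors numpy's `s`/`ms`/`us`/`ns` codes, and `stored_integer` is a hypothetical helper, not a pandas function.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Ticks per second for each resolution code (numpy-style unit names).
TICKS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def stored_integer(dt: datetime, unit: str) -> int:
    """Integer a datetime64[unit] value would store for `dt` (hypothetical helper).

    Computes in exact integer microseconds, so it assumes `dt` is whole at the
    requested unit (e.g. no sub-second part when unit == "s").
    """
    micros = (dt - EPOCH) // timedelta(microseconds=1)
    return micros * TICKS_PER_SECOND[unit] // 1_000_000


instant = datetime(2012, 1, 1)
for unit in ("s", "ms", "us", "ns"):
    print(unit, stored_integer(instant, unit))
# s 1325376000
# ms 1325376000000
# us 1325376000000000
# ns 1325376000000000000
```

Under this reading, `dt64second.astype(int64)` would give the first number and the nanosecond version the last, for the same point in time.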

> 1. Do we raise on dt64.astype(int64) when NaTs are present? (analogous to what we do for float->int with nans)

Long term I would say yes, but that's something we will also need to deprecate first, as currently it returns the integer representation of NaT.
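For reference, pandas stores NaT as the smallest int64 value, so "the integer representation of NaT" is -9223372036854775808. The plain-Python sketch below (the function name is hypothetical) shows the long-term behaviour proposed here: refuse the cast when NaT is present, analogous to the float -> int case with NaN.

```python
from typing import List, Sequence

# pandas stores NaT internally as the minimum int64 value.
NAT_SENTINEL = -(2**63)


def datetime_ints_to_int64(values: Sequence[int]) -> List[int]:
    """Cast stored datetime64 integers to int64, refusing NaT (hypothetical policy).

    Returning `values` unchanged would expose the NaT sentinel as an ordinary
    integer; this sketch raises instead, like float -> int does for NaN.
    """
    if any(v == NAT_SENTINEL for v in values):
        raise ValueError("Cannot convert NaT values to integer")
    return list(values)


print(datetime_ints_to_int64([1325376000000000000, 1325462400000000000]))
try:
    datetime_ints_to_int64([1325376000000000000, NAT_SENTINEL])
except ValueError as err:
    print("raised:", err)

# The float analogue in plain Python fails in the same way.
try:
    int(float("nan"))
except ValueError as err:
    print("float analogue:", err)
```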

> 2. Can we at least only allow dt64.astype(int64), i.e. not allow dt64.astype(int32) or dt64.astype(uint64) (which ATM we ignore and just cast to int64)

Casting to int32 already raises an error. Personally, I would allow this in the future (if we error on overflow), but since it raises already there is no urgency on this aspect.
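Allowing int32 "if we error on overflow" amounts to a range check before narrowing. A minimal sketch of that idea (hypothetical helper, standard library only): nanosecond timestamps for realistic dates never fit in int32, while the same instant in seconds can.

```python
from typing import List, Sequence

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def checked_to_int32(values: Sequence[int]) -> List[int]:
    """Narrow int64 values to int32, raising instead of wrapping (hypothetical)."""
    bad = [v for v in values if not INT32_MIN <= v <= INT32_MAX]
    if bad:
        raise OverflowError(f"{len(bad)} value(s) out of int32 range, e.g. {bad[0]}")
    return list(values)


# 2012-01-01 as seconds since the epoch fits in int32 ...
print(checked_to_int32([1325376000]))
# ... but the same instant in nanoseconds does not.
try:
    checked_to_int32([1325376000000000000])
except OverflowError as err:
    print(err)
```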

> 3. Do we allow dt64.astype(float)?

That doesn't work currently, so I think we can defer that question to a later discussion on the more general casting rules. That discussion is now partly happening in #22384, but I need to open a dedicated issue for some aspects of it (such as the idea of safer casting by default, detecting overflow, etc.).

> 4. If we're pretending that dt64.astype(int64) is semantically meaningful, do we do the same for dt64tz or Period? Heck even Categorical?

For Categorical, you have the .codes to access the underlying integers using public API, so I don't think it's necessarily needed to support this through casting as well (for categorical, the casting generally happens at the level of the categories, not codes).

For Period it's a bit less clear: the scalar has .ordinal, and the array and index have .asi8, but that's not accessible from a Series. And I see now that Period -> int64 casting is deprecated in the same way as datetime64 (this issue). So if we undeprecate datetime64 -> int64 casting, we can do the same for Period for now?
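The Period case has the same shape as the resolution question: the exposed integer would be the ordinal, whose meaning depends on the frequency. A plain-Python model follows, assuming (as we understand pandas to do) that ordinals count whole periods since 1970-01-01 at the given frequency; `period_ordinal` and the frequency codes used here are illustrative, not pandas API.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def period_ordinal(d: date, freq: str) -> int:
    """Model of a Period ordinal: whole periods since 1970-01-01 (assumption)."""
    if freq == "D":
        return (d - EPOCH).days
    if freq == "M":
        return (d.year - EPOCH.year) * 12 + (d.month - EPOCH.month)
    if freq == "Y":
        return d.year - EPOCH.year
    raise ValueError(f"frequency {freq!r} not modelled in this sketch")


for freq in ("D", "M", "Y"):
    print(freq, period_ordinal(date(2012, 1, 15), freq))
```

The same calendar date maps to 15354, 504 or 42 depending on the frequency, so an int64 cast of Periods is only interpretable together with the freq.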

jorisvandenbossche (Member, Author) commented on Dec 25, 2021

> 1. Do we raise on dt64.astype(int64) when NaTs are present? (analogous to what we do for float->int with nans)
>
> Long term I would say yes, but that's something we will also need to deprecate first, as currently it returns the integer representation of NaT.

Actually, it's a bit more complicated than that. We only started returning the integer representation of NaT in the last release (1.3); before that, it already raised an error, but only for tz-naive data, not for tz-aware data.
Also, casting to uint64 already raised an error before 1.3, while it works in 1.3 / master (and it actually also doesn't trigger a deprecation warning in 1.3). But again only for tz-naive; for tz-aware it always worked.

An overview of the behaviours:

pandas 1.0 - 1.2

| dtype | target_dtype | no NaT | with NaT |
|---|---|---|---|
| datetime64[ns, Europe/Brussels] | int32 | works | works |
| datetime64[ns, Europe/Brussels] | int64 | works | works |
| datetime64[ns, Europe/Brussels] | uint64 | works | works |
| datetime64[ns] | int32 | cannot astype a datetimelike from [datetime64[ns]] to [int32] | cannot astype a datetimelike from [datetime64[ns]] to [int32] |
| datetime64[ns] | int64 | works | Cannot convert NaT values to integer |
| datetime64[ns] | uint64 | cannot astype a datetimelike from [datetime64[ns]] to [uint64] | cannot astype a datetimelike from [datetime64[ns]] to [uint64] |

pandas 1.3

| dtype | target_dtype | no NaT | with NaT |
|---|---|---|---|
| datetime64[ns, Europe/Brussels] | int32 | cannot astype a datetimelike from [datetime64[ns, Europe/Brussels]] to [int32] | cannot astype a datetimelike from [datetime64[ns, Europe/Brussels]] to [int32] |
| datetime64[ns, Europe/Brussels] | int64 | works (+warning) | works (+warning) |
| datetime64[ns, Europe/Brussels] | uint64 | works (+warning) | works (+warning) |
| datetime64[ns] | int32 | cannot astype a datetimelike from [datetime64[ns]] to [int32] | cannot astype a datetimelike from [datetime64[ns]] to [int32] |
| datetime64[ns] | int64 | works (+warning) | works (+warning) |
| datetime64[ns] | uint64 | works (+warning) | works (+warning) |
Code to generate the table
import pandas as pd
import warnings
warnings.simplefilter("always")

print(pd.__version__)

results = []

for dt_dtype in ["datetime64[ns]", "datetime64[ns, Europe/Brussels]"]:
    s = pd.Series(["2012-01-01", "2012-01-02", "NaT"], dtype=dt_dtype)

    for end in [2, 3]:
        has_nat = "no NaT"
        if end == 3:
            has_nat = "with NaT"

        for dtype in ["int64", "int32", "uint64"]:
            try:
                with warnings.catch_warnings(record=True) as record:
                    s.iloc[:end].astype(dtype)
            except Exception as err:
                res = str(err)
            else:
                if record:
                    res = "works (+warning)"
                else:
                    res = "works"
        
            results.append((dt_dtype, dtype, has_nat, res))

df = pd.DataFrame(results, columns=["dtype", "target_dtype", "has_nat", "result"])

print(df.pivot(index=["dtype", "target_dtype"], columns=["has_nat"], values="result").to_html())

jbrockmendel (Member) commented on Dec 25, 2021

It gets more complicated yet: Series.astype disallows casting to int32, but DatetimeIndex and DatetimeArray treat any numpy integer dtype as i8.

added this to the 1.5 milestone on Dec 31, 2021
added the "API - Consistency" label (Internal Consistency of API/Behavior) and removed the "Blocker for rc" label (Blocking issue or pull request for release candidate) on Dec 31, 2021
modified the milestones: 1.5, 1.4 on Jan 12, 2022

28 remaining items


Metadata

Assignees: No one assigned
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Participants: @jreback, @jorisvandenbossche, @jbrockmendel, @simonjayhawkins