The original deprecation happened in #38544
This comment is from #22384 (comment), moving it here to a separate issue.
But, on the specific datetime -> integer deprecation:
- I find it a bit strange to deprecate/disallow it for astype, but then point people to view instead. There are use cases where you need the integers (e.g. if you want to do some custom rounding, or need to feed it to a system that requires Unix time as integers, ...), and personally I would rather have users go to astype than view (because astype is the more standard method for this, and if we would go with copy-on-write, view becomes a bit of a strange method ...).
  In addition, using view will actually error for non-equal size bitwidth (astype actually as well, but that's something we can change, while for view that is inherent to the method). And view can also silently overflow if converting to uint64, while for astype we could check for that. In general, I see view as an advanced method you should only use if you really know what you are doing (and in general you don't really need it in pandas, I think).
- There is no ambiguity around what the expected result would be IMO (for naive datetimes / timedelta)
- The other way around (integer -> datetime / timedelta) is not deprecated
There is then some follow-up discussion in the issue below #22384 (comment)
I would personally propose to keep allowing astype() for datetime64 -> int64, and not steer users to view() for this.
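A minimal sketch of the two conversions discussed above. It uses a plain NumPy datetime64 array as a stand-in for the values behind a pandas Series (an assumption made so the result does not depend on the pandas version or its deprecation state); the variable names are illustrative only.

```python
import numpy as np

# One timestamp just before the Unix epoch and one after it,
# both stored at nanosecond resolution.
stamps = np.array(
    ["1969-12-31T23:59:59", "2021-12-23T00:00:00"],
    dtype="datetime64[ns]",
)

# astype converts into a new int64 array (nanoseconds since the epoch).
as_int = stamps.astype("int64")

# view reinterprets the same 8-byte values without copying.
viewed = stamps.view("int64")

# Viewing as an unsigned dtype reinterprets the bits with no range check,
# so the pre-epoch (negative) value silently wraps around.
wrapped = stamps.view("uint64")

print(as_int[0])   # -1000000000
print(wrapped[0])  # 18446744072709551616
```

Both routes yield the same integers here; the difference is that astype is a conversion that could in principle check for overflow, while view has no opportunity to do so.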
jbrockmendel commented on Dec 23, 2021
@jorisvandenbossche can you add to the OP the 2-3 relevant responses from that thread.
jreback commented on Dec 24, 2021
ok I reread the discussion a bit.
.view (though common in numpy) is not common in pandas and we should undeprecate here and allow this type of casting (note that we did this in 1.3 so it's a change again)
jbrockmendel commented on Dec 24, 2021
This point I find compelling. IIRC deprecating in that direction was too invasive to be feasible.
A thought that didn't come up on the old thread: what happens if/when we have non-nano? Does dt64second.astype(int64) also do a .view(int64), or does it do some division?
Implementation questions, some from the old thread:
jorisvandenbossche commented on Dec 25, 2021
I would think that it will return the underlying integers (no calculation), so the exact integers you get is dependent on the resolution you have. That's also one of the reasons we can't simply change the default resolution. But that's an issue anyhow, regardless of people using astype vs view for this conversion.
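To illustrate why the integers depend on the resolution, here is a small sketch using NumPy datetime64 arrays. This shows NumPy's current behaviour only; whether a non-nanosecond pandas dtype would behave the same way is exactly the open question here, so treat it as an assumption.

```python
import numpy as np

# The same instant (one second after the epoch) at two resolutions.
one_second = np.array(["1970-01-01T00:00:01"], dtype="datetime64[s]")
same_instant_ns = one_second.astype("datetime64[ns]")

# Casting to int64 returns the stored integers, with no rescaling.
as_seconds = one_second.astype("int64")
as_nanos = same_instant_ns.astype("int64")

print(as_seconds[0])  # 1
print(as_nanos[0])    # 1000000000
```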
Long term I would say yes, but that's something we will also need to deprecate first, as currently it returns the integer representation of NaT.
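For reference, the "integer representation of NaT" is the smallest int64 value. A short sketch with a NumPy array (again a stand-in for the pandas case, which is an assumption):

```python
import numpy as np

with_nat = np.array(["2021-12-25", "NaT"], dtype="datetime64[ns]")

# The missing value comes through as an ordinary, very negative integer.
as_int = with_nat.astype("int64")
int64_min = np.iinfo(np.int64).min

print(as_int[1])  # -9223372036854775808
```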
Casting to int32 already raises an error. Personally, I would allow this in the future (if we error on overflow), but since it raises already there is no urgency on this aspect.
That doesn't work currently, so I think we can defer that question to a later discussion on the more general casting rules (which is now partly happening in #22384, but I need to open a dedicated issue for aspects of that discussion, such as the idea of safer casting by default, detecting overflow, etc.).
For Categorical, you have the .codes to access the underlying integers using public API, so I don't think it's necessarily needed to support this through casting as well (for categorical, the casting generally happens at the level of the categories, not codes).
For Period it's a bit less clear: the scalar has .ordinal, and the array and index have .asi8, but that's not accessible from Series. But I see now that the Period -> int64 casting is deprecated similarly to datetime64 (this issue). So if we undeprecate datetime64 -> int64 casting, we can for now do the same for Period?
jorisvandenbossche commented on Dec 25, 2021
Actually, it's a bit more complicated than that. We did that in the last release (1.3), but did raise an error already before that. But only for tz-naive, and not for tz-aware ..
Also casting to uint64 already raised an error before 1.3, while this works in 1.3 / master (and it actually also doesn't trigger a deprecation warning in 1.3). But again only for tz-naive, while for tz-aware it always worked.
An overview of the behaviours:
pandas 1.0 - 1.2
pandas 1.3
Code to generate the table
jbrockmendel commented on Dec 25, 2021
It gets more complicated yet: Series.astype disallows casting to int32, but DatetimeIndex and DatetimeArray treat any numpy integer dtype as i8.