
DISC: add accessor attributes to Index for consistency with Series #17134

@jbrockmendel

Member

Broken off of #17117 for discussion, xref #8162, #17061, recent mailing list thread.

With a datetime-like column we can access year, hour, ... with self.dt.foo. With a DatetimeIndex (PeriodIndex, ...) we access these attributes directly as self.foo, without the .dt. I'd like to add a .dt property to the appropriate Index subclasses so that these attributes can be accessed symmetrically, i.e. instead of:

if isinstance(obj, pd.Index):
    year = obj.year
elif isinstance(obj, pd.Series):
    year = obj.dt.year

we can just use year = obj.dt.year regardless.
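
Until an accessor like this exists, downstream code can hide the branch in a small helper. The following is only a minimal sketch: datetime_field is a hypothetical name chosen for illustration and is not part of pandas.

```python
import pandas as pd


def datetime_field(obj, name):
    """Return a datetime component (e.g. "year") from a Series or an Index.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    if isinstance(obj, pd.Series):
        # Series expose datetime components through the .dt accessor.
        return getattr(obj.dt, name)
    # DatetimeIndex, PeriodIndex, ... expose them directly on the index.
    return getattr(obj, name)


idx = pd.date_range("2017-01-01", periods=3)
ser = idx.to_series()

index_years = datetime_field(idx, "year")    # an Index of years
series_years = datetime_field(ser, "year")   # a Series of years
```

The helper returns whatever container the input naturally produces, so callers still receive an Index in one case and a Series in the other.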

The implementation is three lines in core.indexes.datetimelike.DatetimeIndexOpsMixin:

    @property
    def dt(self):
        return self

Thoughts?

Activity

TomAugspurger

TomAugspurger commented on Aug 1, 2017

@TomAugspurger
Contributor

I'd support this. It makes writing generic downstream code easier.

changed the title from "DISC: Make Index Behave More Like Series" to "DISC: add accessor attributes to Index for consistency with Series" on Aug 1, 2017
jorisvandenbossche

jorisvandenbossche commented on Aug 1, 2017

@jorisvandenbossche
Member

I would also support the idea, although I wouldn't implement it like that: returning self means tab completion on index.dt would not be limited to the datetime-specific attributes, and it also makes it possible to misuse the accessor by calling an unrelated method, e.g. df.index.dt.mean().
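
The namespace concern can be illustrated with a toy model. This is a sketch only: ToyDatetimeIndex, ToyDatetimeProperties, dt_self and dt_restricted are hypothetical names, not pandas classes or attributes, and the real accessor machinery in pandas is more involved.

```python
class ToyDatetimeIndex:
    """Stand-in for a datetime-like index; not a pandas class."""

    def __init__(self, years):
        self._years = list(years)

    @property
    def year(self):
        return list(self._years)

    def mean(self):
        # A generic method that has nothing to do with datetime fields.
        return sum(self._years) / len(self._years)

    @property
    def dt_self(self):
        # The three-line proposal: the accessor is the index itself.
        return self

    @property
    def dt_restricted(self):
        # Alternative: a separate object exposing only datetime fields.
        return ToyDatetimeProperties(self)


class ToyDatetimeProperties:
    """Accessor that forwards only datetime-specific attributes."""

    def __init__(self, index):
        self._index = index

    @property
    def year(self):
        return self._index.year


idx = ToyDatetimeIndex([2017, 2017, 2018])

# With dt returning self, every index method is reachable through it.
leaks_mean = hasattr(idx.dt_self, "mean")

# With a dedicated accessor object, only the datetime fields are visible.
restricted_has_mean = hasattr(idx.dt_restricted, "mean")
public_names = [n for n in dir(idx.dt_restricted) if not n.startswith("_")]
```

In this toy, dir() on the restricted accessor lists only year, which is the behaviour tab completion would rely on.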

jbrockmendel

jbrockmendel commented on Aug 1, 2017

@jbrockmendel
Member, Author

@jorisvandenbossche That's a good point. It shouldn't be too difficult to implement the "right" way. It'll take some small edits to core.indexes.accessors.maybe_to_datetimelike. Implementing that will be easier if/when #17042 gets merged.

jreback

jreback commented on Aug 8, 2017

@jreback
Contributor

I think this is ok. This makes Index/Series more consistent, which is a good thing.

jorisvandenbossche

jorisvandenbossche commented on Aug 9, 2017

@jorisvandenbossche
Member

One thing to discuss is what it should return.

In the PR you just opened, you say:

it will return an object that is effectively index.to_series().dt

But it could also return the same as the plain attribute on the index (which is what you originally proposed: an index.dt that would effectively return self)? The two options are not the same:

In [72]: idx = pd.date_range("2017-01-01", periods=3)

In [74]: idx.day
Out[74]: Int64Index([1, 2, 3], dtype='int64')

In [75]: idx.to_series().dt.day
Out[75]: 
2017-01-01    1
2017-01-02    2
2017-01-03    3
Freq: D, dtype: int64

IMO it should return [74], not [75].
Although this may partly defeat your goal of not having to care whether something is an Index or a Series (.dt would work on both, but would still return something different).
But for that (having an index and a column that really behave the same), it may be more worthwhile to work on #8162 instead of adding dt to the Index.
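
The difference between the two results can be checked directly. This is a small sketch using only the calls shown in the session above plus to_numpy(); normalising to NumPy arrays is just one possible way for generic code to get a single shape, not something proposed in this thread.

```python
import pandas as pd

idx = pd.date_range("2017-01-01", periods=3)
ser = idx.to_series()

from_index = idx.day        # an Index of integers, as in [74]
from_series = ser.dt.day    # a Series keyed by the timestamps, as in [75]

# Same values, different containers.
same_values = list(from_index) == list(from_series) == [1, 2, 3]
same_type = type(from_index) is type(from_series)

# Code that needs one shape regardless of input can normalise explicitly.
as_arrays = (from_index.to_numpy(), from_series.to_numpy())
```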

jreback

jreback commented on Aug 9, 2017

@jreback
Contributor

This should for sure be [74]. In fact the impl is pretty trivial (and indicated at the top of the PR).

added this to the Next Major Release milestone on Aug 9, 2017
jorisvandenbossche

jorisvandenbossche commented on Aug 9, 2017

@jorisvandenbossche
Member

and indicated at the top of the PR).

In one of the previous ones, but not in the latest: #17204 (I have to admit I am a bit lost in all the different PRs with related content)

jbrockmendel

jbrockmendel commented on Aug 9, 2017

@jbrockmendel
Member, Author

(I have to admit I am a bit lost in all the different PRs with related content)

Sorry for the inundation. I'll be slowing down shortly.

@jreback & @jorisvandenbossche
The implementation in #17204 behaves like [75]. The implementation with a property at the top of the PR behaves like [74]. I'm largely indifferent between the two; I implemented the [75]-like version to address @jorisvandenbossche's comment about dir(index.dt).

The main advantage of the [75] version in #17204 is that it makes index.dt actually go through the same code paths for Index and Series, so the similar behavior is not just cosmetic. I view that as a step towards e.g. #8162.

jorisvandenbossche

jorisvandenbossche commented on Aug 10, 2017

@jorisvandenbossche
Member

I'll be slowing down shortly.

No need to slow down! It is just that it can be confusing when multiple PRs try to solve related (or the same) problems in different ways; you then need to carefully explain in each PR what it does and how it differs from, or relates to, the other open PRs.

The main advantage of the [75] version in #17204 is that it makes index.dt actually go through the same code paths for Index and Series, so the similar behavior is not just cosmetic. I view that as a step towards e.g. #8162.

It does indeed make the output for Series and Index more similar, but it makes the output for Index completely inconsistent with other index attributes. So I still think it should be like [74].
But as I said above, if you really want to be able to handle your index as if it were a column/Series, I think #8162 is a good issue to work on.

jreback

jreback commented on Aug 10, 2017

@jreback
Contributor

This needs to just use the simpler implementation (e.g. Index.dt -> self); anything else is inconsistent. This should be very straightforward to do.

jbrockmendel

jbrockmendel commented on Aug 10, 2017

@jbrockmendel
Member, Author

I'm on board with @jreback's suggestion. @jorisvandenbossche is your original concern about the index.dt namespace a sticking point?

added and removed the Compat label (pandas objects compatibility with NumPy or Python functions) on Apr 10, 2020
removed this from the Contributions Welcome milestone on Oct 13, 2022


      Participants

      @jreback @jorisvandenbossche @TomAugspurger @jbrockmendel @mroeschke


        DISC: add accessor attributes to Index for consistency with Series · Issue #17134 · pandas-dev/pandas