timezone aware columns #9242

@quicknir

Description

At the moment, I don't see any way to have performant, timezone-aware columns. This seems rather surprising, since the timestamps in a column will generally all share the same timezone. Right now, as far as I can see, you have two options:

  1. Columns that are basically just numpy datetime64[ns] types. These seem to be timezone unaware. If you make such a column timezone aware (e.g., by dataframe.time_column.dt.tz_localize('UTC')), it becomes a column of dtype object.
  2. A DatetimeIndex, which keeps track of timezone information at seemingly the column level (which is laudable). However, this only seems to really work when used as an index. If I assign it to a column, it again gets converted to dtype object, and things get slow.

Am I missing something? Is there a really good reason why a DatetimeIndex can't just be used, as is, in a column, without the dtype=object conversion?
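
A minimal way to check the two cases above on whatever pandas version is installed is sketched below. The column name `time_column` comes from the post; the sample timestamps, the `from_index` column, and the helper name `describe_dtypes` are made up for illustration. The script prints the dtypes rather than assuming them, since the result depends on the pandas version (the post describes the behaviour as of early 2015).

```python
import pandas as pd


def describe_dtypes():
    """Return the dtypes seen for naive, localized, and index-assigned columns."""
    df = pd.DataFrame(
        {"time_column": pd.to_datetime(["2015-01-13 09:00", "2015-01-13 10:00"])}
    )
    # Case 1: localize an existing timezone-naive column.
    localized = df["time_column"].dt.tz_localize("UTC")
    # Case 2: build a tz-aware DatetimeIndex and assign it as a column.
    index = pd.DatetimeIndex(df["time_column"]).tz_localize("UTC")
    df["from_index"] = index
    return {
        "naive": df["time_column"].dtype,
        "localized": localized.dtype,
        "from_index": df["from_index"].dtype,
    }


dtypes = describe_dtypes()
for name, dtype in dtypes.items():
    # dtype == object would signal the slow fallback described in the post.
    print(f"{name}: {dtype} (object fallback: {dtype == object})")
```

If either of the last two lines reports an object fallback, the installed version still behaves as described in the post.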

Activity

jorisvandenbossche commented on Jan 13, 2015

Member

Your analysis is generally correct. The problem is, as you pointed out, that numpy does not have support for time zones, and the data in columns are stored as numpy arrays.

The DatetimeIndex provides some workarounds to handle time zones at the level of the full index, and these workarounds are not (yet) available for columns ('blocks'). As far as I understand, it would be possible to do something similar for the DatetimeBlock, but @jreback can shed more light on this.

I think it is a rather big enhancement, but if you are interested in working on this, you are certainly welcome! Or, if you would rather improve timezone support in numpy itself, they are certainly also looking for help.
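
As a small illustration of the index-level time zone handling mentioned above, the following sketch shows that a tz-aware DatetimeIndex carries one time zone for the whole index while the stored integers stay the same under conversion. The sample timestamps and the zone `America/New_York` are arbitrary choices for the example, not something from the thread.

```python
import pandas as pd

# A tz-aware DatetimeIndex: one time zone for the whole index.
idx = pd.DatetimeIndex(
    ["2015-01-13 09:00", "2015-01-13 10:00"], tz="America/New_York"
)

# Converting to another zone changes the metadata and the displayed wall time...
as_utc = idx.tz_convert("UTC")

# ...but the underlying integer values (UTC-based epoch counts) are identical.
same_storage = bool((idx.asi8 == as_utc.asi8).all())
print(idx.tz, as_utc.tz, same_storage)
```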

quicknir commented on Jan 13, 2015

Author

Thank you @jorisvandenbossche, very helpful. Handling it at the DatetimeBlock level seems a bit messier, e.g. different columns could have different timezone or precision information. I was thinking of a solution more like Categorical. As far as I can see, Categorical seems to have all the right basic infrastructure in place, which could be duplicated to create a DatetimeColumn, but naturally this is a very superficial viewpoint.

Sure, I am interested in at least looking at what would be required. The timezone support on the numpy side I actually view as adequate, in the sense that I think the datetime64[ns] is perfectly adequate as a low-level data type, and has the enormous advantage of being just exactly a 64 bit integer and nothing else. This, I think, should be handled on the pandas side.
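
To make the idea concrete, here is a rough sketch of what "plain 64-bit integers underneath, time zone handled one level up" could look like. `TzAwareColumn` is a hypothetical name invented for this example; it is not a pandas or numpy class, and it truncates to whole seconds for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

import numpy as np


@dataclass
class TzAwareColumn:
    """Hypothetical container: UTC instants as datetime64[ns] plus one time zone."""

    utc_values: np.ndarray  # dtype datetime64[ns], interpreted as UTC
    tz: str  # IANA zone name applied to the whole column

    def local_wall_times(self):
        """Convert each stored UTC instant to an aware datetime in self.tz."""
        zone = ZoneInfo(self.tz)
        # Viewing as int64 exposes the raw nanoseconds-since-epoch storage.
        nanos = self.utc_values.view("int64")
        return [
            # Sub-second precision is dropped here to keep the sketch short.
            datetime.fromtimestamp(int(n) // 1_000_000_000, tz=timezone.utc)
            .astimezone(zone)
            for n in nanos
        ]


values = np.array(["2015-01-13T14:00:00"], dtype="datetime64[ns]")
column = TzAwareColumn(utc_values=values, tz="America/New_York")
print(values.itemsize, values.view("int64").dtype)  # 8 bytes per element, int64
print(column.local_wall_times()[0].isoformat())  # 2015-01-13T09:00:00-05:00
```

The storage stays a flat array of integers, and the time zone is a single piece of metadata for the whole column, which is the property the DatetimeIndex already has.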

jreback commented on Jan 13, 2015

Contributor

dupe of #8260

@quicknir you are welcome to give this a go, it's actually not that tricky: just inherit from DatetimeBlock and the NonConsolidatingBlock mixin (so these blocks are not combined with one another). This is by far the cleanest soln.

> Sure, I am interested in at least looking at what would be required. The timezone support on the numpy side I actually view as adequate, in the sense that I think the datetime64[ns] is perfectly adequate as a low-level data type, and has the enormous advantage of being just exactly a 64 bit integer and nothing else. This, I think, should be handled on the pandas side.

support on numpy is non-existent, though there are some proposals. just look thru the pandas codebase and you will appreciate the enormity of what @wesm did with timezones.

added the labels Internals (Related to non-user accessible pandas implementation) and Timezones (Timezone data dtype) on Jan 13, 2015
added this to the No action milestone on Feb 27, 2015
          timezone aware columns · Issue #9242 · pandas-dev/pandas