Timezones silently dropped in parsing #18702

Closed
@jbrockmendel

TLDR: pandas should pass a tzinfos kwarg to the dateutil parser using sensible defaults.

dateutil has a bug that silently drops most timezones. That bug is inherited by pandas. The following is run on a machine located in US/Pacific:

>>> pd.Timestamp('2017-12-08 08:20 PM PST')     # <-- only parsed correctly because of locale
Timestamp('2017-12-08 20:20:00-0800', tz='tzlocal()')
>>> pd.Timestamp('2017-12-08 08:20 PM EST')     # <-- timezone silently dropped
Timestamp('2017-12-08 20:20:00')

There is a partial fix in progress over at dateutil. The most likely outcome is that these cases will raise in the future unless a tzinfos kwarg is explicitly passed to dateutil.parser.parse. The issue for pandas is then to decide what tzinfos to pass (a suggestion to handle the most common use cases by default within dateutil went nowhere).

The tzinfos kwarg is a dictionary mapping a timezone abbreviation (a string) to a tzinfo object, e.g.

import dateutil.tz

unambiguous_tzinfos = {
    'PDT': dateutil.tz.gettz('US/Pacific'),
    'PT': dateutil.tz.gettz('US/Pacific'),
    'MDT': dateutil.tz.gettz('US/Mountain'),
    'MT': dateutil.tz.gettz('US/Mountain'),
    'ET': dateutil.tz.gettz('US/Eastern'),
    'CET': dateutil.tz.gettz('Europe/Amsterdam'),
    'NZDT': dateutil.tz.gettz('Pacific/Auckland')}

This example includes only abbreviations for which there are no other alternatives listed here. For example, "CST" is excluded since it could also be "China Standard Time", and "EST" is excluded since it could also refer to "Australian Eastern Standard Time". Note that this is only a subset of the unambiguous abbreviations.
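For illustration, here is a minimal sketch of passing such a mapping straight to dateutil.parser.parse. The name `default_tzinfos` is hypothetical, not an existing pandas or dateutil object, and the mapping is a small subset chosen for the example. It assumes the timezone database is available to dateutil.tz.gettz.

```python
from dateutil import parser, tz

# Hypothetical default mapping: an illustrative subset of the
# unambiguous abbreviations, not an official list.
default_tzinfos = {
    'PDT': tz.gettz('US/Pacific'),
    'ET': tz.gettz('US/Eastern'),
}

# With tzinfos supplied, the abbreviation is resolved from the mapping
# rather than from the machine's local timezone.
eastern = parser.parse('2017-12-08 08:20 PM ET', tzinfos=default_tzinfos)
pacific = parser.parse('2017-07-04 08:20 PM PDT', tzinfos=default_tzinfos)

print(eastern.utcoffset())  # UTC offset for the December date
print(pacific.utcoffset())  # UTC offset for the July date
```

Both results are timezone-aware, so the abbreviation is no longer silently dropped. Which abbreviations belong in a default mapping is the open question for pandas.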
