
BUG: Inconsistent datetime comparison with Tz #12601 (pandas-dev/pandas)

Closed
@sinhrks (Member)

Description

Related to #8306. On current master, comparing two Timestamps raises a TypeError if one is tz-aware and the other is tz-naive. However, Index and Series comparisons implicitly convert the tz-aware values to UTC (GMT) and compare them against the naive value:

import pandas as pd

pd.Timestamp('2016-01-01 12:00', tz='US/Eastern') > pd.Timestamp('2016-01-01 08:00')
# TypeError: Cannot compare tz-naive and tz-aware timestamps

# same result as idx.tz_convert(None) > pd.Timestamp('2016-01-01 08:00')
idx = pd.date_range('2016-01-01 12:00', periods=10, freq='H', tz='Asia/Tokyo')
idx > pd.Timestamp('2016-01-01 08:00')
# array([False, False, False, False, False, False,  True,  True,  True,  True], dtype=bool)
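A minimal sketch of making the comparison explicit instead of relying on the implicit conversion, assuming the naive timestamp is meant to be UTC (which is what the implicit conversion assumes). Either both operands are made tz-aware, or the index is converted to naive UTC with `tz_convert(None)`. Mixed naive/aware comparison behavior has changed across pandas versions, so the sketch avoids it entirely; lowercase `"h"` is the hourly alias preferred by recent pandas releases.

```python
import pandas as pd

idx = pd.date_range("2016-01-01 12:00", periods=10, freq="h", tz="Asia/Tokyo")
naive = pd.Timestamp("2016-01-01 08:00")

# Option 1 (assumes the naive value is UTC): localize it to UTC, then express
# it in the index's zone so both operands are tz-aware with the same timezone.
aware = naive.tz_localize("UTC").tz_convert(idx.tz)
mask_aware = idx > aware

# Option 2: tz_convert(None) converts the index to UTC and drops the timezone,
# so both operands are tz-naive UTC wall times.
mask_naive = idx.tz_convert(None) > naive

print(mask_aware.tolist())
print(mask_naive.tolist())
```

Both options yield the same mask as the implicit conversion shown above, but the timezone assumption is now visible in the code.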

Arithmetic ops, by contrast, raise a TypeError, as expected:

idx - pd.Timestamp('2016-01-01 08:00')
# TypeError: Timestamp subtraction must have the same timezones or no timezones
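The same explicit approach applies to subtraction. A sketch, again assuming the naive timestamp represents UTC: once it is localized and converted to the index's zone, subtracting yields a `TimedeltaIndex` instead of an error.

```python
import pandas as pd

idx = pd.date_range("2016-01-01 12:00", periods=10, freq="h", tz="Asia/Tokyo")
naive = pd.Timestamp("2016-01-01 08:00")

# Assumption: the naive timestamp is UTC. Make that explicit and match the
# index's timezone so both operands of the subtraction are tz-aware.
deltas = idx - naive.tz_localize("UTC").tz_convert(idx.tz)

print(deltas[0])   # first element is 03:00 UTC, i.e. 5 hours before 08:00 UTC
print(deltas[-1])  # last element is 12:00 UTC, i.e. 4 hours after 08:00 UTC
```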


added this to the 0.18.1 milestone on Mar 12, 2016
gliptak (Contributor) commented on Mar 13, 2016

I opened numpy/numpy#7390. Does this belong to pandas instead? Thanks.

jreback (Contributor) commented on Mar 13, 2016

This is an invalid dtype for NumPy and is not defined there.
Further, what you are doing doesn't make any sense.

gliptak (Contributor) commented on Mar 13, 2016

@jreback I'm validating that the ts column has a datetime64 dtype with a timezone (simply comparing it to datetime64 fails). How should this be coded?

jreback (Contributor) commented on Mar 13, 2016

Use .select_dtypes, or com.is_datetimelike or com.is_datetime64tz_dtype.

NumPy doesn't know about or respect this dtype (it's really a bug in the dtype definition, and I don't know when, if ever, it will be fixed or allowed).
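A sketch of the distinction: a tz-aware column has a pandas extension dtype (`pd.DatetimeTZDtype`), not a NumPy dtype, so an equality test against `np.dtype('datetime64[ns]')` is False. The `com.` helpers mentioned above were pandas internals at the time; in recent versions the public `pandas.api.types.is_datetime64tz_dtype` is deprecated in favor of an `isinstance` check, which is what this sketch uses. The Series is made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.date_range("2016-03-10 23:20", periods=2, freq="D", tz="US/Eastern"))

# The tz-aware dtype is a pandas extension dtype, so comparing it with a
# plain NumPy datetime64 dtype is simply False.
print(s.dtype == np.dtype("datetime64[ns]"))  # False

# Check the pandas dtype class instead; it also carries the timezone.
is_tz_aware = isinstance(s.dtype, pd.DatetimeTZDtype)
print(is_tz_aware, str(s.dtype.tz))
```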

jreback (Contributor) commented on Mar 13, 2016

Here is also how to coerce it. EDT is not a timezone, and what dateutil does with it is wrong and doesn't give you anything useful.

In [34]: df = pd.DataFrame(["Mar 10, 2016 11:20 PM EDT"], columns=['ts'])

In [35]: pd.to_datetime(df['ts']).astype('datetime64[us, US/Eastern]')
Out[35]: 
0   2016-03-10 23:20:00-05:00
Name: ts, dtype: datetime64[ns, US/Eastern]
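
For comparison, a sketch of the same coercion in current pandas, which rejects astype from a tz-naive to a tz-aware dtype: strip the abbreviation, parse with an explicit format, and attach an IANA zone with `tz_localize`. The offset comes out as -05:00 because US daylight saving time in 2016 did not start until March 13, which illustrates why the "EDT" label itself is unreliable.

```python
import pandas as pd

df = pd.DataFrame(["Mar 10, 2016 11:20 PM EDT"], columns=["ts"])

# Drop the "EDT" abbreviation (not an IANA zone), parse the wall-clock time
# with an explicit format, then attach a real timezone with tz_localize.
wall_clock = df["ts"].str.replace(" EDT", "", regex=False)
df["ts"] = pd.to_datetime(wall_clock, format="%b %d, %Y %I:%M %p").dt.tz_localize("US/Eastern")

print(df["ts"].iloc[0])  # 2016-03-10 23:20:00-05:00
```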

gliptak (Contributor) commented on Mar 13, 2016

Thank you for the pointers.

In [4]: df = pd.DataFrame([parse("Mar 10, 2016 11:20 PM EDT")], columns=['ts'])
In [16]: df['ts'] = pd.to_datetime(df['ts']).astype('datetime64[us, US/Eastern]')
In [19]: df.dtypes['ts'] == np.dtype('datetime64[ns]')
Out[19]: False

So how am I to compare? Thanks

jreback (Contributor) commented on Mar 13, 2016

What are you trying to do? Why do you need to compare? What are you comparing? Most ops will simply work; you rarely need to compare dtypes directly. If you need to sub-select columns, use .select_dtypes(...) as I indicated.
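A minimal sketch of sub-selecting columns by kind with `select_dtypes`; the DataFrame and its column names are hypothetical. `"datetimetz"` selects tz-aware datetime columns and `"number"` selects numeric ones, so no dtype equality test is needed.

```python
import pandas as pd

# Hypothetical frame: two float columns and one tz-aware datetime column.
df = pd.DataFrame({
    "open": [1.0, 2.0],
    "close": [1.5, 2.5],
    "ts": pd.date_range("2016-03-10", periods=2, freq="D", tz="US/Eastern"),
})

# Sub-select columns by kind rather than comparing dtypes one by one.
tz_aware = df.select_dtypes(include=["datetimetz"])
numeric = df.select_dtypes(include="number")

print(tz_aware.columns.tolist())  # ['ts']
print(numeric.columns.tolist())   # ['open', 'close']
```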

gliptak (Contributor) commented on Mar 13, 2016

Sorry, I didn't offer context. I came across this while working on unit tests for pydata/pandas-datareader#188:

dtypes = [np.dtype(x) for x in ['float64', 'float64', 'datetime64[ns]']]
tm.assert_series_equal(df.dtypes, pd.Series(dtypes, index=exp_columns))

I had to force no timezone for the comparison above to succeed.
Could you show how to rewrite df.dtypes['ts'] == np.dtype('datetime64[ns]') with .select_dtypes(...)?
Thanks.
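One way to write such a test without hard-coding a tz-naive NumPy dtype, sketched with a hypothetical helper `assert_column_kinds`: check each column's kind with predicates from `pandas.api.types`. `is_datetime64_any_dtype` is True for both naive and tz-aware datetime columns, so the check depends on neither the timezone nor the datetime unit. The DataFrame and column names are made up for illustration.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_float_dtype


def assert_column_kinds(df, float_cols, datetime_cols):
    """Hypothetical test helper: check dtype kinds, ignoring timezone and unit."""
    for col in float_cols:
        assert is_float_dtype(df[col]), f"{col!r} is {df[col].dtype}, expected float"
    for col in datetime_cols:
        # True for naive datetime64 columns and for tz-aware DatetimeTZDtype columns.
        assert is_datetime64_any_dtype(df[col]), f"{col!r} is {df[col].dtype}, expected datetime"


# Hypothetical frame shaped like the one in the test above.
df = pd.DataFrame({
    "open": [1.0, 2.0],
    "close": [1.5, 2.5],
    "ts": pd.date_range("2016-03-10", periods=2, freq="D", tz="US/Eastern"),
})
assert_column_kinds(df, float_cols=["open", "close"], datetime_cols=["ts"])
```

If the exact timezone matters as well, the `tz` attribute of the column's `pd.DatetimeTZDtype` can be checked separately.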


Metadata

Assignees: No one assigned
Type: No type
Projects: No projects
Relationships: None yet
Participants: @gliptak, @jreback, @jorisvandenbossche, @sinhrks
