API: DatetimeIndex union vs join different behavior with mismatched tzs #39328

Opened by @jbrockmendel (Member)

For union, when we have mismatched tzs or mismatched tz-awareness, we cast to object.

For join, when we have mismatched tz-awareness we raise TypeError; with mismatched tzs we cast to UTC.

It would be convenient if these had the same behavior.
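
A minimal sketch for checking the two code paths on an installed pandas. The zone names and dates are arbitrary choices for illustration, and the dtypes printed depend on the pandas version, so no particular union result is assumed here.

```python
import pandas as pd

# Two tz-aware indexes in different zones, plus a tz-naive one.
# Zone names and dates are illustrative, not taken from the report.
eastern = pd.date_range("2021-01-01", periods=3, freq="D", tz="US/Eastern")
pacific = pd.date_range("2021-01-01", periods=3, freq="D", tz="US/Pacific")
naive = pd.date_range("2021-01-01", periods=3, freq="D")

# Union with mismatched tzs: the report says this cast to object at the time.
union_result = eastern.union(pacific)
print("union dtype:", union_result.dtype)

# Join with mismatched tzs: the report says this casts to UTC.
join_result = eastern.join(pacific, how="outer")
print("join dtype:", join_result.dtype)

# Join with mismatched tz-awareness: the report says this raises TypeError.
try:
    mixed = eastern.join(naive, how="outer")
    print("join with naive dtype:", mixed.dtype)
except TypeError as err:
    print("join with naive raised TypeError:", err)
```

Comparing the printed union and join dtypes shows directly whether the two operations agree on the version being tested.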

Activity

jorisvandenbossche (Member) commented on Jan 22, 2021

Isn't this somewhat related to #37605? If we decide there (for setitem) that matching tz-awareness (but not tz equality) is enough, then I would also expect both union and join to preserve a tz-aware datetime dtype (which means casting to UTC, I suppose).

jbrockmendel (Member, Author) commented on Jan 23, 2021

One way to implement this would be to make find_common_type return UTC for mismatched dt64tz dtypes. Doing this would change the setop behavior in a pretty reasonable way. The sticking point is a couple of groupby tests:

    def test_transform_lambda_with_datetimetz():
        # GH 27496
        df = DataFrame(
            {
                "time": [
                    Timestamp("2010-07-15 03:14:45"),
                    Timestamp("2010-11-19 18:47:06"),
                ],
                "timezone": ["Etc/GMT+4", "US/Eastern"],
            }
        )
        result = df.groupby(["timezone"])["time"].transform(
            lambda x: x.dt.tz_localize(x.name)
        )
        expected = Series(
            [
                Timestamp("2010-07-15 03:14:45", tz="Etc/GMT+4"),
                Timestamp("2010-11-19 18:47:06", tz="US/Eastern"),
            ],
            name="time",
        )
>       tm.assert_series_equal(result, expected)
E       AssertionError: Attributes of Series are different
E       
E       Attribute "dtype" are different
E       [left]:  datetime64[ns, UTC]
E       [right]: object

    def test_groupby_multi_timezone(self):
    
        # combining multiple / different timezones yields UTC
    
        dates = [
            "2000-01-28 16:47:00",
            "2000-01-29 16:48:00",
            "2000-01-30 16:49:00",
            "2000-01-31 16:50:00",
            "2000-01-01 16:50:00",
        ]
        tzs = [
            "America/Chicago",
            "America/Chicago",
            "America/Los_Angeles",
            "America/Chicago",
            "America/New_York",
        ]
        df = DataFrame({"value": range(5), "date": dates, "tz": tzs})
    
        result = df.groupby("tz").date.apply(
            lambda x: pd.to_datetime(x).dt.tz_localize(x.name)
        )
    
        expected = Series(
            [
                Timestamp("2000-01-28 16:47:00-0600", tz="America/Chicago"),
                Timestamp("2000-01-29 16:48:00-0600", tz="America/Chicago"),
                Timestamp("2000-01-30 16:49:00-0800", tz="America/Los_Angeles"),
                Timestamp("2000-01-31 16:50:00-0600", tz="America/Chicago"),
                Timestamp("2000-01-01 16:50:00-0500", tz="America/New_York"),
            ],
            name="date",
            dtype=object,
        )
>       tm.assert_series_equal(result, expected)
E       AssertionError: Attributes of Series are different
E       
E       Attribute "dtype" are different
E       [left]:  datetime64[ns, UTC]
E       [right]: object

In both of these examples, it's pretty reasonable to think the user would want the object-dtype result.
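
The find_common_type idea mentioned before the tests can be sketched roughly as follows. `common_dt64tz_dtype` is a hypothetical helper written for illustration, not pandas' actual `find_common_type`, and it assumes all inputs share the same unit.

```python
from pandas import DatetimeTZDtype


def common_dt64tz_dtype(dtypes):
    """Hypothetical helper (not a pandas API): pick a common dtype for
    a non-empty list of tz-aware datetime dtypes with the same unit."""
    if not dtypes:
        raise ValueError("need at least one dtype")
    if not all(isinstance(dt, DatetimeTZDtype) for dt in dtypes):
        raise TypeError("only tz-aware datetime dtypes are handled in this sketch")

    first = dtypes[0]
    # All dtypes identical: keep the shared timezone.
    if all(dt == first for dt in dtypes):
        return first
    # Mismatched timezones: fall back to UTC instead of object.
    return DatetimeTZDtype(unit=first.unit, tz="UTC")


eastern_dtype = DatetimeTZDtype(tz="US/Eastern")
pacific_dtype = DatetimeTZDtype(tz="US/Pacific")
print(common_dt64tz_dtype([eastern_dtype, eastern_dtype]))
print(common_dt64tz_dtype([eastern_dtype, pacific_dtype]))
```

Under a rule like this, anything that concatenates per-group results (as the groupby tests do) would yield a UTC dtype rather than object, which is exactly the change the failing assertions report.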

added the Reshaping, Datetime, Timezones, and API - Consistency labels and removed the Needs Triage label on Jan 26, 2021

jorisvandenbossche (Member) commented on Jan 29, 2021

I think for the two cases you bring up, it makes sense to have UTC tz-aware output (AFAIU that's also what the original reporter wanted in #27496).
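
To make the trade-off between the two outputs concrete, here is a small sketch reusing the timestamps from the first test. It only illustrates what each representation keeps. It does not reproduce the groupby code path.

```python
import pandas as pd

# The two expected values from test_transform_lambda_with_datetimetz.
stamps = [
    pd.Timestamp("2010-07-15 03:14:45", tz="Etc/GMT+4"),
    pd.Timestamp("2010-11-19 18:47:06", tz="US/Eastern"),
]

# Object dtype: each element keeps its own timezone.
as_object = pd.Series(stamps, dtype=object, name="time")
print(as_object.dtype)
print([str(ts.tz) for ts in as_object])

# UTC dtype: a single tz-aware datetime dtype; the instants are unchanged,
# but the per-row local timezone is no longer stored.
as_utc = pd.to_datetime(as_object, utc=True)
print(as_utc.dtype)

# The same instants compare equal across the two representations.
print(all(a == b for a, b in zip(as_utc, stamps)))
```

The UTC result supports vectorized datetime operations, while the object result is the only one that remembers which zone each row came from.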

added this to the 1.3 milestone on May 14, 2021
Participants: @jreback, @jorisvandenbossche, @jbrockmendel, @simonjayhawkins