API: inconsistency in casting back to its EA dtype (try_cast_to_ea) #31108

@jorisvandenbossche

Description

With the current logic we have in try_cast_to_ea:

try:
    result = cls_or_instance._from_sequence(obj, dtype=dtype)
except Exception:
    # We can't predict what downstream EA constructors may raise
    result = obj
return result

it's easy to get inconsistencies: the result depends critically on what _from_sequence accepts as valid scalars.

For example, we now have this

>>> s = pd.Series([0, 1, 2], dtype="Int64") 
>>> s.combine(0, lambda x, y: x == y)      
0     True
1    False
2    False
dtype: bool

However, if the IntegerArray constructor is changed slightly to be more willing to accept all kinds of boolean-like values (see #31104 for a current inconsistency in this respect; this is what happens in #30282), you can get:

>>> s = pd.Series([0, 1, 2], dtype="Int64")  
>>> s.combine(0, lambda x, y: x == y)       
0    1
1    0
2    0
dtype: Int64

So how "forgiving" the EA constructor is determines the output dtype.
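
To make the mechanism concrete without depending on a particular pandas version, here is a minimal pure-Python sketch of the same try/except fallback. The names try_cast, StrictInts and LenientInts are hypothetical stand-ins for try_cast_to_ea and for two variants of an EA constructor; they are not pandas APIs.

```python
def try_cast(constructor, values):
    """Toy version of the fallback: cast back if the constructor accepts the values."""
    try:
        return constructor(values)
    except Exception:
        # Any failure means we silently keep the uncast result
        return values


class StrictInts(list):
    """Hypothetical array type that refuses bools (and anything that is not an int)."""

    def __init__(self, values):
        for v in values:
            if isinstance(v, bool) or not isinstance(v, int):
                raise TypeError(f"cannot safely cast {v!r} to an integer array")
        super().__init__(values)


class LenientInts(list):
    """Hypothetical array type that coerces anything int() accepts, bools included."""

    def __init__(self, values):
        super().__init__(int(v) for v in values)


# Result of an elementwise comparison such as x == y
comparison = [True, False, False]

strict_result = try_cast(StrictInts, comparison)
lenient_result = try_cast(LenientInts, comparison)

print(type(strict_result).__name__, strict_result)
print(type(lenient_result).__name__, lenient_result)
```

With the strict constructor the booleans come back untouched as a plain list; with the lenient one they are silently turned into integers. The caller did nothing different in the two cases, which is the inconsistency described above.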

Here the example is with combine, but the try_cast_to_ea function is also used in a part of groupby.

Activity

jorisvandenbossche commented on Jan 17, 2020

Another example (hypothetical, but I think it still illustrates the problem). If we did not have the extra dtype.kind != "M" condition guarding the call to try_cast_to_ea in

if is_extension_array_dtype(dtype) and dtype.kind != "M":

we would see the following behaviour:

>>> df = pd.DataFrame({'key': ['a', 'b', 'a', 'b'], 
...                    'val': pd.date_range("2012-01-01", periods=4, tz="UTC")})

>>> df.groupby('key')['val'].agg(lambda x: x.mean().year)  # agg func that returns int
key
a   1970-01-01 00:00:00.000002012+00:00
b   1970-01-01 00:00:00.000002012+00:00
Name: val, dtype: datetime64[ns, UTC]

because we allow creating a datetime64 array from integers. Here we guard against that explicitly, but that seems like a bit of a code smell, and the same thing can happen for other EA types.
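
The integer-to-datetime interpretation can be seen in isolation with a small sketch. It uses pd.to_datetime as a stand-in for the constructor path, which is an assumption on my part; it does not go through try_cast_to_ea or groupby.

```python
import pandas as pd

# What an aggregation like x.mean().year would return: plain integers
years = [2012, 2012]

# Integers are read as nanoseconds since the epoch, not as calendar years
as_datetimes = pd.to_datetime(years, utc=True)
print(as_datetimes)

first = as_datetimes[0]
print(first.year)
```

The integers are accepted without error, so a try/except-based cast would "succeed" and return timestamps a couple of microseconds after the epoch instead of the years the aggregation computed.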

Labels: API - Consistency (Internal Consistency of API/Behavior), Bug, ExtensionArray (Extending pandas with custom dtypes or arrays)
