API: make CategoricalIndex._concat consistent with pd.concat

`Index._concat` (used by `Index.append`) is a thin wrapper around `concat_compat`. It is overridden by `CategoricalIndex` so that `CategoricalDtype` is retained more often than it is in `concat_compat`. We should make these match.
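To see the two code paths side by side, here is a minimal sketch that pushes the same values through `Index.append` and through `pd.concat` on Series. The variable names are illustrative only, and the assumption that `pd.concat` on Series ends up in `concat_compat` is taken from the description above. The printed dtypes depend on the installed pandas version, so no expected output is shown.

```python
import pandas as pd

# Same data in two containers: a CategoricalIndex and a categorical Series.
ci = pd.CategoricalIndex(list("aabbca"), categories=list("cab"))
ser = pd.Series(list("aabbca"), dtype=ci.dtype)

# Non-categorical values that all happen to be existing categories.
other = ["c", "a"]

# Path 1: Index.append, which dispatches to CategoricalIndex._concat.
via_index = ci.append(pd.Index(other))

# Path 2: pd.concat on Series (assumed here to go through concat_compat).
via_concat = pd.concat([ser, pd.Series(other)], ignore_index=True)

# The values agree either way; the question in this issue is whether
# the resulting dtypes agree as well.
print("Index.append ->", via_index.dtype)
print("pd.concat    ->", via_concat.dtype)
```

If the two printed dtypes differ, that is the inconsistency this issue is about.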

If we just rip out `CategoricalIndex._concat`, we break 6 tests, all of which boil down to:

```
    def test_append_category_objects(self, ci):
        # with objects
        result = ci.append(Index(["c", "a"]))
        expected = CategoricalIndex(list("aabbcaca"), categories=ci.categories)
>       tm.assert_index_equal(result, expected, exact=True)
```

If we go the other way and change `concat_compat`, we break 6 different tests, all of which involve all-empty arrays or arrays that can be losslessly cast to the Categorical's dtype, e.g. (edited for legibility):

```
    def test_concat_empty_series_dtype_category_with_array(self):
        # GH#18515
        left = Series(np.array([]), dtype="category")
        right = Series(dtype="float64")
        result = concat([left, right])
>       assert result.dtype == "float64"


    def test_concat_categorical_coercion(self):
        # GH 13524
    
        # category + not-category => not-category
        s1 = Series([1, 2, np.nan], dtype="category")
        s2 = Series([2, 1, 2])
    
        exp = Series([1, 2, np.nan, 2, 1, 2], dtype="object")
>       tm.assert_series_equal(pd.concat([s1, s2], ignore_index=True), exp)
E       AssertionError: Attributes of Series are different
E       
E       Attribute "dtype" are different
E       [left]:  CategoricalDtype(categories=[1, 2], ordered=False)
E       [right]: object
```

Changing `concat_compat` results in much more convenient behavior, but it is textbook "values-dependent behavior" that in general we want to avoid (cc @jorisvandenbossche).

API: make CategoricalIndex._concat consistent with pd.concat #41626

Activity

TomAugspurger commented on Jun 2, 2021

jbrockmendel commented on Jun 3, 2021

jreback commented on Jun 24, 2021

jbrockmendel commented on Jun 26, 2021

jorisvandenbossche commented on Jul 11, 2021

jbrockmendel commented on Apr 15, 2022
