@rhshadrach
Member

@rhshadrach rhshadrach added Bug Groupby Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Nov 17, 2023
@pytest.mark.parametrize("normalize", [True, False])
def test_value_counts_sort(sort, vc_sort, normalize):
    # GH#55951
    df = DataFrame({"a": [2, 1, 1, 1], 0: [3, 4, 3, 3]})
Member

Could you also test with Categorical?

Member Author

Thanks - good call. I added a test and found a bug calling .sortlevel() in the categorical section. But the categorical behavior is still buggy here; namely

multi_index, _ = MultiIndex.from_product(
    levels_list, names=[ping.name for ping in groupings]
).sortlevel()

is constructing the index by taking the Cartesian product across all groupings; it should do this only for the categorical groupings. That's not an easy fix, but it will be handled in #55738, where we can just remove that whole block of code.
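
To illustrate the difference, here is a minimal sketch rather than the actual groupby internals. The frame, the column names "b" and "c", and the way levels_list is built are assumptions made for the example; only the from_product(...).sortlevel() call mirrors the snippet above.

```python
import pandas as pd

# Hypothetical data: "a" and "b" are plain integer keys, "c" is categorical.
df = pd.DataFrame(
    {
        "a": [2, 1, 1, 1],
        "b": [3, 4, 3, 3],
        "c": pd.Categorical(["x", "y", "x", "x"], categories=["x", "y"]),
    }
)

# Assumed stand-in for levels_list: one set of unique values per grouping.
levels_list = [
    pd.Index(df["a"].unique()),
    pd.Index(df["b"].unique()),
    pd.Index(df["c"].cat.categories),
]

# Product across *all* groupings, as in the snippet above.
full_product, _ = pd.MultiIndex.from_product(
    levels_list, names=["a", "b", "c"]
).sortlevel()

# Product across the categorical grouping only: keep the observed (a, b)
# pairs and expand each of them over every category of "c".
observed_keys = df[["a", "b"]].drop_duplicates().itertuples(index=False, name=None)
expected = {(a, b, c) for a, b in observed_keys for c in df["c"].cat.categories}

# Entries present only because non-categorical keys were crossed as well.
spurious = sorted(set(full_product) - expected)
print(len(full_product), len(expected), spurious)
```

In this sketch the pair a=2, b=4 never occurs in the data, yet the full product still creates index entries for it.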

@mroeschke mroeschke added this to the 2.2 milestone Nov 17, 2023
@mroeschke mroeschke merged commit dbf8aaf into pandas-dev:main Nov 17, 2023
@mroeschke
Member

Thanks @rhshadrach

Labels

Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug Groupby

Development

Successfully merging this pull request may close these issues.

DEPR: DataFrame.value_counts(sort=False) sorts
