BUG: groupby doesn't identify null values when sort=False #48506

@rhshadrach (Member)

Description

Closely related to #48476

As far as I can tell this only occurs when the input dtype to groupby is object.

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, pd.NA, None], 'b': [1, 2, 3]})
gb = df.groupby('a', sort=True, dropna=False)
print(gb.sum())

#      b
# a     
# NaN  6

but with sort=False:

df = pd.DataFrame({'a': [np.nan, pd.NA, None], 'b': [1, 2, 3]})
gb = df.groupby('a', sort=False, dropna=False)
print(gb.sum())

#       b
# a      
# NaN   1
# <NA>  2
# None  3

I think we should prefer the sort=True behavior as that is the default value for now, but prefer sort=False in the long run.
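Until the two code paths agree, one possible workaround is to normalize the nulls before grouping. The following is only a sketch, assuming the mismatch comes from the object column holding three distinct null objects; the name `normalized` is introduced here for illustration and is not part of the report.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, pd.NA, None], "b": [1, 2, 3]})

# Replace every value that pandas considers missing with the same np.nan
# object, so the key column holds a single kind of null.
normalized = df.assign(a=df["a"].where(df["a"].notna(), np.nan))

result = normalized.groupby("a", sort=False, dropna=False).sum()
print(result)
```

With the nulls unified this way, `sort=False` should produce a single null group, as `sort=True` does in the example above.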

Activity

added the Missing-data, Algos, Needs Discussion, and Regression labels and removed the Needs Discussion label on Sep 11, 2022
rhshadrach (Member, Author) commented on Sep 11, 2022

This is a regression introduced in #46601, cc @rhshadrach. Prior to that change, the sort=False behavior agreed with sort=True in collapsing different types of null values.
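A quick way to check whether a given pandas version is affected is to compare the number of groups under both settings. This is a minimal sketch rather than the test used in pandas itself; the helper name `null_group_count` is hypothetical.

```python
import numpy as np
import pandas as pd


def null_group_count(sort: bool) -> int:
    """Count the groups produced for an object column holding three kinds of null."""
    df = pd.DataFrame({"a": [np.nan, pd.NA, None], "b": [1, 2, 3]})
    return len(df.groupby("a", sort=sort, dropna=False).sum())


sorted_count = null_group_count(sort=True)
unsorted_count = null_group_count(sort=False)
print(f"sort=True: {sorted_count} group(s), sort=False: {unsorted_count} group(s)")
```

If the two counts differ, the installed version shows the regression described here.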

added this to the 1.5 milestone on Sep 11, 2022
phofl (Member) commented on Sep 11, 2022

Added 1.5 milestone

self-assigned this on Sep 14, 2022
Labels: Algos, Bug, Groupby, Missing-data, Regression
