Sparse data frame doesn't groupby.mean() correctly #5078

Closed
@langmore

>>> import pandas as pd
>>> pd.__version__
'0.12.0-660-gec77315'

>>> import numpy as np
>>> np.version.version
'1.7.1'

>>> df = pd.DataFrame({'a': [0, 1, 0, 0], 'b': [0, 1, 0, 0]})
>>> sdf = df.to_sparse(fill_value=0)
>>> df.groupby('a').mean() 
    b
a   
0  0
1  1

>>> sdf.groupby('a').mean() 
    b
a   
1  0

I'm not surprised that the mean for group a == 0 was not returned, since 0 is the fill value. What is surprising is that the result for group a == 1 is incorrect: it should be 1, as in the dense case, but comes back as 0.

Activity

modified the milestones: 0.15.0, 0.14.0 on Feb 15, 2014
modified the milestones: 0.16.0, Next Major Release on Mar 1, 2015
OmerJog commented on Nov 28, 2018

Are there any plans to fix this any time soon?

TomAugspurger (Contributor) commented on Sep 17, 2019

@OmerJog This works with a regular DataFrame whose columns hold sparse values:

In [90]: df = pd.DataFrame({'a': [0, 1, 0, 0], 'b': [0, 1, 0, 0]})

In [91]: df.apply(pd.SparseArray).groupby('a').mean()
Out[91]:
   b
a
0  0
1  1
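
For anyone landing here on a newer pandas, the check above can be sketched roughly as follows. This is an illustration under assumptions, not what was run in this thread: it assumes a recent release where `DataFrame.to_sparse` is no longer available and the array class is spelled `pd.arrays.SparseArray`, and it keeps the grouping key `a` dense and makes only the value column `b` sparse. The names `sdf`, `dense_mean` and `sparse_mean` are just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 0, 0], 'b': [0, 1, 0, 0]})

# Dense reference result: mean of b within each group of a.
dense_mean = df.groupby('a')['b'].mean()

# Same data, but with the value column stored as a sparse array
# (fill_value=0, as in the original report). The key column stays dense;
# that is an assumption of this sketch, not something from the thread.
sdf = df.copy()
sdf['b'] = pd.arrays.SparseArray(df['b'].to_numpy(), fill_value=0)
sparse_mean = sdf.groupby('a')['b'].mean()

# Densify both results so they can be compared regardless of result dtype.
dense_values = np.asarray(dense_mean, dtype=float)
sparse_values = np.asarray(sparse_mean, dtype=float)
print(dense_values, sparse_values)
```

If the sparse path behaves, both arrays should agree group by group, including the a == 1 group that was wrong in the original report.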

Participants: @langmore, @jreback, @TomAugspurger, @OmerJog