Groupby aggregations could ignore non-numeric columns when axis=1 #3688

@hayd
Contributor

Perhaps the following groupby aggregation should operate only on the numeric columns, as the same aggregation does when called on the DataFrame directly:

In [1]: df = pd.DataFrame({'bar': {0: 1, 1: 1, 2: 1}, 'foo': {0: 0, 1: 1, 2: 2}, 'foo1': {0: 1, 1: 2, 2: 3}, 'hello': {0: 'a', 1: 'a', 2: 'a'}}, columns=['bar', 'foo', 'foo', 'hello'])

In [2]: df
Out[2]:
   bar  foo  foo hello
0    1    0    1     a
1    1    1    2     a
2    1    2    3     a

In [3]: df.mean()  # hello is ignored
Out[3]:
bar    1
foo    1
foo    2
dtype: float64

In [4]: df.groupby(level=0, axis=1).mean()
---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-4-7c2612a8fbda> in <module>()
----> 1 df.groupby(level=0, axis=1).mean()

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/groupby.pyc in mean(self)
    351         """
    352         try:
--> 353             return self._cython_agg_general('mean')
    354         except GroupByError:
    355             raise

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_general(self, how, numeric_only)
   1569
   1570     def _cython_agg_general(self, how, numeric_only=True):
-> 1571         new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
   1572         return self._wrap_agged_blocks(new_blocks)
   1573

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_blocks(self, how, numeric_only)
   1616
   1617         if len(new_blocks) == 0:
-> 1618             raise DataError('No numeric types to aggregate')
   1619
   1620         return new_blocks

DataError: No numeric types to aggregate

From this SO question, where I gave a very hacky workaround.

cc #3683 @jreback was this the question you were talking about? This one's related, but in the sense of coming up against non-unique problems... Thought I should mention it here anyway.
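
For readers hitting the same error, here is a minimal sketch of one way to sidestep it. It is not the workaround from the SO question; it assumes that dropping the non-numeric column up front is acceptable, and it groups the duplicated labels by transposing rather than by passing axis=1. The frame is rebuilt from rows so that the duplicate 'foo' label is explicit.

```python
import pandas as pd

# Same shape as the frame in the report: two columns share the label 'foo'
# and 'hello' holds strings.
df = pd.DataFrame(
    [[1, 0, 1, "a"], [1, 1, 2, "a"], [1, 2, 3, "a"]],
    columns=["bar", "foo", "foo", "hello"],
)

# Keep only the numeric columns so the string column never reaches the aggregation.
numeric = df.select_dtypes(include="number")

# Transpose, group on the (now duplicated) index labels, then transpose back.
grouped = numeric.T.groupby(level=0).mean().T
print(grouped)
```

The result has one column per distinct label, with the two 'foo' columns averaged row by row.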

jreback commented on May 23, 2013

@jreback
Contributor

this is the question, but still breaks (even on my new branch).... will look at this soon

jreback commented on May 23, 2013

linking to #3679

modified the milestones: 0.15.0, 0.14.0 on Mar 11, 2014

keir commented on Apr 29, 2014

@keir

Any update on this @jreback? Did your branch ever go anywhere?

jreback commented on Apr 29, 2014

You are welcome to open a PR if you would like.

modified the milestones: 0.16.0, Next Major Release on Mar 3, 2015

WillAyd commented on Jul 6, 2018

@WillAyd
Member

Still a problem, though note that this only fails due to axis=1

changed the title from "Groupby aggregations could ignore non-numeric columns" to "Groupby aggregations could ignore non-numeric columns when axis=1" on Jul 6, 2018
modified the milestone: Contributions Welcome on Jul 8, 2018
added the "Nuisance Columns" label (Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply) on Oct 29, 2021

NumberPiOso commented on Feb 13, 2022

@NumberPiOso
Contributor

Nowadays, running the example results in a FutureWarning for both operations.

In [3]: df.mean()
FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError.  Select only valid columns before calling the reduction.


In [4]:  df.groupby(level=0, axis=1).mean() 
FutureWarning: Dropping invalid columns in DataFrameGroupBy.mean is deprecated. In a future version, a TypeError will be raised. Before calling .mean, select only columns which should be valid for the function.
  df.groupby(level=0, axis=1).mean()
Out[4]: 
Empty DataFrame
Columns: [bar, foo, hello]
Index: []

This change was introduced by PR #41480. So, nowadays groupby aggregations should NOT ignore non-numeric columns. Closing this issue.
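
As a rough illustration of the "select only valid columns before calling the reduction" advice in the warnings above, the sketch below shows two ways to make the choice of columns explicit for the plain DataFrame reduction. It assumes the same example frame, rebuilt from rows so the duplicate 'foo' label is explicit; the variable names are only for illustration.

```python
import pandas as pd

# Example frame with a duplicated 'foo' label and a string column.
df = pd.DataFrame(
    [[1, 0, 1, "a"], [1, 1, 2, "a"], [1, 2, 3, "a"]],
    columns=["bar", "foo", "foo", "hello"],
)

# Option 1: pick the numeric columns explicitly, then reduce.
means_selected = df.select_dtypes(include="number").mean()

# Option 2: ask the reduction itself to skip non-numeric columns.
means_flag = df.mean(numeric_only=True)

print(means_selected)
print(means_flag)
```

Either form states the intent in the code instead of relying on nuisance columns being dropped silently.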

Labels: Bug; Groupby; Nuisance Columns (Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply); Numeric Operations (Arithmetic, Comparison, and Logical operations)

Participants: @keir, @WillAyd, @jreback, @hayd, @jbrockmendel
