
BUG: Different behavior from .agg("mean") and .agg(["mean"]) on a groupby df with a datetime64[ns] column #47166

Not planned
@leodtprojectsd

Description


Pandas version checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from pandas import Timestamp
import pandas as pd 

print ("pandas Version:",  pd.__version__)
#dataframe
df = pd.DataFrame.from_dict({'filename': ['03_', '03_', '03_', '05_', '05_', '05_', '05_', '05_', '08_', '08_'], 
 'date_time': [Timestamp('2022-05-24 12:10:56'), Timestamp('2022-05-24 12:11:24'), Timestamp('2022-05-24 12:11:51'), 
               Timestamp('2022-05-24 12:41:54'), Timestamp('2022-05-24 12:42:21'), Timestamp('2022-05-24 12:42:49'),
               Timestamp('2022-05-24 12:43:16'), Timestamp('2022-05-24 12:43:44'), Timestamp('2022-05-24 12:57:30'), 
               Timestamp('2022-05-24 12:57:58')],
  'r': [80466.36, 71467.12, 72641.21, 76961.35, 86747.23, 81995.81, 74451.46, 69401.51, 73670.12, 78180.65]})

print ("df column types: ", df.info(),)

print ('\nWorks with: df.groupby(["filename"]).agg(["mean"])\n', df.groupby(["filename"]).agg(["mean"]))
print ('\nNot working with: df.groupby(["filename"]).agg("mean")\n', df.groupby(["filename"]).agg("mean"))
print ('\nNot working with: df.groupby(["filename"]).mean()\n', df.groupby(["filename"]).mean())


OUT: 
pandas Version: 1.3.5
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   filename   10 non-null     object        
 1   date_time  10 non-null     datetime64[ns]
 2   r          10 non-null     float64       
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 368.0+ bytes
df column types:  None

Works with: df.groupby(["filename"]).agg(["mean"]) #See date_time column appearing
                              date_time          r
                                  mean       mean
filename                                         
03_      2022-05-24 12:11:23.666666752  74858.230
05_      2022-05-24 12:42:48.800000000  77911.472
08_      2022-05-24 12:57:44.000000000  75925.385

Not working with: df.groupby(["filename"]).agg("mean") #date_time column is gone
                   r
filename           
03_       74858.230
05_       77911.472
08_       75925.385

Not working with: df.groupby(["filename"]).mean() #date_time column is gone
                   r
filename           
03_       74858.230
05_       77911.472
08_       75925.385

Issue Description

I expected the same behavior from

  • df.groupby(["filename"]).agg(["mean"])
  • df.groupby(["filename"]).agg("mean")
  • df.groupby(["filename"]).mean()

Instead, when the df has a datetime64[ns] column, only .agg(["mean"]) aggregates that column, while .agg("mean") and .mean() silently drop it from the result.

Expected Behavior

I expect agg(["mean"]), agg("mean"), and mean() to behave the same.

Installed Versions

INSTALLED VERSIONS
------------------
commit : 66e3805
python : 3.7.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.188+
Version : #1 SMP Sun Apr 24 10:03:06 PDT 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.5
numpy : 1.21.6
pytz : 2022.1
dateutil : 2.8.2
pip : 21.1.3
setuptools : 57.4.0
Cython : 0.29.30
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.6
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 5.5.0
pandas_datareader: 0.9.0
bs4 : 4.6.3
bottleneck : 1.3.4
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.8.1
odfpy : None
openpyxl : 3.0.10
pandas_gbq : 0.13.3
pyarrow : 6.0.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.4.36
tables : 3.7.0
tabulate : 0.8.9
xarray : 0.20.2
xlrd : 1.1.0
xlwt : 1.3.0
numba : 0.51.2
None

Activity

Title changed from "BUG:" to "BUG: Different behavior from .agg("mean") and .agg(["mean"]) on a groupby df with a datetime64[ns] column" on May 30, 2022
guyrt (Contributor) commented on May 31, 2022

FWIW, in v1.2.5 none of these options operate on the datetime64[ns] column! @leodtprojectsd, are you working on a PR for this bug? If not, I'd like to work on one.

rhshadrach (Member) commented on May 31, 2022

Thanks for the report! When a list or dict is passed to agg, the DataFrame is broken up into Series before each function is applied. What you're seeing is the difference in the numeric_only behavior between DataFrame.groupby(...).mean and Series.groupby(...).mean. See:

https://pandas.pydata.org/pandas-docs/dev/user_guide/groupby.html#automatic-exclusion-of-nuisance-columns
https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.mean.html
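A minimal sketch of the splitting described above, assuming a small made-up frame (the data and the names `manual` and `via_list` are illustrative and not part of the report). It takes the mean of each column as a Series and compares the result with what the list form of agg returns:

```python
import pandas as pd

# Hypothetical data with the same column layout as the report.
df = pd.DataFrame({
    "filename": ["a", "a", "b"],
    "date_time": pd.to_datetime([
        "2022-05-24 12:00:00",
        "2022-05-24 12:00:10",
        "2022-05-24 13:00:00",
    ]),
    "r": [1.0, 3.0, 5.0],
})

grouped = df.groupby("filename")

# Apply mean column by column, i.e. on a Series groupby each time.
manual = pd.concat(
    {col: grouped[col].mean() for col in ["date_time", "r"]},
    axis=1,
)

# The list form of agg; its columns are a (column, function) MultiIndex.
via_list = grouped.agg(["mean"])

print(manual)
print(via_list)
```

If the two printed frames agree, that is consistent with the list form working one Series at a time, where the datetime64[ns] column is not excluded.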

You can get the same result with

print(df.groupby("filename").agg("mean", numeric_only=False))

So I think that makes this a duplicate of #46560.
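As a hedged illustration of the workaround above, the sketch below passes numeric_only=False explicitly to both the string form of agg and to mean on a small made-up frame (the data and variable names are assumptions for illustration). The default for numeric_only has differed between pandas releases, so spelling it out avoids depending on the installed version:

```python
import pandas as pd

# Hypothetical data with the same column layout as the report.
df = pd.DataFrame({
    "filename": ["a", "a", "b"],
    "date_time": pd.to_datetime([
        "2022-05-24 12:00:00",
        "2022-05-24 12:00:10",
        "2022-05-24 13:00:00",
    ]),
    "r": [1.0, 3.0, 5.0],
})

grouped = df.groupby("filename")

# Keyword arguments given to agg are forwarded to the named reduction.
with_agg = grouped.agg("mean", numeric_only=False)

# The same request made directly on the reduction method.
with_mean = grouped.mean(numeric_only=False)

print(with_agg.columns.tolist())
print(with_mean.columns.tolist())
```

Both results should then include the date_time column alongside r.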

Label "Nuisance Columns" (identifying/dropping nuisance columns in reductions, groupby.add, DataFrame.apply) added and label "Needs Triage" (issue that has not been reviewed by a pandas team member) removed on May 31, 2022
rhshadrach (Member) commented on May 31, 2022

I'm going to close this as a duplicate. @guyrt and @leodtprojectsd, please reply here if you believe I've missed something, and I'm happy to reopen.

leodtprojectsd (Author) commented on Jun 1, 2022

@guyrt, I wasn't working on it, but I think @rhshadrach's reply covers it, thanks!


Labels: Bug, Duplicate Report, Groupby, Nuisance Columns
