Description
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
from pandas import Timestamp
import pandas as pd
print ("pandas Version:", pd.__version__)
#dataframe
df = pd.DataFrame.from_dict({'filename': ['03_', '03_', '03_', '05_', '05_', '05_', '05_', '05_', '08_', '08_'],
'date_time': [Timestamp('2022-05-24 12:10:56'), Timestamp('2022-05-24 12:11:24'), Timestamp('2022-05-24 12:11:51'),
Timestamp('2022-05-24 12:41:54'), Timestamp('2022-05-24 12:42:21'), Timestamp('2022-05-24 12:42:49'),
Timestamp('2022-05-24 12:43:16'), Timestamp('2022-05-24 12:43:44'), Timestamp('2022-05-24 12:57:30'),
Timestamp('2022-05-24 12:57:58')],
'r': [80466.36, 71467.12, 72641.21, 76961.35, 86747.23, 81995.81, 74451.46, 69401.51, 73670.12, 78180.65]})
print ("df column types: ", df.info(),)
print ('\nWorks with: df.groupby(["filename"]).agg(["mean"])\n', df.groupby(["filename"]).agg(["mean"]))
print ('\nNot working with: df.groupby(["filename"]).agg("mean")\n', df.groupby(["filename"]).agg("mean"))
print ('\nNot working with: df.groupby(["filename"]).mean()\n', df.groupby(["filename"]).mean())
OUT:
pandas Version: 1.3.5
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 filename 10 non-null object
1 date_time 10 non-null datetime64[ns]
2 r 10 non-null float64
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 368.0+ bytes
df column types: None
Works with: df.groupby(["filename"]).agg(["mean"]) #See date_time column appearing
date_time r
mean mean
filename
03_ 2022-05-24 12:11:23.666666752 74858.230
05_ 2022-05-24 12:42:48.800000000 77911.472
08_ 2022-05-24 12:57:44.000000000 75925.385
Not working with: df.groupby(["filename"]).agg("mean") #date_time column is gone
r
filename
03_ 74858.230
05_ 77911.472
08_ 75925.385
Not working with: df.groupby(["filename"]).mean() #date_time column is gone
r
filename
03_ 74858.230
05_ 77911.472
08_ 75925.385
Issue Description
I expected the same behavior from
- df.groupby(["filename"]).agg(["mean"])
- df.groupby(["filename"]).agg("mean")
- df.groupby(["filename"]).mean()
Instead, when the DataFrame has a column with datetime64[ns] data, only .agg(["mean"]) keeps that column in the result, while .agg("mean") and .mean() drop it.
Expected Behavior
I expect agg(["mean"]), agg("mean"), and mean() to behave the same.
Installed Versions
pandas : 1.3.5
numpy : 1.21.6
pytz : 2022.1
dateutil : 2.8.2
pip : 21.1.3
setuptools : 57.4.0
Cython : 0.29.30
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.6
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 5.5.0
pandas_datareader: 0.9.0
bs4 : 4.6.3
bottleneck : 1.3.4
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.8.1
odfpy : None
openpyxl : 3.0.10
pandas_gbq : 0.13.3
pyarrow : 6.0.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.4.36
tables : 3.7.0
tabulate : 0.8.9
xarray : 0.20.2
xlrd : 1.1.0
xlwt : 1.3.0
numba : 0.51.2
None
Activity
Title changed from "BUG:" to "BUG: Different behavior from .agg("mean") and .agg(["mean"]) on a groupby df with a datetime64[ns] column"
guyrt commented on May 31, 2022
FWIW, in v1.2.5 none of these options operate on the datetime64[ns] column! @leodtprojectsd are you working on a PR for this bug? If not, I'd like to work on one.
rhshadrach commented on May 31, 2022
Thanks for the report! When using list or dict in agg, the DataFrame is broken up into Series before each function is applied. What you're seeing is the difference in numeric_only between DataFrame.groupby(...).mean and Series.groupby(...).mean. See:
https://pandas.pydata.org/pandas-docs/dev/user_guide/groupby.html#automatic-exclusion-of-nuisance-columns
https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.mean.html
You can get the same result with
So I think that makes this a duplicate of #46560.
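For readers following along, here is a minimal sketch of the explanation above. It is an illustration under assumptions, not necessarily the snippet the comment referred to: it assumes that passing numeric_only=False to DataFrame.groupby(...).mean is what keeps the datetime64[ns] column, and the per-column concat is only a rough stand-in for how the list form is split into Series, not pandas' internal code path. The small frame is a subset of the data in the report.

```python
import pandas as pd

# Small illustrative frame (a subset of the rows from the report).
df = pd.DataFrame({
    "filename": ["03_", "03_", "05_"],
    "date_time": pd.to_datetime([
        "2022-05-24 12:10:56",
        "2022-05-24 12:11:24",
        "2022-05-24 12:41:54",
    ]),
    "r": [80466.36, 71467.12, 76961.35],
})

grouped = df.groupby("filename")

# List form: each column is aggregated as its own Series.
via_list = grouped.agg(["mean"])

# Rough per-column equivalent of the list form (illustration only).
per_column = pd.concat(
    {col: df.groupby("filename")[col].mean() for col in ["date_time", "r"]},
    axis=1,
)

# DataFrame-level mean with numeric_only passed explicitly
# (assumption: this is what makes the datetime column survive).
via_method = grouped.mean(numeric_only=False)

print(via_list.columns.tolist())
print(per_column.columns.tolist())
print(via_method.columns.tolist())
```

Passing numeric_only explicitly, rather than relying on the default, should make the result independent of which default a given pandas version uses.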
rhshadrach commentedon May 31, 2022
I'm going to close this as a duplicate. @guyrt and @leodtprojectsd, please reply here if you believe I've missed something and I'm happy to reopen.
leodtprojectsd commented on Jun 1, 2022
@guyrt, I wasn't working on it, but I think @rhshadrach's reply covers it, thanks!