transform behaves differently with 'ffill' on DataFrameGroupBy and SeriesGroupBy #24211

@timgeb

Code Sample

import pandas as pd
data = [['a', 0.0], ['a', float('nan')], ['b', 1.0], ['b', float('nan')]]
df = pd.DataFrame(data, columns=['key', 'values'])
print(df.groupby('key').transform('ffill'))
print(df.groupby('key')['values'].transform('ffill'))

Problem description

The first print statement produces

   values
0     0.0
1     0.0
2     1.0
3     1.0

The second print statement produces

0    0.0
1    0.0
2    0.0
3    0.0

Expected Output

I expected both operations to compute the same values. I regard the first output as the correct one.
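
For reference, the expected result can be reproduced by forward-filling each group by hand. This is only a minimal sketch on the same data as above; the names `pieces` and `expected` are introduced here for illustration and are not part of the original report.

```python
import pandas as pd

data = [['a', 0.0], ['a', float('nan')], ['b', 1.0], ['b', float('nan')]]
df = pd.DataFrame(data, columns=['key', 'values'])

# Forward-fill the 'values' column one group at a time, so a fill
# can never carry a value over from a different key.
pieces = [group['values'].ffill() for _, group in df.groupby('key')]

# Stitch the per-group results back together in the original row order.
expected = pd.concat(pieces).sort_index()
print(expected)
```

Both `transform('ffill')` calls would be expected to agree with this per-group result.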

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-139-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 4.0.1
pip: 18.1
setuptools: 40.6.2
Cython: None
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: 0.7.3
lxml: None
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.7.0

Activity

mroeschke (Member) commented on Dec 10, 2018

Thanks for the report. Investigation and PRs welcome!

WillAyd (Member) commented on Dec 11, 2018

Somewhat orthogonal, but using the ffill method directly yields the desired result and would be more performant. Agreed, though, that sending it through transform needs to be fixed.
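
A minimal sketch of the direct call mentioned above, reusing the data from the original report. The variable name `direct` is an assumption made for this example.

```python
import pandas as pd

data = [['a', 0.0], ['a', float('nan')], ['b', 1.0], ['b', float('nan')]]
df = pd.DataFrame(data, columns=['key', 'values'])

# Call ffill on the grouped column directly instead of routing it
# through transform('ffill'); the fill is applied within each key.
direct = df.groupby('key')['values'].ffill()
print(direct)
```

This sidesteps `transform` entirely, so it can serve as a workaround on versions where the SeriesGroupBy path misbehaves.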

added this to the Contributions Welcome milestone on Dec 11, 2018

luisneto98 commented on Feb 16, 2019

Hello, I'm new here. Should I indicate that I want to try to solve this problem? Can I solve it?

arw2019 (Member) commented on Sep 20, 2020

This is fixed on 1.2 master. Running the OP's example I get:

In [3]: df.groupby('key').transform('ffill')
Out[3]:
   values
0     0.0
1     0.0
2     1.0
3     1.0

In [4]: df.groupby('key')['values'].transform('ffill')
Out[4]:
0    0.0
1    0.0
2    1.0
3    1.0
Name: values, dtype: float64

Output of pd.show_versions()

INSTALLED VERSIONS

commit : a22cf43
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-47-generic
Version : #51-Ubuntu SMP Fri Sep 4 19:50:52 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.0.dev0+446.ga22cf439e
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.1.0.post20200704
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.19.0
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.0
gcsfs : 0.6.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1

added the Needs Tests label (unit test(s) needed to prevent regressions) and removed the Apply label (Apply, Aggregate, Transform, Map) on Sep 20, 2020
modified the milestones: Contributions Welcome, 1.3 on Feb 7, 2021
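
Since the issue is labeled as needing tests, here is one possible shape for a regression test. It is a sketch only, assuming a pandas version in which the behavior is already fixed (1.2 or later, per the comment above); the function name `test_transform_ffill_frame_matches_series` is hypothetical and not taken from the pandas test suite.

```python
import pandas as pd
import pandas.testing as tm


def test_transform_ffill_frame_matches_series():
    # Same data as in the original report.
    data = [['a', 0.0], ['a', float('nan')], ['b', 1.0], ['b', float('nan')]]
    df = pd.DataFrame(data, columns=['key', 'values'])

    # Forward-fill within each key; this is what both paths should return.
    expected = pd.Series([0.0, 0.0, 1.0, 1.0], name='values')

    frame_result = df.groupby('key').transform('ffill')['values']
    series_result = df.groupby('key')['values'].transform('ffill')

    # The DataFrameGroupBy and SeriesGroupBy paths must agree with
    # each other and with the expected per-group fill.
    tm.assert_series_equal(frame_result, expected)
    tm.assert_series_equal(series_result, expected)


test_transform_ffill_frame_matches_series()
```

On the pandas 0.23.4 behavior described in the report, the second assertion would be the one that fails.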
Metadata

Assignees: No one assigned

Labels: Groupby, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), Needs Tests (unit test(s) needed to prevent regressions), good first issue

Participants: @WillAyd, @jreback, @timgeb, @jbrockmendel, @mroeschke