Bug when combining .groupby() apply with .expanding() apply #12829

Closed
@lminer

Description

@lminer

In this example, the aim is to use an expanding window to create an expanding count, by group, of the occurrences of a predetermined set of strings. There seems to be a bug in the behavior of .expanding() when combined with .groupby() and .apply().

In this case the strings are ['tito', 'bar', 'feep']

             category    group
2000-01-01   'foo'       a
2000-01-02   'tito'      a
2000-01-03   'bar'       a
2000-01-04   'zip'       b
2000-01-05   'zorp'      b
2000-01-03   'feep'      c

So this would become:

             category    group    count
2000-01-01   'foo'       a        0
2000-01-02   'tito'      a        1
2000-01-03   'bar'       a        2
2000-01-04   'zip'       b        0
2000-01-05   'zorp'      b        0
2000-01-03   'feep'      c        1

However, when I run the following code, it returns the category column itself rather than counts. The same thing happens when I use a rolling window in place of expanding.

import pandas as pd
from functools import reduce  # reduce is a builtin on Python 2, but functools.reduce works there too
from operator import or_

df = pd.DataFrame({'category': ['foo', 'tito', 'bar', 'zip', 'zorp', 'feep'],
                   'group': ['a', 'a', 'a', 'b', 'b', 'c']},
                  index=pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03',
                                        '2000-01-04', '2000-01-05', '2000-01-03']))

def count_categories(ser):
    categories_to_count = ['tito', 'bar', 'feep']
    conditions = [ser == val for val in categories_to_count]
    mask = reduce(or_, conditions)
    return mask.sum()


def expanding_count_categories(s):
    return s.expanding().apply(count_categories)

df.groupby('group')['category'].apply(expanding_count_categories)

>> '2000-01-01'     foo
>> '2000-01-02'    tito
>> '2000-01-03'     bar
>> '2000-01-04'     zip
>> '2000-01-05'    zorp
>> '2000-01-03'    feep
>> dtype: object

INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-76-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.1
pip: 8.0.2
setuptools: 19.1.1
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.16.1
statsmodels: 0.6.1
xarray: None
IPython: 4.0.2
sphinx: 1.2.2
patsy: 0.4.1
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: 0.7.5
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None

Activity

jreback

jreback commented on Apr 8, 2016

@jreback
Contributor

This is an extremely weird thing to do (and completely non-performant), keeping tuples in columns. Do something like this:

In [37]: cats = df.category.apply(Series)

In [38]: cats.columns = ['one','two']

In [39]: df2 = pd.concat([df[['group']],cats],axis=1)

In [40]: df2
Out[40]: 
           group   one     two
2000-01-01     a   foo     NaN
2000-01-02     a  tito  puente
2000-01-03     a   bar     NaN
2000-01-04     b   zip     NaN
2000-01-05     b  zorp     NaN
2000-01-03     c  feep     NaN

In [41]: df2.set_index('group').unstack().dropna()
Out[41]: 
     group
one  a           foo
     a          tito
     a           bar
     b           zip
     b          zorp
     c          feep
two  a        puente
dtype: object
lminer

lminer commented on Apr 8, 2016

@lminer
Author

Sorry, there shouldn't have been any tuples in the columns; I've changed them all to strings. The problem is the attempt to use a window method to count the occurrences of these strings. The code snippet I posted should be returning counts, not strings.

jreback

jreback commented on Apr 8, 2016

@jreback
Contributor

In [14]: df['category2'] = df['category'].astype('category').cat.codes

In [15]: df.groupby('group').category2.apply(lambda x: x.expanding().count())
Out[15]: 
2000-01-01    1.0
2000-01-02    2.0
2000-01-03    3.0
2000-01-04    1.0
2000-01-05    2.0
2000-01-03    1.0
Name: category2, dtype: float64

# this will work in 0.18.1
In [16]: df.groupby('group').category2.expanding().count().astype(int)
Out[16]: 
group            
a      2000-01-01    1
       2000-01-02    2
       2000-01-03    3
b      2000-01-04    1
       2000-01-05    2
c      2000-01-03    1
Name: category2, dtype: int64
lminer

lminer commented on Apr 8, 2016

@lminer
Author

Thanks, but I'm trying to only count rows including a string in the list ['tito', 'bar', 'feep']. It does seem like unexpected behavior that the approach I'm using isn't even returning numbers.

jreback

jreback commented on Apr 8, 2016

@jreback
Contributor

then just pre-filter first.
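A minimal sketch of the pre-filter approach (assuming the df from the original report; cumcount is used here instead of .expanding().count(), since the latter needs numeric data):

```python
import pandas as pd

df = pd.DataFrame({'category': ['foo', 'tito', 'bar', 'zip', 'zorp', 'feep'],
                   'group': ['a', 'a', 'a', 'b', 'b', 'c']},
                  index=pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03',
                                        '2000-01-04', '2000-01-05', '2000-01-03']))

# Keep only the rows whose category is in the target set, then
# number the matching rows within each group (cumcount is 0-based).
filtered = df[df['category'].isin(['tito', 'bar', 'feep'])]
running = filtered.groupby('group').cumcount() + 1
```

Note this yields counts only on the matching rows; the non-matching rows are dropped by the filter.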

lminer

lminer commented on Apr 8, 2016

@lminer
Author

It seems so easy once you say it.

jreback

jreback commented on Apr 8, 2016

@jreback
Contributor

.expanding does not handle non-numerics at the moment; see #12541

jreback

jreback commented on Apr 8, 2016

@jreback
Contributor

just df[df.category.isin([.....])]
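Putting the pieces together, one way (a sketch, not taken verbatim from this thread) to get the expanding per-group count on the full original index without .apply is to turn the isin mask into an integer indicator and take a grouped cumulative sum:

```python
import pandas as pd

df = pd.DataFrame({'category': ['foo', 'tito', 'bar', 'zip', 'zorp', 'feep'],
                   'group': ['a', 'a', 'a', 'b', 'b', 'c']},
                  index=pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03',
                                        '2000-01-04', '2000-01-05', '2000-01-03']))

# 1/0 indicator for membership in the target set; a running sum
# within each group is then the expanding count on every row.
is_match = df['category'].isin(['tito', 'bar', 'feep']).astype(int)
df['count'] = is_match.groupby(df['group']).cumsum()
```

This produces the count column from the expected output in the report (0, 1, 2, 0, 0, 1), and sidesteps the non-numeric limitation of .expanding() entirely.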

lminer

lminer commented on Apr 8, 2016

@lminer
Author

Ah! All is clear. thanks!
