Skip to content

Series __finalized__ not correctly called in binary operators #13208

@jdfekete

Description

@jdfekete
#!/bin/env python
"""
Example bug in derived Pandas Series.

__finalized__ is not called in arithmetic binary operators, but it is in in some booleans cases.

>>> m = MySeries([1, 2, 3], name='test')
>>> m.x = 42
>>> n=m[:2]
>>> n
0    1
1    2
dtype: int64
>>> n.x
42
>>> o=n+1
>>> o
0    2
1    3
dtype: int64
>>> o.x
Traceback (most recent call last):
        ...
AttributeError: 'MySeries' object has no attribute 'x'
>>> m = MySeries([True, False, True], name='test2')
>>> m.x = 42
>>> n=m[:2]
>>> n
0     True
1    False
dtype: bool
>>> n.x
42
>>> o=n ^ True
>>> o
0    False
1     True
dtype: bool
>>> o.x
42
>>> p = n ^ o
>>> p
0    True
1    True
dtype: bool
>>> p.x
42

"""

import pandas as pd

class MySeries(pd.Series):
    _metadata = ['x']

    @property
    def _constructor(self):
        return MySeries

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Expected Output

In all cases, the metadata 'x' should be transferred from the passed values when applying binary operators.
When the right-hand value is a constant, the left-hand value metadata should be used in finalize for arithmetic operators, just like it is for Boolean binary operators.
When two series are used in binary operators, some resolution should be possible in finalize.
I would pass the second (right-hand) value by calling finalize(self, other=other), leaving the resolution to the derived class implementer, but there might be a smarter approach.

output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-59-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: None
setuptools: 20.2.2
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: 0.9999999
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None

Activity

jreback

jreback commented on May 17, 2016

@jreback
Contributor

yeah it appears some paths don't call finalize; though you can't just call finalize for example on binary operations between 2 Series. This would have to be handled in the sub-class itself. We don't have the machinery to ignore this though.

So I am suggesting that we DO call __finalize__ but maybe pass a parameter that will be default propogate (but on binary functions this parameter can be set to False and prevent propogation / unless overriden)

added this to the Next Major Release milestone on May 17, 2016
jreback

jreback commented on May 17, 2016

@jreback
Contributor

pull-requests welcome!

pfrcks

pfrcks commented on May 18, 2016

@pfrcks
Contributor

@jreback I would like to attempt it. Based on what you have written, I have located the function def of finalize in generic.py.
I'm thinking of adding a paramter with default value as True just as you mentioned above( which can be set to False on binary functions.)
However I'm not able to understand what do you mean by "DO call finalize". Does that mean that something has to be changed in the process during operations between two Series?

jdfekete

jdfekete commented on May 18, 2016

@jdfekete
Author

@pfrcks I think it means you need to change pandas/core/ops.py so that binary operations now call your brand new finalize
Each call to self._constructor should be followed by a call to finalize to make sure the metadata is propagated.
Also, adding a named argument to finalize will break upward compatibility, but that might be the only way to do it right.

pfrcks

pfrcks commented on May 18, 2016

@pfrcks
Contributor

@jdfekete That makes more sense. Thanks. Will look into it.

jreback

jreback commented on May 18, 2016

@jreback
Contributor

so to clarify x and y are NDFrames (e.g. Series/DataFrame)

x + 1 -> __finalize__(other)
x + y -> __finalize__(other, method = '__add__')

would prob work. So the result in both cases is a NDFrame, they can be distinguished (and possibly operated on but a sub-class) when method is not None

jdfekete

jdfekete commented on May 18, 2016

@jdfekete
Author

I am not sure. Looking in ops.py on the wrapper methods, maybe:
x + 1 -> __finalize__(self)
x + y -> __finalize__(self, method='__add__', other=other)

So far, the operators are not passed to the __finalize__ function. It could be handy in general, but that's a separate issue I think. Adding method=operator is useful but does not brake the API (I think). Adding the other=other parameter do change it a bit, although the current __finalize__ method will take it in its **kwargs .
Also, I have seen references to possible cycles/loops in some exchanges but I don't quite understand when they happen and whether my solution solves them.

jreback

jreback commented on May 18, 2016

@jreback
Contributor

So far, the operators are not passed to the finalize function. It could be handy in general, but that's a separate issue I think. Adding method=operator is useful but does not brake the API (I think). Adding the other=other parameter do change it a bit, although the current finalize method will take it in its **kwargs .

that's exactly what needs to be one, __finalize__ NEEDS to be called. There is no API change.

matthiasha

matthiasha commented on Sep 1, 2017

@matthiasha

Same issue, different context

Code Sample

import pandas

# subclass series and define property 'meta' to be passed on
class S(pandas.Series):
    _metadata = ['meta']
    
    @property
    def _constructor(self):
        return S
    
# create new instance of Series subclass and set custom property
x = S(range(3))
x.meta = 'test'

# 'meta' gets passed on to slice, as expected
print(x[1:].meta)

# calculation results 
print((x * 2).meta)

Problem description

The documentation states:
Define _metadata for normal properties which will be passed to manipulation results

See http://pandas.pydata.org/pandas-docs/stable/internals.html#define-original-properties.

I think multiplication / adding etc. are also manipulation results.

This should be discussed with others who are already subclassing Pandas classes and are making use of _metadata.

Expected Output

The property is expected to be passed on to the calculation result.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.2.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

7 remaining items

jbrockmendel

jbrockmendel commented on Oct 21, 2019

@jbrockmendel
Member

This is fixed in master, may need a dedicated test

modified the milestones: Contributions Welcome, 1.0 on Jan 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

      Development

      Participants

      @jreback@jorisvandenbossche@TomAugspurger@jdfekete@matthiasha

      Issue actions

        Series __finalized__ not correctly called in binary operators · Issue #13208 · pandas-dev/pandas