-
-
Notifications
You must be signed in to change notification settings - Fork 18.9k
Description
#!/bin/env python
"""
Example bug in derived Pandas Series.
__finalized__ is not called in arithmetic binary operators, but it is in in some booleans cases.
>>> m = MySeries([1, 2, 3], name='test')
>>> m.x = 42
>>> n=m[:2]
>>> n
0 1
1 2
dtype: int64
>>> n.x
42
>>> o=n+1
>>> o
0 2
1 3
dtype: int64
>>> o.x
Traceback (most recent call last):
...
AttributeError: 'MySeries' object has no attribute 'x'
>>> m = MySeries([True, False, True], name='test2')
>>> m.x = 42
>>> n=m[:2]
>>> n
0 True
1 False
dtype: bool
>>> n.x
42
>>> o=n ^ True
>>> o
0 False
1 True
dtype: bool
>>> o.x
42
>>> p = n ^ o
>>> p
0 True
1 True
dtype: bool
>>> p.x
42
"""
import pandas as pd
class MySeries(pd.Series):
_metadata = ['x']
@property
def _constructor(self):
return MySeries
if __name__ == "__main__":
import doctest
doctest.testmod()
Expected Output
In all cases, the metadata 'x' should be transferred from the passed values when applying binary operators.
When the right-hand value is a constant, the left-hand value metadata should be used in finalize for arithmetic operators, just like it is for Boolean binary operators.
When two series are used in binary operators, some resolution should be possible in finalize.
I would pass the second (right-hand) value by calling finalize(self, other=other), leaving the resolution to the derived class implementer, but there might be a smarter approach.
output of pd.show_versions()
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-59-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
nose: 1.3.7
pip: None
setuptools: 20.2.2
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: 0.9999999
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None
Activity
jreback commentedon May 17, 2016
yeah it appears some paths don't call finalize; though you can't just call finalize for example on binary operations between 2 Series. This would have to be handled in the sub-class itself. We don't have the machinery to ignore this though.
So I am suggesting that we DO call
__finalize__
but maybe pass a parameter that will be default propogate (but on binary functions this parameter can be set toFalse
and prevent propogation / unless overriden)jreback commentedon May 17, 2016
pull-requests welcome!
pfrcks commentedon May 18, 2016
@jreback I would like to attempt it. Based on what you have written, I have located the function def of finalize in generic.py.
I'm thinking of adding a paramter with default value as True just as you mentioned above( which can be set to False on binary functions.)
However I'm not able to understand what do you mean by "DO call finalize". Does that mean that something has to be changed in the process during operations between two Series?
jdfekete commentedon May 18, 2016
@pfrcks I think it means you need to change pandas/core/ops.py so that binary operations now call your brand new finalize
Each call to self._constructor should be followed by a call to finalize to make sure the metadata is propagated.
Also, adding a named argument to finalize will break upward compatibility, but that might be the only way to do it right.
pfrcks commentedon May 18, 2016
@jdfekete That makes more sense. Thanks. Will look into it.
jreback commentedon May 18, 2016
so to clarify
x
andy
areNDFrames
(e.g.Series/DataFrame
)x + 1
->__finalize__(other)
x + y
->__finalize__(other, method = '__add__')
would prob work. So the result in both cases is a
NDFrame
, they can be distinguished (and possibly operated on but a sub-class) whenmethod
isnot None
jdfekete commentedon May 18, 2016
I am not sure. Looking in ops.py on the
wrapper
methods, maybe:x + 1
->__finalize__(self)
x + y
->__finalize__(self, method='__add__', other=other)
So far, the operators are not passed to the
__finalize__
function. It could be handy in general, but that's a separate issue I think. Addingmethod=operator
is useful but does not brake the API (I think). Adding theother=other
parameter do change it a bit, although the current__finalize__
method will take it in its**kwargs
.Also, I have seen references to possible cycles/loops in some exchanges but I don't quite understand when they happen and whether my solution solves them.
jreback commentedon May 18, 2016
that's exactly what needs to be one,
__finalize__
NEEDS to be called. There is no API change.matthiasha commentedon Sep 1, 2017
Same issue, different context
Code Sample
Problem description
The documentation states:
Define _metadata for normal properties which will be passed to manipulation results
See http://pandas.pydata.org/pandas-docs/stable/internals.html#define-original-properties.
I think multiplication / adding etc. are also manipulation results.
This should be discussed with others who are already subclassing Pandas classes and are making use of _metadata.
Expected Output
The property is expected to be passed on to the calculation result.
Output of
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.3
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.2.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
7 remaining items
jbrockmendel commentedon Oct 21, 2019
This is fixed in master, may need a dedicated test