Description
xref some of the conversation in #20024. Right now the following is possible:
In [16]: df = pd.DataFrame([(0, 1), (0, 2), (1, 3), (1, 4)], columns=['key', 'val'])
In [18]: df - df.mean()
Out[18]:
key val
0 -0.5 -1.5
1 -0.5 -0.5
2 0.5 0.5
3 0.5 1.5
In [19]: df['val'] - df['val'].mean()
Out[19]:
0 -1.5
1 -0.5
2 0.5
3 1.5
Name: val, dtype: float64
But trying to do something similar with grouped data does not work:
In [20]: df.groupby('key') - df.groupby('key').mean()
...
ValueError: Unable to coerce to Series, length must be 1: given 2
I am proposing that we update the GroupBy class to allow numerical operations with the result of aggregations or transformations against that object. Note that this is possible today through a much more verbose and hackish workaround:
In [23]: df.groupby('key').shift(0) - df.groupby('key').transform('mean')
Out[23]:
val
0 -0.5
1 0.5
2 -0.5
3 0.5
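For comparison, here is a minimal sketch of the same "demeaning" written against a single column, assuming the example frame defined above. It only uses existing pandas calls (`groupby`, `transform`) and is not the proposed API; the name `demeaned` is just illustrative.

```python
import pandas as pd

df = pd.DataFrame([(0, 1), (0, 2), (1, 3), (1, 4)], columns=["key", "val"])

# transform("mean") returns the per-group mean broadcast back onto the
# original index, so plain Series arithmetic lines up row by row.
demeaned = df["val"] - df.groupby("key")["val"].transform("mean")
print(demeaned)
```

This avoids the `shift(0)` trick, but the user still has to spell out the grouping separately from the values being operated on, which is the verbosity this proposal is trying to remove.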
The Series / DataFrame operations are all added via add_special_arithmetic_methods, with their implementations being defined in ops.py. We could leverage a similar mechanism for GroupBy.
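To make the idea concrete, below is a rough sketch of attaching arithmetic dunder methods to a grouped object in a loop. Everything here is hypothetical: `ArithmeticGroupBy` and `_make_op` are illustrative names, not pandas internals, and the wrapper only handles a single key column on a DataFrame. It is meant to show the shape of the mechanism, not how pandas would implement it.

```python
import operator

import pandas as pd


class ArithmeticGroupBy:
    """Hypothetical wrapper around a grouped DataFrame (not part of pandas)."""

    def __init__(self, df, key):
        self._df = df
        self._key = key
        self._grouped = df.groupby(key)

    def mean(self):
        return self._grouped.mean()


def _make_op(op):
    def method(self, other):
        values = self._df.drop(columns=self._key)
        # Assumption: a right operand that is not indexed like the original
        # frame is an aggregation indexed by group key, so look up each
        # row's group and restore the original index. This check is
        # ambiguous when the two indexes happen to coincide.
        if not other.index.equals(values.index):
            other = other.reindex(self._df[self._key])
            other.index = values.index
        return op(values, other[values.columns])

    return method


# Attach the operators in a loop rather than writing each one by hand.
for name, op in [
    ("__add__", operator.add),
    ("__sub__", operator.sub),
    ("__mul__", operator.mul),
    ("__truediv__", operator.truediv),
]:
    setattr(ArithmeticGroupBy, name, _make_op(op))


df = pd.DataFrame([(0, 1), (0, 2), (1, 3), (1, 4)], columns=["key", "val"])
grouped = ArithmeticGroupBy(df, "key")
result = grouped - grouped.mean()
print(result)
```

The result is indexed like the original frame and contains only the non-key columns, matching the output of the verbose workaround above.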
Why is this worth doing?
- Consistent arithmetic ops for Series, DataFrame and GroupBy objects
- May enable deprecation of methods like mad (see Cythonized GroupBy mad #20024)
- Provides easier "demeaning" and "normalization" for grouped data
- Mirrors xarray implementation which appears well received by user base
Why may it not be worth doing?
- Will add more complexity to a GroupBy class that is already in need of refactor
- TBD
Consideration Points
With this proposal, the left operand would always be a GroupBy object and the right operand would always be the result of a function application against that same GroupBy. The result of the operation should be a Series or DataFrame like-indexed to the original object.
That said, the following operations would in theory be identical:
df.groupby('key') - df.groupby('key').mean()
# OR
df.groupby('key') - df.groupby('key').transform('mean')
I'm not sure if we care to differentiate between these and force users into choosing one or the other.
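As a small illustration of why the two forms could be treated as identical, the sketch below broadcasts the aggregated means back onto the original rows and compares them with the transform result. It uses only existing pandas calls; the variable names are illustrative, and the broadcast step is one possible way to align an aggregation, not a settled design.

```python
import pandas as pd

df = pd.DataFrame([(0, 1), (0, 2), (1, 3), (1, 4)], columns=["key", "val"])
grouped = df.groupby("key")

agg = grouped.mean()                # one row per group, indexed by key
trans = grouped.transform("mean")   # one row per original row, indexed like df

# Broadcast the aggregation to the original rows by looking up each row's key,
# then restore the original index so it lines up with the transform result.
agg_broadcast = agg.reindex(df["key"])
agg_broadcast.index = df.index

same = agg_broadcast["val"].equals(trans["val"])
print(same)
```

Once the aggregation is broadcast this way, either right operand yields the same like-indexed result, so the remaining question is mostly about API strictness rather than semantics.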
Thoughts?