ENH: Add Support for GroupBy Numeric Operations

xref some of the conversation in #20024. Right now the following is possible:

```python
In [16]: df = pd.DataFrame([(0, 1), (0, 2), (1, 3), (1, 4)], columns=['key', 'val'])
In [18]: df - df.mean()
Out[18]: 
   key  val
0 -0.5 -1.5
1 -0.5 -0.5
2  0.5  0.5
3  0.5  1.5


In [19]: df['val'] - df['val'].mean()
Out[19]: 
0   -1.5
1   -0.5
2    0.5
3    1.5
Name: val, dtype: float64
```

But trying to do something similar with grouped data does not work:

```python
In [20]: df.groupby('key') - df.groupby('key').mean()
        ...
ValueError: Unable to coerce to Series, length must be 1: given 2
```

I am proposing that we update the `GroupBy` class to allow numerical operations between that object and the result of aggregations or transformations applied to it. Note that this is possible today through a much more verbose and hackish workaround:

```python
In [23]: df.groupby('key').shift(0) - df.groupby('key').transform('mean')
Out[23]: 
   val
0 -0.5
1  0.5
2 -0.5
3  0.5
```

The `Series` / `DataFrame` operations are all added via `add_special_arithmetic_methods`, with their implementations defined in `ops.py`. We could leverage a similar mechanism for `GroupBy`.
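To make the idea concrete, here is a rough sketch of generating the operators in a loop and attaching them to a class. `add_arithmetic_methods` and `ArithGroupBy` are hypothetical names invented for this illustration; neither is pandas code, and the sketch assumes the right operand is already indexed like the original frame.

```python
import operator

import pandas as pd


def add_arithmetic_methods(cls):
    """Attach a few arithmetic dunder methods to ``cls``.

    Hypothetical helper for illustration; it only imitates the idea of
    generating operators in a loop and is not pandas' own machinery.
    """
    ops = {
        "__add__": operator.add,
        "__sub__": operator.sub,
        "__mul__": operator.mul,
        "__truediv__": operator.truediv,
    }
    for name, op in ops.items():
        # Bind ``op`` as a default argument so each method keeps its own operator.
        def method(self, other, _op=op):
            return self._arith(other, _op)

        method.__name__ = name
        setattr(cls, name, method)
    return cls


@add_arithmetic_methods
class ArithGroupBy:
    """Hypothetical stand-in for a GroupBy that supports arithmetic."""

    def __init__(self, frame, key):
        self.frame = frame
        self.key = key

    def _arith(self, other, op):
        # Assumption: ``other`` is indexed like the original frame,
        # e.g. the output of ``groupby(key).transform(...)``.
        values = self.frame.drop(columns=self.key)
        return op(values, other)


df = pd.DataFrame([(0, 1), (0, 2), (1, 3), (1, 4)], columns=["key", "val"])
result = ArithGroupBy(df, "key") - df.groupby("key").transform("mean")
print(result)
```

A real implementation would live on `GroupBy` itself and would need to decide which operand shapes to accept; the wrapper only shows that the method-generation pattern carries over.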

**Why is this worth doing?**
1. Consistent arithmetic ops for `Series`, `DataFrame` and `GroupBy` objects
2. May enable deprecation of methods like `mad` (see #20024)
3. Provides easier "demeaning" and "normalization" for grouped data
4. Mirrors the xarray implementation, which appears to be well received by its user base
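For reference on point 3, this is roughly what per-group demeaning and normalization look like with the API that exists today. It is only a sketch on the toy frame from above; the variable names are mine.

```python
import pandas as pd

df = pd.DataFrame([(0, 1), (0, 2), (1, 3), (1, 4)], columns=["key", "val"])
grouped = df.groupby("key")["val"]

# Demean: subtract each row's group mean.
demeaned = df["val"] - grouped.transform("mean")

# Normalize: divide the demeaned values by the group standard deviation
# (pandas uses the sample standard deviation, ddof=1, by default).
zscore = demeaned / grouped.transform("std")

print(demeaned)
print(zscore)
```

With arithmetic on `GroupBy`, each of these steps could in principle drop the repeated `transform` calls.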

**Why may it not be worth doing?**
1. Will add more complexity to a `GroupBy` class that is already in need of a refactor
2. TBD

**Consideration Points**
With this proposal, the left operand would always be a `GroupBy` object and the right operand would always be the result of a function application against that same `GroupBy`. The result of the operation should be a `Series` or `DataFrame` like-indexed to the original object.
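A minimal sketch of that contract, assuming the right operand is an aggregation indexed by the group keys: expand it to one row per original row, then apply the operator. `broadcast_to_rows` and `group_arith` are hypothetical helpers, not existing pandas functions.

```python
import operator

import pandas as pd


def broadcast_to_rows(frame, key, aggregated):
    """Expand a per-group result to one row per row of ``frame``."""
    # Look up each row's group label in the aggregated result ...
    expanded = aggregated.reindex(frame[key].to_numpy())
    # ... then relabel with the original index so arithmetic aligns row by row.
    expanded.index = frame.index
    return expanded


def group_arith(frame, key, aggregated, op):
    """Apply ``op`` between the non-key columns and the broadcast aggregate."""
    values = frame.drop(columns=key)
    return op(values, broadcast_to_rows(frame, key, aggregated))


df = pd.DataFrame([(0, 1), (0, 2), (1, 3), (1, 4)], columns=["key", "val"])
out = group_arith(df, "key", df.groupby("key").mean(), operator.sub)
print(out)
```

The output carries the original index rather than the group keys, which is the "like-indexed" behavior described above.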

That said, the following operations would in theory be identical:

```python
df.groupby('key') - df.groupby('key').mean()
# OR
df.groupby('key') - df.groupby('key').transform('mean')
```

I'm not sure if we care to differentiate between these and force users into choosing one or the other.
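For what it's worth, the two right-hand operands differ today only in how they are indexed, which is what an implementation would have to reconcile. A quick check on the toy frame (variable names are mine):

```python
import pandas as pd

df = pd.DataFrame([(0, 1), (0, 2), (1, 3), (1, 4)], columns=["key", "val"])

agg = df.groupby("key").mean()               # one row per group, indexed by key
trans = df.groupby("key").transform("mean")  # one row per original row

print(agg.index.tolist())
print(trans.index.tolist())

# Expanding the aggregated means by each row's key reproduces the transform.
expanded = df["key"].map(agg["val"])
same = (expanded.to_numpy() == trans["val"].to_numpy()).all()
print(same)
```

So accepting both forms would mostly be a question of detecting which index the operand carries. That detection could be ambiguous when the original index happens to coincide with the group keys.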

**Thoughts?**

ENH: Add Support for GroupBy Numeric Operations #20060
