Closed
Description
I wouldn't be surprised if there is already an issue about this, but couldn't directly find one.
When selecting a subset of columns on a DataFrameGroupBy object, both a plain comma-separated list of keys (so a tuple within the __getitem__ [] brackets) and double square brackets (a list inside the __getitem__ [] brackets) seem to work:
In [6]: df = pd.DataFrame(np.random.randint(10, size=(10, 4)), columns=['a', 'b', 'c', 'd'])

In [8]: df.groupby('a').sum()
Out[8]:
    b   c   d
a
0   0   5   7
3  18   6  12
4  16   6   9
6  10  11  11
9   3   3   0

In [9]: df.groupby('a')['b', 'c'].sum()
Out[9]:
    b   c
a
0   0   5
3  18   6
4  16   6
6  10  11
9   3   3

In [10]: df.groupby('a')[['b', 'c']].sum()
Out[10]:
    b   c
a
0   0   5
3  18   6
4  16   6
6  10  11
9   3   3
Personally I find this df.groupby('a')['b', 'c'].sum() a bit strange, and inconsistent with how DataFrame indexing works.
Of course, on a DataFrameGroupBy you don't have the possible confusion with indexing multiple dimensions (rows, columns), but still.
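To make the inconsistency concrete, here is a minimal sketch that contrasts the two objects. The small fixed frame is an assumption made for reproducibility (the report used random data), and the variable names are illustrative only.

```python
import pandas as pd

# Small fixed frame (an assumption; the report above used random integers).
df = pd.DataFrame(
    {"a": [0, 0, 1, 1], "b": [1, 2, 3, 4], "c": [5, 6, 7, 8], "d": [9, 10, 11, 12]}
)

# On a plain DataFrame, several keys without a list form ONE tuple label.
# No column is named ('b', 'c') here, so the lookup fails with KeyError.
try:
    df["b", "c"]
    tuple_key_worked = True
except KeyError:
    tuple_key_worked = False

# A list of labels selects several columns on both objects.
frame_subset = df[["b", "c"]]
grouped_sum = df.groupby("a")[["b", "c"]].sum()

print(tuple_key_worked)            # False
print(list(grouped_sum.columns))   # ['b', 'c']
```

The double-bracket form is the only spelling that means the same thing on a DataFrame and on a DataFrameGroupBy, which is the consistency argument made above.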
Activity
TomAugspurger commented on Nov 8, 2018
You do have ambiguity with tuples though (not that anyone should do that)
I think both of those are incorrect. It should rather be
WillAyd commented on Nov 13, 2018
I don't disagree here. There is a difference when selecting only one column (specifically returning a Series vs a DataFrame), but when selecting multiple columns it would be more consistent if we ALWAYS required double brackets. I assume this would also yield a simpler implementation.
Maybe a conversation piece for 1.0? It would be a breaking change for sure, so probably best served in a major release like that.
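For reference, a short sketch of the single-column difference mentioned above, using a tiny made-up frame (the data and variable names are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 0, 1], "b": [1, 2, 3], "c": [4, 5, 6]})
grouped = df.groupby("a")

one_col_series = grouped["b"].sum()        # single label: Series result
one_col_frame = grouped[["b"]].sum()       # one-element list: DataFrame result
two_col_frame = grouped[["b", "c"]].sum()  # list of labels: DataFrame result

print(type(one_col_series).__name__)  # Series
print(type(one_col_frame).__name__)   # DataFrame
print(type(two_col_frame).__name__)   # DataFrame
```

With this rule, a scalar key always means "one column as a Series" and a list always means "a DataFrame with these columns", regardless of how many labels the list holds.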
TomAugspurger commented on Nov 13, 2018
I think the hope is for 1.0 to be backwards compatible with 0.25.x.
Do we have a chance to detect this case and throw a FutureWarning (assuming we want to change)?
jorisvandenbossche commented on Nov 13, 2018
Yeah, if we want, I would think it should be possible with a deprecation cycle.
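A rough sketch of what detecting the case and warning could look like. ToyGroupBy is a hypothetical stand-in written for this illustration, not pandas code; the warning text is made up, and the sketch assumes column labels are never tuples themselves (which sidesteps the tuple ambiguity raised earlier).

```python
import warnings


class ToyGroupBy:
    """Hypothetical stand-in for a groupby object; not the pandas implementation."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __getitem__(self, key):
        # obj['b', 'c'] arrives here as the tuple ('b', 'c').
        if isinstance(key, tuple) and len(key) > 1:
            warnings.warn(
                "Selecting multiple columns with a tuple is deprecated; "
                "pass a list instead.",
                FutureWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            key = list(key)  # keep the old behaviour during the deprecation cycle

        if isinstance(key, list):
            missing = [k for k in key if k not in self.columns]
            if missing:
                raise KeyError(f"Columns not found: {missing}")
            return key

        if key not in self.columns:
            raise KeyError(key)
        return key


gb = ToyGroupBy(["a", "b", "c", "d"])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    selected = gb["b", "c"]   # warns, then behaves like gb[["b", "c"]]
    quiet = gb[["b", "c"]]    # list form: no warning

print(selected, quiet, len(caught))
```

The old spelling keeps working for one cycle while users are pointed at the list form; a later release could turn the warning branch into an error.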
jreback commented on Nov 14, 2018
this i suspect is actually very common in the wild (not using the double brackets)
but i agree we should deprecate as it is inconsistent
yehoshuadimarsky commented on Dec 25, 2019
Can I take a crack at this or has it already been fixed?
Also, will this be a part of the 1.0 or other milestones?
WillAyd commented on Dec 25, 2019
yehoshuadimarsky commented on Dec 25, 2019
Thanks will do.
yehoshuadimarsky commented on Dec 25, 2019
take
yehoshuadimarsky commented on Dec 25, 2019
So this is my first time working on pandas code, and I'm a little confused here, so please bear with me. I'm also new to linking to code on GitHub.
As I understand, when an object calls __getitem__ by using brackets, if you pass in several keys, they are implicitly converted to a single tuple key. So df['a','b'] is really df[('a','b')] under the hood.
I'm having trouble tracing the code path to figure out where exactly the __getitem__ on the GroupBy is actually implemented here:
- DataFrame.groupby is called on the superclass NDFrame here
- which returns a DataFrameGroupBy object here
- which subclasses GroupBy, then _GroupBy, then SelectionMixin, defined here
- which implements __getitem__ here
- which, if the key is a list or tuple, returns self._gotitem(list(key), ndim=2)
- self._gotitem needs to be implemented by the respective subclasses, which in this case is the DataFrameGroupBy object, and is implemented here
- which constructs a new object (DataFrameGroupBy) with the key (a list/tuple) passed as a slice to the selection parameter
- the selection parameter is implemented in the parent _GroupBy object, which sets the internal self._selection attribute to the key here
Any help here would be greatly appreciated. Thanks.
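The point about brackets packing several keys into one tuple can be checked without pandas. KeyRecorder below is a hypothetical helper written only for this illustration; it returns whatever key __getitem__ receives.

```python
class KeyRecorder:
    """Hypothetical helper: hands back the key exactly as __getitem__ receives it."""

    def __getitem__(self, key):
        return key


rec = KeyRecorder()

single = rec['a']            # one key: passed through as a str
implicit = rec['a', 'b']     # several keys: Python packs them into one tuple
explicit = rec[('a', 'b')]   # the same call, written out explicitly
as_list = rec[['a', 'b']]    # a list inside the brackets stays a list

print(type(single).__name__, type(implicit).__name__, type(as_list).__name__)
# str tuple list
print(implicit == explicit)
# True
```

So a __getitem__ that treats lists and tuples alike cannot tell obj['a', 'b'] from obj[('a', 'b')]; any check for the multi-key spelling has to look for a tuple key.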
yehoshuadimarsky commented on Dec 29, 2019
@WillAyd @jorisvandenbossche are you able to help point me in the right direction? ☝️
MNT pandas 1.0.0 deprectation
DOC avoid FutureWarnings for deprecations examples (#17264)
DOC avoid FutureWarnings for deprecations examples (scikit-learn#17264)
kusaasira commented on Jun 24, 2020
@yehoshuadimarsky, this was closed, right?
yehoshuadimarsky commented on Jun 25, 2020
yes
Thuoq commented on May 8, 2021
Thanks for this. In my version of pandas I use group[[colone_name,]], so this is useful and makes the code clearer.
test_l3.py : Indexing with multiple keys need to make single [] to do…
lf_rssi_check.py : Indexing with multiple keys need to make single []…
test_l3_longevity.py Indexing with multiple keys need to make single …