Closed
Description
In [19]: df = DataFrame({'a':['A1', 'A1', 'A1'], 'b':['B1','B1','B2'], 'c':1})
In [20]: df.set_index('a').groupby('b').rank(method='first')
Out[20]:
    c
a
A1  1
A1  2
A1  1
In [21]: df.set_index('a').groupby('c').rank(method='first')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-21-6b8d4cae9d91> in <module>()
----> 1 df.set_index('a').groupby('c').rank(method='first')
/home/nicolas/Git/pandas/pandas/core/groupby.pyc in rank(self, axis, numeric_only, method, na_option, ascending, pct)
/home/nicolas/Git/pandas/pandas/core/groupby.pyc in wrapper(*args, **kwargs)
618 # mark this column as an error
619 try:
--> 620 return self._aggregate_item_by_item(name, *args, **kwargs)
621 except (AttributeError):
622 raise ValueError
/home/nicolas/Git/pandas/pandas/core/groupby.pyc in _aggregate_item_by_item(self, func, *args, **kwargs)
3076 # GH6337
3077 if not len(result_columns) and errors is not None:
-> 3078 raise errors
3079
3080 return DataFrame(result, columns=result_columns)
TypeError: rank() got an unexpected keyword argument 'numeric_only'
I'm trying to obtain what I would get with a row_number() in SQL...
Notice that if I replace the value in the 'c' column with the string '1', then even df.set_index('a').groupby('b').rank(method='first') fails.
Am I doing something wrong?
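For readers who want the SQL row_number() behaviour regardless of column dtype, here is a minimal sketch. It uses GroupBy.cumcount, which is not mentioned in this thread, and it assumes that "row number" means the position of each row within its group in the existing row order. The variable name row_number is only illustrative.

```python
import pandas as pd

# Same frame as in the report above.
df = pd.DataFrame({'a': ['A1', 'A1', 'A1'], 'b': ['B1', 'B1', 'B2'], 'c': 1})

# cumcount() numbers the rows of each group from 0 in their existing order,
# so adding 1 gives a 1-based counter per group, whatever the column dtypes.
row_number = df.set_index('a').groupby('b').cumcount() + 1

print(row_number)
```

Because cumcount does not look at the column values at all, it sidesteps the non-numeric ranking problem rather than fixing it.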
jreback commented on Dec 4, 2015
you are trying to rank on a string column, which is not supported.
But should give a better message I would think.
nbonnotte commented on Dec 27, 2015
That's weird, because .rank() works with method='average' (the default value) but not with method='first'.
I'm looking into it.
nbonnotte commented on Dec 28, 2015
I think I understand what is going on.
DataFrameGroupBy.rank is created as part of a whitelist of operators, and its signature is taken from DataFrame.rank, which uses a wrapper obtained with DataFrame._make_wrapper. There, different things are tried to produce the result.
With method='average', the first try succeeds.
With method='first', the first two try blocks raise an exception with the message "first not supported for non-numeric data", which is good, but then at the last try the method NDFrame._aggregate_item_by_item is called. Things go wrong here, as it uses SeriesGroupBy.rank, the signature of which is taken from Series.rank. And the parameter numeric_only does not exist there, hence the error.
There is a design flaw here:
- DataFrame and Series (and Panel, I guess) versions of rank (and the like) should always have the same signature
- DataFrameGroupBy.rank should not use SeriesGroupBy.rank
I'll think about a solution that is as minimalist as possible, solves the initial issue, and if possible addresses this flaw. If I can't, I'll just add a hack somewhere to solve the initial issue.
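The signature mismatch described above can be reproduced without pandas. The following is a toy sketch only: series_rank and frame_rank are hypothetical stand-ins, not pandas functions, and the ranking logic is deliberately simplified (ties are ranked in order of appearance).

```python
def series_rank(values, ascending=True):
    """Hypothetical Series-level rank; note there is no `numeric_only` parameter."""
    # sorted() is stable, so equal values keep their order of appearance.
    order = sorted(range(len(values)), key=values.__getitem__, reverse=not ascending)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def frame_rank(columns, numeric_only=None, ascending=True):
    """Hypothetical frame-level fallback that forwards all of its keywords per column."""
    return {
        name: series_rank(values, numeric_only=numeric_only, ascending=ascending)
        for name, values in columns.items()
    }


try:
    frame_rank({'c': [1, 1, 1]})
    error_message = None
except TypeError as exc:
    # The column-level function rejects a keyword it never declared.
    error_message = str(exc)

print(error_message)
```

The failure comes only from forwarding a keyword that the column-level function does not accept, which is why giving both levels the same signature would remove it.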
jreback commented on Dec 28, 2015
the right way to fix this is to move Series.rank and DataFrame.rank into generic.py and make the signature uniform.
You then accept numeric_only=None in the Series.rank (and raise NotImplementedError if it's not None).
Further need to add axis as a parameter (the _get_axis_name handles the case where the axis is greater than the ndim FYI).
you can raise if ndim>2 as well
nbonnotte commented on Dec 28, 2015
So now, SeriesGroupBy.rank has the right signature, and Series._make_wrapper is used, so again there is a call to ._aggregate_item_by_item()... except that this method comes from NDFrameGroupBy, and SeriesGroupBy does not inherit from NDFrameGroupBy, so now an AttributeError is raised. This is caught and transformed into a simple ValueError, with the following comment:
Indeed, the first call to _aggregate_item_by_item (the one that called SeriesGroupBy.rank... still following?) uses this ValueError to simply discard the column, and we end up with an empty dataframe with the example I gave in the beginning.
I'm going to prevent the call to SeriesGroupBy._aggregate_item_by_item (instead of asking for forgiveness), so that the exceptions can be sorted and a meaningful error message can be given to the user.
kuanche commented on Dec 30, 2015
Hi guys!
Dealing with the exact same issue. Any tips on what to try instead?
nbonnotte commented on Dec 30, 2015
What are you trying to do, exactly?