Implement high performance rolling_rank

xref SO issue [here](http://stackoverflow.com/questions/29391787/rolling-argmax-in-pandas/29423104#29423104)

Im looking to set the rolling rank on a dataframe. Having posted, discussed and analysed the code it looks like the suggested way would be to use the pandas Series.rank function as an argument in rolling_apply. However on large datasets the performance is particularly poor. I have tried different implementations and using bottlenecks rank method orders of magnitude faster, but that only offers the average option for ties. It is also still some way off the performance of rolling_mean. I have previously implemented a rolling rank function which monitors changes on a moving window (in a similar way to algos.roll_mean I believe) rather that recalculating the rank from scratch on each window. Below is an example to highlight the performance, it should be possible to implement a rolling rank with comparable performance to rolling_mean. 

python: 2.7.3
pandas: 0.15.2
scipy: 0.10.1
bottleneck: 0.7.0

`````` python
rollWindow = 240
df = pd.DataFrame(np.random.randn(100000,4), columns=list('ABCD'), index=pd.date_range('1/1/2000', periods=100000, freq='1H'))
df.iloc[-3:-1]['A'] = 7.5
df.iloc[-1]['A'] = 5.5

df["SER_RK"] = pd.rolling_apply(df["A"], rollWindow, rollingRankOnSeries)
 #28.9secs (allows competition/min ranking for ties)

df["SCIPY_RK"] = pd.rolling_apply(df["A"], rollWindow, rollingRankSciPy)
 #70.89secs (allows competition/min ranking for ties)

df["BNECK_RK"] = pd.rolling_apply(df["A"], rollWindow, rollingRankBottleneck)
 #3.64secs (only provides average ranking for ties)

df["ASRT_RK"] = pd.rolling_apply(df["A"], rollWindow, rollingRankArgSort)
 #3.56secs (only provides competition/min ranking for ties, not necessarily correct result)

df["MEAN"] = pd.rolling_mean(df['A'], window=rollWindow)
 #0.008secs

def rollingRankOnSeries (array):
    s = pd.Series(array)
    return s.rank(method='min', ascending=False)[len(s)-1]

def rollingRankSciPy (array):
     return array.size + 1 - sc.rankdata(array)[-1]

def rollingRankBottleneck (array):
    return array.size + 1 - bd.rankdata(array)[-1]

def rollingRankArgSort (array):
    return array.size - array.argsort().argsort()[-1]
```python

I think this is likely to be a common request for users looking to use pandas for analysis on large datasets and thought it would be a useful addition to the pandas moving statistics/moments suite?
``````


Values	Rank (Tie Method=Min)	Rank (Tie Method=Max)	Rank (Tie Method=Avg)
3	4	4	4
1	1	1	1
2	2	3	2.5
4	5	5	5
2	2	3	2.5

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GitHub Sponsors

Uh oh!

Implement high performance rolling_rank #9481

28 remaining items

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Implement high performance rolling_rank #9481

Description

Activity

jreback commented on Feb 13, 2015

PH82 commented on Feb 16, 2015

cing commented on Apr 14, 2015

shoyer commented on Apr 14, 2015

PH82 commented on Jul 21, 2015

cing commented on Jul 21, 2015

PH82 commented on Jul 21, 2015

28 remaining items

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions