PERF: cythonize groupby-rank #15779

Closed
@jreback

Description

@jreback
Contributor

Currently groupby-rank dispatches to each group individually. It would be better to have a single combined group_rank that handles all groups at once. That is a fair bit of code, and ideally it would share some of it with the actual rank algos.

In [7]: ngroups = 1000

In [8]: N = 100000

In [9]: np.random.seed(1234)

In [10]: df = DataFrame({'key': np.random.randint(0, ngroups, size=N), 'value': np.arange(N)})

In [11]: %timeit df.groupby('key').rank()
1 loop, best of 3: 392 ms per loop

# comparison with group_shift_indexer, a transforming operator
In [13]: %timeit df.groupby('key').shift()
100 loops, best of 3: 3.15 ms per loop
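
The combined approach described above can be illustrated with a rough sketch. This is not the pandas implementation: `group_rank_average` is a hypothetical name, it is written in plain Python for readability (the actual kernel would be a typed Cython loop), it only covers the "average" tie method, and it ignores NaN handling. The point it shows is that one global sort by (group label, value) followed by one linear pass can replace a separate rank call per group.

```python
def group_rank_average(values, labels):
    """Rank values within each group in one pass; ties get the average rank.

    Hypothetical illustration only, not a pandas API.
    """
    n = len(values)
    # One global sort by (group label, value) replaces a separate sort per group.
    order = sorted(range(n), key=lambda i: (labels[i], values[i]))
    ranks = [0.0] * n
    i = 0
    group_start = 0  # position in `order` where the current group begins
    while i < n:
        if labels[order[i]] != labels[order[group_start]]:
            group_start = i  # new group: ranks restart at 1
        # Find the end of the run of tied values inside the same group.
        j = i
        while (j + 1 < n
               and labels[order[j + 1]] == labels[order[i]]
               and values[order[j + 1]] == values[order[i]]):
            j += 1
        # Sorted positions i..j share the average of their 1-based group ranks.
        avg = ((i - group_start + 1) + (j - group_start + 1)) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


# Two interleaved groups: ranks are computed per group, returned in input order.
print(group_rank_average([10, 30, 20, 20, 5], [0, 1, 0, 1, 0]))
# [2.0, 2.0, 3.0, 1.0, 1.0]
```

The loop body only touches flat arrays and integer indices, which is the shape of code that tends to translate well to Cython; the other tie methods (min, max, first, dense) would change only how `avg` is computed for each run of ties.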

Activity

modified the milestone: Next Major Release on Mar 22, 2017
modified the milestone: Interesting Issues on Nov 26, 2017
WillAyd commented on Jan 25, 2018

@WillAyd
Member

I can take a look at this. Any tips on what methods to explore? I was thinking of adding a method to the GroupBy class similar to the others for rank and was looking at the rank method in algos.

It wasn't immediately clear to me what the best way to knit that all together would be, so I figured I'd get your thoughts if you have any.


Labels

Groupby; Numeric Operations (Arithmetic, Comparison, and Logical operations); Performance (Memory or execution speed performance)


        PERF: cythonize groupby-rank · Issue #15779 · pandas-dev/pandas