RFC: add topk and / or argpartition

numpy provides an indirect way to compute the indices of the smallest (or largest) values of an array via [numpy.argpartition](https://numpy.org/doc/stable/reference/generated/numpy.argpartition.html).

There is also a proposal to add a higher-level API to numpy, namely (arg)topk:

- https://github.com/numpy/numpy/pull/19117

This PR relies on `numpy.argpartition` internally, but it could probably be optimized later to avoid allocating a result array the size of the input array when `k` is small.
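A minimal sketch of how an `argtopk` can be layered on `numpy.argpartition`, in the spirit of the PR above but not its actual code (`argtopk` and its `largest` parameter are hypothetical names). It selects the k winners with an O(n) average-time partition, then sorts only those k entries. It still materializes the full-size index array returned by `argpartition`, which is exactly the allocation the optimization mentioned above would avoid.

```python
import numpy as np


def argtopk(x, k, axis=-1, largest=True):
    """Indices of the k largest (or smallest) entries of x along `axis`.

    Hypothetical helper built on numpy.argpartition. The k indices are
    returned ordered by value (largest first when largest=True), in the
    spirit of torch.topk(..., sorted=True). Assumes a signed integer or
    floating-point dtype: negating unsigned integers would wrap around.
    """
    x = np.asarray(x)
    n = x.shape[axis]
    if not 1 <= k <= n:
        raise ValueError(f"k must be in [1, {n}], got {k}")
    # Negate so that the k largest values become the k smallest keys.
    keys = -x if largest else x
    # Selection step, O(n) on average: after this call the first k positions
    # along `axis` hold the indices of the k smallest keys, in arbitrary order.
    # Note that `part` is still an index array as large as the input.
    part = np.argpartition(keys, k - 1, axis=axis)
    idx = np.take(part, np.arange(k), axis=axis)
    # Sort only the k selected entries: O(k log k) rather than O(n log n).
    order = np.argsort(np.take_along_axis(keys, idx, axis=axis), axis=axis)
    return np.take_along_axis(idx, order, axis=axis)
```

For example, `argtopk(np.array([3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0]), 3)` returns the indices `[5, 7, 4]`, i.e. the positions of `9.0`, `6.0` and `5.0`.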

Here is a quick review of some available implementations in related libraries:

- [torch.topk](https://pytorch.org/docs/stable/generated/torch.topk.html) (there is no `torch.argpartition`)
   - returns a tuple of values and indices
- [jax.lax.top_k](https://jax.readthedocs.io/en/latest/_autosummary/jax.lax.top_k.html)
   - returns a tuple of values and indices
   - apparently it is quite [slow on GPU](https://github.com/google/jax/issues/9940)
- [dask.array.topk](https://docs.dask.org/en/stable/generated/dask.array.topk.html)
   - returns only the values; I did not find a way to get the indices :(
- [cupy.argpartition](https://docs.cupy.dev/en/stable/reference/generated/cupy.argpartition.html)
   - internally computes a full `cupy.argsort`, which makes it very inefficient for large arrays and small `k`: O(n log n) instead of O(n).
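To illustrate the `(values, indices)` return convention shared by `torch.topk` and `jax.lax.top_k` above, here is a minimal pure-Python sketch for a 1-D sequence (`topk` is a hypothetical name, not any of these libraries' code). It swaps in a heap-based selection (`heapq.nlargest`, O(n log k) time, O(k) extra memory) rather than the partition or radix approaches discussed in this issue; the point is only that no index array of the input's size is needed when `k` is small.

```python
import heapq


def topk(values, k):
    """Hypothetical top-k returning (values, indices), largest first.

    Mirrors the tuple convention of torch.topk / jax.lax.top_k for a 1-D
    sequence. heapq.nlargest keeps a heap of size k while scanning the
    indices, so it runs in O(n log k). Ties keep the lower index first.
    """
    if k <= 0:
        return [], []
    # Rank indices by the value they point to; range() is lazy, so no
    # size-n index list is built.
    idx = heapq.nlargest(k, range(len(values)), key=values.__getitem__)
    return [values[i] for i in idx], idx
```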

Motivation: (arg)topk is needed by popular baseline data-science workloads (e.g. k-nearest neighbors classification in scikit-learn) and is surprisingly non-trivial to implement efficiently. For instance, on GPUs the fastest implementations are based on some kind of partial radix sort, while CPU implementations typically use more traditional partial sorting or selection algorithms (as implemented in `std::partial_sort` or `std::nth_element`).
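To make the k-nearest neighbors use case concrete, here is a minimal brute-force sketch (the hypothetical `knn_predict` is not scikit-learn's implementation). The (arg)topk step picks the k smallest distances in each query row; since the majority vote does not care about the order of those k neighbors, an unordered selection such as `numpy.argpartition` is enough. Class labels are assumed to be non-negative integers, and the full distance matrix is materialized for simplicity.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k):
    """Hypothetical brute-force k-NN classifier (illustration only).

    y_train must hold non-negative integer class labels.
    """
    # Squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    # The argtopk step: per query row, indices of the k smallest distances,
    # in arbitrary order (O(n_train) on average per row).
    neigh = np.argpartition(d2, k - 1, axis=1)[:, :k]
    # Majority vote among the labels of the k neighbors.
    labels = y_train[neigh]  # shape (n_query, k)
    n_classes = y_train.max() + 1
    return np.array(
        [np.bincount(row, minlength=n_classes).argmax() for row in labels]
    )
```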

RFC: add topk and / or argpartition #629

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

RFC: add topk and / or argpartition #629

Description

Activity

ogrisel commented on May 17, 2023

rgommers commented on May 17, 2023

ogrisel commented on May 17, 2023

shoyer commented on Jun 1, 2023

ogrisel commented on Jun 15, 2023

kgryte commented on Dec 14, 2023

jakirkham commented on Jan 18, 2025

kgryte commented on Jan 18, 2025