DOC: Series.nlargest sorts index with duplicates #24268

Closed

Labels: Docs, good first issue

Opened on Dec 13, 2018

I think the description of this method is incorrect. According to the docstring, passing keep="first" takes the first of the duplicate values, and passing keep="last" takes the last of them.
In practice, however, the duplicate values appear to be sorted by their index.
I hope this can be corrected soon.

Contributor

I don't think the docstring is wrong, but it could be updated to note that the resulting index is sorted:

s = pd.Series([3, 3, 3, 2, 1], index=['a', 'b', 'c', 'd', 'e'])

s.nlargest(2, keep='first')
Out[27]: 
a    3
b    3
dtype: int64

s.nlargest(2, keep='last')
Out[28]: 
c    3
b    3
dtype: int64


# unsorted
s = pd.Series([3, 3, 3, 2, 1], index=['e', 'd', 'c', 'b', 'a'])
s.nlargest(2, keep='last')
Out[30]: 
c    3
d    3

changed the title from "The mistake of Series.nlargest" to "DOC: Series.nlargest sorts index with duplicates"

on Dec 13, 2018

added

good first issue

on Dec 13, 2018

added this to the Contributions Welcome milestone

on Dec 13, 2018

can I try to fix it?


did you take the issue?

Contributor

@chris-b1 I think the description for keep could be clearer. Currently it says:

            - ``first`` : take the first occurrences based on the index order
            - ``last`` : take the last occurrences based on the index order

The "index order" here might not match the sorted order of the index labels if the Series is not sorted by its index.
For instance,

# unsorted
s = pd.Series([3, 3, 3, 2, 1], index=['e', 'd', 'c', 'b', 'a'])
s.nlargest(2, keep='last')
Out[30]: 
c    3
d    3

I would naturally assume that keep='last' returns 'd' and 'e' rather than 'c' and 'd', because I read "index order" as the sorted order of the index labels.
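To make the two readings of "index order" concrete, here is a minimal sketch that emulates each one with a stable sort and compares the selected labels with what nlargest picks. The emulation is only an illustration of the semantics discussed in this thread, not pandas' actual implementation, and the variable names are made up for the example.

```python
import pandas as pd

s = pd.Series([3, 3, 3, 2, 1], index=["e", "d", "c", "b", "a"])
n = 2

# Reading 1: "last" means last by position in the Series.
# Reverse the Series, then take the first n rows of a stable descending sort.
positional_last = (
    s.iloc[::-1].sort_values(ascending=False, kind="mergesort").head(n)
)

# Reading 2: "last" means last by sorted index label.
# Order the labels from last to first, then apply the same stable sort.
by_label_last = (
    s.sort_index(ascending=False)
    .sort_values(ascending=False, kind="mergesort")
    .head(n)
)

actual = s.nlargest(n, keep="last")

print(list(positional_last.index))  # labels chosen under reading 1
print(list(by_label_last.index))    # labels chosen under reading 2
print(list(actual.index))           # labels chosen by nlargest
```

Under these assumptions, the positional reading selects 'c' and 'd', the same labels reported for nlargest above, while the sorted-label reading selects 'e' and 'd'.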

@thoo I think the description is correct insofar as it refers to the order of the items in the index array. It could be reworded to be a bit clearer, like so:

            - ``first`` : return the first n occurrences in given index order
            - ``last`` : return the last n occurrences in reverse index order

@chris-b1 One thing I noticed while looking into this is that a call like s.nlargest(2, keep="last") doesn't return the items sorted by index, but in reverse index order. For example:

>>> s = pd.Series([3, 1, 3, 2, 3], index=['e', 'a', 'c', 'b', 'd'])
>>> s.nlargest(3, keep='last')
d    3
c    3
e    3

I feel this is a bug in its own right, and that it should instead return the items in the original index order, i.e.

# should return
e    3
c    3
d    3
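If the original order is what you want, one possible workaround is to map the selected labels back to their positions and re-select them in ascending position. This is a sketch only: it assumes the index labels are unique, and the names top, positions and in_original_order are hypothetical, chosen for this example.

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 1, 3, 2, 3], index=["e", "a", "c", "b", "d"])

# Select the three largest values, breaking ties with keep="last".
top = s.nlargest(3, keep="last")

# Look up where each selected label sits in the original Series.
# Assumption: labels are unique, so each label maps to one position.
positions = s.index.get_indexer(top.index)

# Re-select the same rows in ascending position, i.e. original order.
in_original_order = s.iloc[np.sort(positions)]

print(in_original_order)
```

For this Series the re-selected labels come out as e, c, d, which is the order proposed above. The lookup is not suitable for an index with duplicate labels.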

mentioned this

on Feb 18, 2019

DOC: Fix #24268 by updating description for keep in Series.nlargest #25358

closed this as completed in #25358

added a commit that references this issue

DOC: Fix #24268 by updating description for keep in Series.nlargest (#…

added a commit that references this issue

upstream sync (#1)
