9 changes: 3 additions & 6 deletions doc/source/api.rst
@@ -434,9 +434,8 @@ Reshaping, sorting
:toctree: generated/

Series.argsort
Series.order
Series.reorder_levels
Series.sort
Series.sort_values
Series.sort_index
Series.sortlevel
Series.swaplevel
@@ -908,7 +907,7 @@ Reshaping, sorting, transposing

DataFrame.pivot
DataFrame.reorder_levels
DataFrame.sort
DataFrame.sort_values
DataFrame.sort_index
DataFrame.sortlevel
DataFrame.nlargest
@@ -1293,7 +1292,6 @@ Modifying and Computations
Index.insert
Index.min
Index.max
Index.order
Index.reindex
Index.repeat
Index.take
@@ -1319,8 +1317,7 @@ Sorting
:toctree: generated/

Index.argsort
Index.order
Index.sort
Index.sort_values

Time-specific operations
~~~~~~~~~~~~~~~~~~~~~~~~
45 changes: 31 additions & 14 deletions doc/source/basics.rst
@@ -1418,39 +1418,56 @@ description.

.. _basics.sorting:

Sorting by index and value
--------------------------
Sorting
-------

.. warning::

The sorting API changed substantially in 0.17.0; see :ref:`here <whatsnew_0170.api_breaking.sorting>` for details.
In particular, all sorting methods now return a new object by default, and **DO NOT** operate in-place (except by passing ``inplace=True``).

There are two obvious kinds of sorting that you may be interested in: sorting
by label and sorting by actual values. The primary method for sorting axis
labels (indexes) across data structures is the :meth:`~DataFrame.sort_index` method.
by label and sorting by actual values.

By Index
~~~~~~~~

The primary methods for sorting axis
labels (indexes) are ``Series.sort_index()`` and ``DataFrame.sort_index()``.

.. ipython:: python

unsorted_df = df.reindex(index=['a', 'd', 'c', 'b'],
columns=['three', 'two', 'one'])

# DataFrame
unsorted_df.sort_index()
unsorted_df.sort_index(ascending=False)
unsorted_df.sort_index(axis=1)

:meth:`DataFrame.sort_index` can accept an optional ``by`` argument for ``axis=0``
# Series
unsorted_df['three'].sort_index()
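
As a self-contained illustration of label sorting, a minimal sketch might look like the following. The frame is made-up data (the ``df`` used in the snippet above is defined elsewhere in the docs), and the result names are hypothetical.

```python
import pandas as pd

# Made-up data; only the row and column labels matter here.
unsorted_df = pd.DataFrame(
    {'three': [1.0, 2.0, 3.0, 4.0],
     'two': [5.0, 6.0, 7.0, 8.0],
     'one': [9.0, 10.0, 11.0, 12.0]},
    index=['a', 'd', 'c', 'b'],
    columns=['three', 'two', 'one'])

# DataFrame: sort rows by index label, ascending or descending.
rows_sorted = unsorted_df.sort_index()
rows_desc = unsorted_df.sort_index(ascending=False)

# DataFrame: sort columns by their labels instead of rows.
cols_sorted = unsorted_df.sort_index(axis=1)

# Series: sort a single column by its index labels.
col_sorted = unsorted_df['three'].sort_index()

print(rows_sorted)
print(cols_sorted)
```

Each call returns a new object; ``unsorted_df`` keeps its original label order.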

By Values
~~~~~~~~~

The :meth:`Series.sort_values` and :meth:`DataFrame.sort_values` methods are the entry points for **value** sorting (that is, sorting by the values in a column or row).
:meth:`DataFrame.sort_values` can accept an optional ``by`` argument for ``axis=0``
which will use an arbitrary vector or a column name of the DataFrame to
determine the sort order:

.. ipython:: python

df1 = pd.DataFrame({'one':[2,1,1,1],'two':[1,3,2,4],'three':[5,4,3,2]})
df1.sort_index(by='two')
df1.sort_values(by='two')

The ``by`` argument can take a list of column names, e.g.:

.. ipython:: python

df1[['one', 'two', 'three']].sort_index(by=['one','two'])
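
A rough sketch of multi-column value sorting, written against ``DataFrame.sort_values`` (the method this change introduces for sorting by values). It reuses the ``df1`` data from above; the result name ``by_one_two`` is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'one': [2, 1, 1, 1],
                    'two': [1, 3, 2, 4],
                    'three': [5, 4, 3, 2]})

# Sort by 'one' first; rows that tie on 'one' are ordered by 'two'.
by_one_two = df1.sort_values(by=['one', 'two'])

print(by_one_two)
```

The original ``df1`` is not modified, because ``sort_values`` returns a new object unless ``inplace=True`` is passed.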

Series has the method :meth:`~Series.order` (analogous to `R's order function
<http://stat.ethz.ch/R-manual/R-patched/library/base/html/order.html>`__) which
sorts by value, with special treatment of NA values via the ``na_position``
These methods have special treatment of NA values via the ``na_position``
argument:

.. ipython:: python
@@ -1459,11 +1476,11 @@ argument:
s.order()
s.order(na_position='first')
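
The same NA handling, written against ``Series.sort_values`` (the replacement this change introduces for ``Series.order``), might look like the sketch below. The sample data and variable names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented sample data containing one missing value.
s = pd.Series([3.0, np.nan, 1.0, 2.0])

na_last = s.sort_values()                      # default: NaN placed at the end
na_first = s.sort_values(na_position='first')  # NaN placed at the beginning

print(na_last)
print(na_first)
```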

.. note::

:meth:`Series.sort` sorts a Series by value in-place. This is to provide
compatibility with NumPy methods which expect the ``ndarray.sort``
behavior. :meth:`Series.order` returns a copy of the sorted data.
.. _basics.searchsorted:

searchsorted
~~~~~~~~~~~~

Series has the :meth:`~Series.searchsorted` method, which works similarly to
:meth:`numpy.ndarray.searchsorted`.
@@ -1493,7 +1510,7 @@ faster than sorting the entire Series and calling ``head(n)`` on the result.

s = pd.Series(np.random.permutation(10))
s
s.order()
s.sort_values()
s.nsmallest(3)
s.nlargest(3)
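
To make the relationship concrete, here is a small sketch checking that, for a Series of distinct values, ``nsmallest`` and ``nlargest`` select the same rows as a full sort followed by ``head``. The variable names are hypothetical, and the check is only about results, not about speed.

```python
import numpy as np
import pandas as pd

# A random permutation of 0..9, so all values are distinct.
s = pd.Series(np.random.permutation(10))

smallest = s.nsmallest(3)
smallest_via_sort = s.sort_values().head(3)

largest = s.nlargest(3)
largest_via_sort = s.sort_values(ascending=False).head(3)

print(smallest.equals(smallest_via_sort))
print(largest.equals(largest_via_sort))
```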

62 changes: 61 additions & 1 deletion doc/source/whatsnew/v0.17.0.txt
@@ -14,6 +14,7 @@ users upgrade to this version.
Highlights include:

- Release the Global Interpreter Lock (GIL) on some cython operations, see :ref:`here <whatsnew_0170.gil>`
- The sorting API has been revamped to remove some long-time inconsistencies, see :ref:`here <whatsnew_0170.api_breaking.sorting>`
- The default for ``to_datetime`` will now be to ``raise`` when presented with unparseable formats,
previously this would return the original input, see :ref:`here <whatsnew_0170.api_breaking.to_datetime>`
- The default for ``dropna`` in ``HDFStore`` has changed to ``False``, to store by default all rows even
@@ -187,6 +188,65 @@ Other enhancements
Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _whatsnew_0170.api_breaking.sorting:

Changes to sorting API
^^^^^^^^^^^^^^^^^^^^^^

The sorting API has had some longtime inconsistencies (:issue:`9816`, :issue:`8239`).

Here is a summary of the API **prior** to 0.17.0:

- ``Series.sort`` was **INPLACE** while ``DataFrame.sort`` returned a new object.
- ``Series.order`` returned a new object.
- It was possible to use ``Series/DataFrame.sort_index`` to sort by **values** by passing the ``by`` keyword.
- ``Series/DataFrame.sortlevel`` worked only on a ``MultiIndex`` for sorting by index.

To address these issues, we have revamped the API:

- We have introduced a new method, :meth:`DataFrame.sort_values`, which is the merger of ``DataFrame.sort()``, ``Series.sort()``,
and ``Series.order``, to handle sorting of **values**.
- The existing method ``Series.sort()`` has been deprecated and will be removed in a
future version of pandas.
- The ``by`` argument of ``DataFrame.sort_index()`` has been deprecated and will be removed in a future version of pandas.
- The methods ``DataFrame.sort()`` and ``Series.order()`` are no longer recommended and will carry a deprecation warning
in the docstring.
- The existing method ``.sort_index()`` will gain the ``level`` keyword to enable level sorting.

We now have two distinct and non-overlapping methods of sorting. A ``*`` marks items that
will show a ``FutureWarning``.

To sort by the **values**:

================================= ====================================
Previous                          Replacement
================================= ====================================
\*``Series.order()``              ``Series.sort_values()``
\*``Series.sort()``               ``Series.sort_values(inplace=True)``
\*``DataFrame.sort(columns=...)`` ``DataFrame.sort_values(by=...)``
================================= ====================================

To sort by the **index**:

================================== ====================================
Previous                           Equivalent
================================== ====================================
``Series.sort_index()``            ``Series.sort_index()``
``Series.sortlevel(level=...)``    ``Series.sort_index(level=...)``
``DataFrame.sort_index()``         ``DataFrame.sort_index()``
``DataFrame.sortlevel(level=...)`` ``DataFrame.sort_index(level=...)``
\*``DataFrame.sort()``             ``DataFrame.sort_index()``
================================== ====================================

We have also deprecated and changed similar methods in two Series-like classes, ``Index`` and ``Categorical``.

================================== ====================================
Previous                           Replacement
================================== ====================================
\*``Index.order()``                ``Index.sort_values()``
\*``Categorical.order()``          ``Categorical.sort_values()``
================================== ====================================
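
The replacements in the tables above can be exercised with a short sketch like the one below. The data and variable names are invented for illustration, and only the new-style calls are shown, since the deprecated spellings are expected to go away.

```python
import pandas as pd

# Invented sample data.
s = pd.Series([3, 1, 2], index=['b', 'c', 'a'])
df = pd.DataFrame({'key': [2, 3, 1], 'val': ['x', 'y', 'z']},
                  index=[10, 30, 20])

# Sorting by values.
by_value = s.sort_values()            # in place of Series.order()
s_inplace = s.copy()
s_inplace.sort_values(inplace=True)   # in place of Series.sort()
df_by_key = df.sort_values(by='key')  # in place of DataFrame.sort(columns='key')

# Sorting by index labels.
by_label = s.sort_index()
df_by_label = df.sort_index()

print(by_value)
print(df_by_key)
```

Keeping value sorting in ``sort_values`` and label sorting in ``sort_index`` means the method name alone tells a reader which kind of sort is happening.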

.. _whatsnew_0170.api_breaking.to_datetime:

Changes to to_datetime and to_timedelta
@@ -570,7 +630,7 @@ Removal of prior version deprecations/changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Remove use of some deprecated numpy comparison operations, mainly in tests. (:issue:`10569`)

- Removal of ``na_last`` parameters from ``Series.order()`` and ``Series.sort()``, in favor of ``na_position``, xref (:issue:`5231`)

.. _whatsnew_0170.performance:

6 changes: 2 additions & 4 deletions pandas/core/algorithms.py
@@ -262,9 +262,7 @@ def value_counts(values, sort=True, ascending=False, normalize=False,
result.index = bins[:-1]

if sort:
result.sort()
if not ascending:
result = result[::-1]
result = result.sort_values(ascending=ascending)

if normalize:
result = result / float(values.size)
@@ -497,7 +495,7 @@ def select_n_slow(dropped, n, take_last, method):
reverse_it = take_last or method == 'nlargest'
ascending = method == 'nsmallest'
slc = np.s_[::-1] if reverse_it else np.s_[:]
return dropped[slc].order(ascending=ascending).head(n)
return dropped[slc].sort_values(ascending=ascending).head(n)


_select_methods = {'nsmallest': nsmallest, 'nlargest': nlargest}
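
From the caller's side, the ``sort`` and ``ascending`` handling in ``value_counts`` can be observed with a sketch like this one. The sample data is invented, and the counts are deliberately distinct so the ordering does not depend on how ties are broken.

```python
import pandas as pd

# Invented data: 'b' occurs 3 times, 'a' twice, 'c' once.
s = pd.Series(list('aabbbc'))

most_frequent_first = s.value_counts()                 # default ordering
least_frequent_first = s.value_counts(ascending=True)  # reversed ordering

print(most_frequent_first)
print(least_frequent_first)
```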
43 changes: 37 additions & 6 deletions pandas/core/categorical.py
@@ -1083,7 +1083,7 @@ def argsort(self, ascending=True, **kwargs):
result = result[::-1]
return result

def order(self, inplace=False, ascending=True, na_position='last'):
def sort_values(self, inplace=False, ascending=True, na_position='last'):
""" Sorts the Category by category value returning a new Categorical by default.

Only ordered Categoricals can be sorted!
@@ -1092,10 +1092,10 @@ def order(self, inplace=False, ascending=True, na_position='last'):

Parameters
----------
ascending : boolean, default True
Sort ascending. Passing False sorts descending
inplace : boolean, default False
Do operation in place.
ascending : boolean, default True
Sort ascending. Passing False sorts descending
na_position : {'first', 'last'} (optional, default='last')
'first' puts NaNs at the beginning
'last' puts NaNs at the end
@@ -1139,6 +1139,37 @@ def order(self, inplace=False, ascending=True, na_position='last'):
return Categorical(values=codes,categories=self.categories, ordered=self.ordered,
fastpath=True)

def order(self, inplace=False, ascending=True, na_position='last'):
"""
DEPRECATED: use :meth:`Categorical.sort_values`

Sorts the Category by category value returning a new Categorical by default.

Only ordered Categoricals can be sorted!

Categorical.sort is the equivalent but sorts the Categorical inplace.

Parameters
----------
inplace : boolean, default False
Do operation in place.
ascending : boolean, default True
Sort ascending. Passing False sorts descending
na_position : {'first', 'last'} (optional, default='last')
'first' puts NaNs at the beginning
'last' puts NaNs at the end

Returns
-------
y : Category or None

See Also
--------
Category.sort
"""
warn("order is deprecated, use sort_values(...)",
FutureWarning, stacklevel=2)
return self.sort_values(inplace=inplace, ascending=ascending, na_position=na_position)

def sort(self, inplace=True, ascending=True, na_position='last'):
""" Sorts the Category inplace by category value.
@@ -1163,10 +1194,10 @@ def sort(self, inplace=True, ascending=True, na_position='last'):

See Also
--------
Category.order
Category.sort_values
"""
return self.order(inplace=inplace, ascending=ascending,
na_position=na_position)
return self.sort_values(inplace=inplace, ascending=ascending,
na_position=na_position)

def ravel(self, order='C'):
""" Return a flattened (numpy) array.
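
The deprecation approach used for ``Categorical.order`` above (emit a ``FutureWarning``, then delegate to the new method) can be sketched in isolation as follows. ``Box`` is a hypothetical stand-in class, not part of pandas; only the warn-and-delegate shape mirrors the diff.

```python
import warnings


class Box:
    """Hypothetical container used only to illustrate the pattern."""

    def __init__(self, values):
        self.values = list(values)

    def sort_values(self, ascending=True):
        # The new, preferred method: returns a new sorted Box.
        return Box(sorted(self.values, reverse=not ascending))

    def order(self, ascending=True):
        # The old name: warn, then forward every argument unchanged.
        # stacklevel=2 attributes the warning to the caller's line.
        warnings.warn("order is deprecated, use sort_values(...)",
                      FutureWarning, stacklevel=2)
        return self.sort_values(ascending=ascending)


# Record warnings so the deprecation can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = Box([3, 1, 2]).order()

print(result.values)
print(caught[0].category.__name__)
```

Forwarding the arguments unchanged keeps the old name behaving like the new one, so callers can migrate by renaming alone.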
3 changes: 3 additions & 0 deletions pandas/core/common.py
@@ -2155,6 +2155,9 @@ def _mut_exclusive(**kwargs):
return val2


def _not_none(*args):
return (arg for arg in args if arg is not None)

def _any_none(*args):
for arg in args:
if arg is None:
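
A quick sketch of how the new ``_not_none`` helper behaves. The function body is copied from the diff into a standalone snippet so that nothing private is imported from pandas; the sample arguments are invented.

```python
def _not_none(*args):
    # Lazily yield every argument that is not None.
    return (arg for arg in args if arg is not None)


# Falsy values such as 0 and '' are kept; only None is dropped.
kept = list(_not_none(1, None, 'a', 0, '', None))
print(kept)

# The helper returns a generator, so it can be consumed only once.
gen = _not_none(None, 2)
first_pass = list(gen)
second_pass = list(gen)
print(first_pass, second_pass)
```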