DEPR: deprecate .ix in favor of .loc/.iloc #15113

Closed
wants to merge 1 commit into from
23 changes: 6 additions & 17 deletions doc/source/advanced.rst
@@ -230,7 +230,7 @@ of tuples:
Advanced indexing with hierarchical index
-----------------------------------------

-Syntactically integrating ``MultiIndex`` in advanced indexing with ``.loc/.ix`` is a
+Syntactically integrating ``MultiIndex`` in advanced indexing with ``.loc`` is a
bit challenging, but we've made every effort to do so. For example, the
following works as you would expect:

@@ -258,7 +258,7 @@ Passing a list of labels or tuples works similar to reindexing:

.. ipython:: python

-df.ix[[('bar', 'two'), ('qux', 'one')]]
+df.loc[[('bar', 'two'), ('qux', 'one')]]

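A minimal sketch of the two ``.loc`` forms this hunk relies on, assuming a recent pandas release (where ``.ix`` no longer exists). The two-level frame below is hypothetical; the docs build their own ``df`` earlier in advanced.rst.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level row index (lexsorted, so no performance warnings).
index = pd.MultiIndex.from_product([['bar', 'baz', 'qux'], ['one', 'two']],
                                   names=['first', 'second'])
df = pd.DataFrame({'A': np.arange(6)}, index=index)

# A full tuple addresses one row; here it is paired with a column label.
value = df.loc[('bar', 'two'), 'A']

# A list of tuples selects several rows in the order given, like reindexing.
subset = df.loc[[('bar', 'two'), ('qux', 'one')]]
print(subset)
```
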
.. _advanced.mi_slicers:

@@ -604,7 +604,7 @@ intended to work on boolean indices and may return unexpected results.

ser = pd.Series(np.random.randn(10))
ser.take([False, False, True, True])
-ser.ix[[0, 1]]
+ser.iloc[[0, 1]]

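Outside the boolean case, ``take`` and the new ``.iloc`` spelling agree: both read the integers as positions. A sketch, using hypothetical string labels so that positions and labels cannot be confused:

```python
import numpy as np
import pandas as pd

# String labels make it obvious that the integers below are positions.
ser = pd.Series(np.arange(10.0), index=list('abcdefghij'))
positions = [4, 3, 2]

# Both interpret the list positionally, regardless of the index labels.
via_take = ser.take(positions)
via_iloc = ser.iloc[positions]
```
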
Finally, as a small note on performance, because the ``take`` method handles
a narrower range of inputs, it can offer performance that is a good deal
@@ -620,7 +620,7 @@ faster than fancy indexing.
timeit arr.take(indexer, axis=0)

ser = pd.Series(arr[:, 0])
-timeit ser.ix[indexer]
+timeit ser.iloc[indexer]
timeit ser.take(indexer)

.. _indexing.index_types:
@@ -661,7 +661,7 @@ Setting the index, will create create a ``CategoricalIndex``
df2 = df.set_index('B')
df2.index

-Indexing with ``__getitem__/.iloc/.loc/.ix`` works similarly to an ``Index`` with duplicates.
+Indexing with ``__getitem__/.iloc/.loc`` works similarly to an ``Index`` with duplicates.
The indexers MUST be in the category or the operation will raise.

.. ipython:: python
@@ -759,14 +759,12 @@ same.
sf = pd.Series(range(5), index=indexf)
sf

-Scalar selection for ``[],.ix,.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)
+Scalar selection for ``[],.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)

.. ipython:: python

sf[3]
sf[3.0]
-sf.ix[3]
-sf.ix[3.0]
sf.loc[3]
sf.loc[3.0]

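With ``.ix`` gone, ``.loc`` is the label-based spelling and ``.iloc`` the positional one. A sketch that rebuilds ``sf`` so it runs on its own; the values of ``indexf`` (``[1.5, 2, 3, 4.5, 5]``) are an assumption, since its definition is outside this hunk.

```python
import pandas as pd

# Assumed float index, mirroring ``indexf`` from the docs.
sf = pd.Series(range(5), index=pd.Index([1.5, 2, 3, 4.5, 5]))

# .loc is label based: the integer 3 matches the float label 3.0.
by_int_label = sf.loc[3]
by_float_label = sf.loc[3.0]

# .iloc is position based: 3 means the fourth element (label 4.5).
by_position = sf.iloc[3]
```
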
@@ -783,7 +781,6 @@ Slicing is ALWAYS on the values of the index, for ``[],ix,loc`` and ALWAYS posit
.. ipython:: python

sf[2:4]
-sf.ix[2:4]
sf.loc[2:4]
sf.iloc[2:4]

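The same split applies to slices: ``.loc`` bounds are labels and inclusive, ``.iloc`` bounds are positions and exclusive at the end. A sketch on the same assumed float index:

```python
import pandas as pd

sf = pd.Series(range(5), index=pd.Index([1.5, 2, 3, 4.5, 5]))

# Label slice: every label from 2 to 4; 4 itself is absent, so it stops at 3.0.
label_slice = sf.loc[2:4]
# Positional slice: positions 2 and 3, end excluded as in plain Python.
position_slice = sf.iloc[2:4]
```
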
@@ -813,14 +810,6 @@ In non-float indexes, slicing using floats will raise a ``TypeError``
In [3]: pd.Series(range(5)).iloc[3.0]
TypeError: cannot do positional indexing on <class 'pandas.indexes.range.RangeIndex'> with these indexers [3.0] of <type 'float'>

-Further the treatment of ``.ix`` with a float indexer on a non-float index, will be label based, and thus coerce the index.
-
-.. ipython:: python
-
-s2 = pd.Series([1, 2, 3], index=list('abc'))
-s2
-s2.ix[1.0] = 10
-s2

Here is a typical use-case for using this type of indexing. Imagine that you have a somewhat
irregular timedelta-like indexing scheme, but the data is recorded as floats. This could for
9 changes: 3 additions & 6 deletions doc/source/api.rst
@@ -268,13 +268,12 @@ Indexing, iteration
Series.get
Series.at
Series.iat
-Series.ix
Series.loc
Series.iloc
Series.__iter__
Series.iteritems

-For more information on ``.at``, ``.iat``, ``.ix``, ``.loc``, and
+For more information on ``.at``, ``.iat``, ``.loc``, and
``.iloc``, see the :ref:`indexing documentation <indexing>`.

Binary operator functions
@@ -774,7 +773,6 @@ Indexing, iteration
DataFrame.head
DataFrame.at
DataFrame.iat
-DataFrame.ix
DataFrame.loc
DataFrame.iloc
DataFrame.insert
@@ -791,7 +789,7 @@ Indexing, iteration
DataFrame.mask
DataFrame.query

-For more information on ``.at``, ``.iat``, ``.ix``, ``.loc``, and
+For more information on ``.at``, ``.iat``, ``.loc``, and
``.iloc``, see the :ref:`indexing documentation <indexing>`.


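The four accessors left in these lists split along two axes: label vs. position, and general vs. single value. A sketch on a hypothetical frame whose string labels keep the two axes visibly apart:

```python
import pandas as pd

# Hypothetical frame; string labels make labels and positions differ.
df = pd.DataFrame({'x': [10, 20, 30], 'y': [1.5, 2.5, 3.5]},
                  index=['a', 'b', 'c'])

# Label based: .loc is general, .at reads a single cell.
by_loc = df.loc['b', 'x']
by_at = df.at['b', 'x']

# Position based: .iloc is general, .iat reads a single cell.
by_iloc = df.iloc[1, 0]
by_iat = df.iat[1, 0]

# Only the general accessors accept slices and lists.
block = df.loc['a':'b', ['y']]
```
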
@@ -1090,7 +1088,6 @@ Indexing, iteration, slicing

Panel.at
Panel.iat
-Panel.ix
Panel.loc
Panel.iloc
Panel.__iter__
@@ -1100,7 +1097,7 @@ Indexing, iteration, slicing
Panel.major_xs
Panel.minor_xs

-For more information on ``.at``, ``.iat``, ``.ix``, ``.loc``, and
+For more information on ``.at``, ``.iat``, ``.loc``, and
``.iloc``, see the :ref:`indexing documentation <indexing>`.

Binary operator functions
6 changes: 3 additions & 3 deletions doc/source/basics.rst
@@ -145,7 +145,7 @@ either match on the *index* or *columns* via the **axis** keyword:
'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
'three' : pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})
df
-row = df.ix[1]
+row = df.iloc[1]
column = df['two']

df.sub(row, axis='columns')
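
On this string-labelled frame, ``df.iloc[1]`` always means "the second row", whatever its label. A sketch with deterministic values standing in for the random ones, showing both broadcasting directions of ``sub``:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random frame in basics.rst.
df = pd.DataFrame({'one': [1.0, 2.0, 3.0, np.nan],
                   'two': [10.0, 20.0, 30.0, 40.0],
                   'three': [np.nan, 100.0, 200.0, 300.0]},
                  index=['a', 'b', 'c', 'd'])

row = df.iloc[1]        # second row, label 'b'
column = df['two']

# Subtract the row from every row (aligned on column names) ...
by_row = df.sub(row, axis='columns')
# ... or the column from every column (aligned on the index).
by_col = df.sub(column, axis='index')
```
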
@@ -556,7 +556,7 @@ course):
series[::2] = np.nan
series.describe()
frame = pd.DataFrame(np.random.randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
-frame.ix[::2] = np.nan
+frame.iloc[::2] = np.nan
frame.describe()

You can select specific percentiles to include in the output:
@@ -1081,7 +1081,7 @@ objects either on the DataFrame's index or columns using the ``axis`` argument:

.. ipython:: python

-df.align(df2.ix[0], axis=1)
+df.align(df2.iloc[0], axis=1)

.. _basics.reindex_fill:

3 changes: 1 addition & 2 deletions doc/source/categorical.rst
@@ -482,7 +482,7 @@ Pivot tables:
Data munging
------------

-The optimized pandas data access methods ``.loc``, ``.iloc``, ``.ix`` ``.at``, and ``.iat``,
+The optimized pandas data access methods ``.loc``, ``.iloc``, ``.at``, and ``.iat``,
work as normal. The only difference is the return type (for getting) and
that only values already in `categories` can be assigned.

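A small sketch of both halves of that statement, using a hypothetical frame (the docs build their own ``df`` with a ``cats`` column). It assumes a recent pandas release, where assigning a value outside the categories raises ``TypeError``; ``ValueError`` is caught too in case other releases differ.

```python
import pandas as pd

# Hypothetical frame with a categorical column and a letter index.
df = pd.DataFrame(
    {'cats': pd.Categorical(['a', 'b', 'b', 'b', 'c'], categories=['a', 'b', 'c']),
     'values': [1, 2, 2, 2, 3]},
    index=['h', 'i', 'j', 'k', 'l'],
)

# Selecting with .loc keeps the category dtype on the result.
sub = df.loc['h':'j', 'cats']

# Assigning an existing category works.
df.loc['h', 'cats'] = 'b'

# Assigning a value that is not a category is rejected.
rejected = False
try:
    df.loc['i', 'cats'] = 'z'
except (TypeError, ValueError):
    rejected = True
```
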
@@ -501,7 +501,6 @@ the ``category`` dtype is preserved.
df.iloc[2:4,:]
df.iloc[2:4,:].dtypes
df.loc["h":"j","cats"]
-df.ix["h":"j",0:1]
df[df["cats"] == "b"]

An example where the category type is not preserved is if you take one single row: the
10 changes: 5 additions & 5 deletions doc/source/computation.rst
@@ -84,8 +84,8 @@ in order to have a valid result.
.. ipython:: python

frame = pd.DataFrame(np.random.randn(20, 3), columns=['a', 'b', 'c'])
-frame.ix[:5, 'a'] = np.nan
-frame.ix[5:10, 'b'] = np.nan
+frame.loc[frame.index[:5], 'a'] = np.nan
+frame.loc[frame.index[5:10], 'b'] = np.nan

frame.cov()

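The replacement lines turn "the first five rows" into labels with ``frame.index[:5]`` so that ``.loc`` can pair them with a column label. A sketch checking that the purely positional spelling, which converts the column label with ``Index.get_loc`` instead, marks the same cells. A seeded generator replaces ``np.random.randn`` so the run is reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.standard_normal((20, 3)), columns=['a', 'b', 'c'])
other = frame.copy()

# Label based: positional slice of the index -> labels -> .loc.
frame.loc[frame.index[:5], 'a'] = np.nan
frame.loc[frame.index[5:10], 'b'] = np.nan

# Position based: column label -> position -> .iloc.
other.iloc[:5, other.columns.get_loc('a')] = np.nan
other.iloc[5:10, other.columns.get_loc('b')] = np.nan

# cov() uses pairwise-complete observations, so the NaNs only
# shrink the sample for pairs involving 'a' or 'b'.
cov = frame.cov()
```
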
@@ -120,7 +120,7 @@ All of these are currently computed using pairwise complete observations.
.. ipython:: python

frame = pd.DataFrame(np.random.randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
-frame.ix[::2] = np.nan
+frame.iloc[::2] = np.nan

# Series with Series
frame['a'].corr(frame['b'])
@@ -137,8 +137,8 @@ Like ``cov``, ``corr`` also supports the optional ``min_periods`` keyword:
.. ipython:: python

frame = pd.DataFrame(np.random.randn(20, 3), columns=['a', 'b', 'c'])
-frame.ix[:5, 'a'] = np.nan
-frame.ix[5:10, 'b'] = np.nan
+frame.loc[frame.index[:5], 'a'] = np.nan
+frame.loc[frame.index[5:10], 'b'] = np.nan

frame.corr()

21 changes: 9 additions & 12 deletions doc/source/cookbook.rst
@@ -66,19 +66,19 @@ An if-then on one column

.. ipython:: python

-df.ix[df.AAA >= 5,'BBB'] = -1; df
+df.loc[df.AAA >= 5,'BBB'] = -1; df

An if-then with assignment to 2 columns:

.. ipython:: python

-df.ix[df.AAA >= 5,['BBB','CCC']] = 555; df
+df.loc[df.AAA >= 5,['BBB','CCC']] = 555; df

Add another line with different logic, to do the -else

.. ipython:: python

-df.ix[df.AAA < 5,['BBB','CCC']] = 2000; df
+df.loc[df.AAA < 5,['BBB','CCC']] = 2000; df

Or use pandas where after you've set up a mask

@@ -149,7 +149,7 @@ Building Criteria
{'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); df

aValue = 43.0
-df.ix[(df.CCC-aValue).abs().argsort()]
+df.loc[(df.CCC-aValue).abs().argsort()]

`Dynamically reduce a list of criteria using a binary operators
<http://stackoverflow.com/questions/21058254/pandas-boolean-operation-in-a-python-list/21058331>`__
@@ -217,9 +217,9 @@ There are 2 explicit slicing methods, with a third general case

df.loc['bar':'kar'] #Label

-#Generic
-df.ix[0:3] #Same as .iloc[0:3]
-df.ix['bar':'kar'] #Same as .loc['bar':'kar']
+# Generic
+df.iloc[0:3]
+df.loc['bar':'kar']

Ambiguity arises when an index consists of integers with a non-zero start or non-unit increment.

@@ -231,9 +231,6 @@ Ambiguity arises when an index consists of integers with a non-zero start or non

df2.loc[1:3] #Label-oriented

-df2.ix[1:3] #General, will mimic loc (label-oriented)
-df2.ix[0:3] #General, will mimic iloc (position-oriented), as loc[0:3] would raise a KeyError
-
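With the ``.ix`` lines gone, the reader picks the interpretation explicitly. A sketch on a frame whose integer index starts at 1; the column values are assumed to match the cookbook's ``df2``, whose definition is outside this hunk.

```python
import pandas as pd

# Integer labels that do not line up with positions (index starts at 1).
df2 = pd.DataFrame({'AAA': [4, 5, 6, 7], 'BBB': [10, 20, 30, 40],
                    'CCC': [100, 50, -30, -50]}, index=[1, 2, 3, 4])

by_position = df2.iloc[1:3]  # positions 1 and 2, end excluded -> labels 2, 3
by_label = df2.loc[1:3]      # labels 1 through 3, end included
```
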
`Using inverse operator (~) to take the complement of a mask
<http://stackoverflow.com/questions/14986510/picking-out-elements-based-on-complement-of-indices-in-python-pandas>`__

@@ -440,7 +437,7 @@ Fill forward a reversed timeseries
.. ipython:: python

df = pd.DataFrame(np.random.randn(6,1), index=pd.date_range('2013-08-01', periods=6, freq='B'), columns=list('A'))
-df.ix[3,'A'] = np.nan
+df.loc[df.index[3], 'A'] = np.nan
df
df.reindex(df.index[::-1]).ffill()

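``df.loc[df.index[3], 'A']`` reaches a positional row through its label before pairing it with a column label. A sketch with deterministic values in place of ``np.random.randn``, showing which neighbour the reversed forward-fill copies from:

```python
import numpy as np
import pandas as pd

# Business days: Aug 1, 2, 5, 6, 7, 8 of 2013.
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
                  index=pd.date_range('2013-08-01', periods=6, freq='B'))

# Row by position, column by label (2013-08-06 becomes NaN).
# The all-positional equivalent is df.iloc[3, df.columns.get_loc('A')].
df.loc[df.index[3], 'A'] = np.nan

# Reversed, then forward-filled: the gap takes the value of the *later* date.
filled = df.reindex(df.index[::-1]).ffill()
```
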
@@ -545,7 +542,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to

agg_n_sort_order = code_groups[['data']].transform(sum).sort_values(by='data')

-sorted_df = df.ix[agg_n_sort_order.index]
+sorted_df = df.loc[agg_n_sort_order.index]

sorted_df

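The ``.loc`` replacement works because ``transform`` returns a result indexed like ``df``, so ``agg_n_sort_order.index`` holds row labels. A sketch with illustrative ``code``/``data`` values (the cookbook's own frame is defined outside this hunk). ``'sum'`` is passed as a string, and a stable sort keeps tied rows in their original order.

```python
import pandas as pd

df = pd.DataFrame({'code': ['foo', 'bar', 'baz'] * 2,
                   'data': [0.16, -0.21, 0.33, 0.45, -0.59, 0.62]})
code_groups = df.groupby('code')

# One group total per *row*, indexed like df, then sorted by that total.
agg_n_sort_order = (code_groups[['data']].transform('sum')
                    .sort_values(by='data', kind='mergesort'))

# The index now lists df's row labels in the new order; .loc applies it.
sorted_df = df.loc[agg_n_sort_order.index]
```
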
60 changes: 4 additions & 56 deletions doc/source/gotchas.rst
@@ -221,7 +221,7 @@ Label-based indexing with integer axis labels is a thorny topic. It has been
discussed heavily on mailing lists and among various members of the scientific
Python community. In pandas, our general viewpoint is that labels matter more
than integer locations. Therefore, with an integer axis index *only*
-label-based indexing is possible with the standard tools like ``.ix``. The
+label-based indexing is possible with the standard tools like ``.loc``. The
following code will generate exceptions:

.. code-block:: python
@@ -230,7 +230,7 @@ following code will generate exceptions:
s[-1]
df = pd.DataFrame(np.random.randn(5, 4))
df
-df.ix[-2:]
+df.loc[-2:]

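In recent pandas releases, the behaviour of these lines appears to be narrower than "will generate exceptions". The missing scalar label raises ``KeyError``, but the label slice ``df.loc[-2:]`` on a sorted integer index starts before the first label and returns every row. Asking for "the last two rows" is a positional request, so it belongs to ``.iloc``. A sketch, assuming ``s = pd.Series(range(5))`` as in this section and a deterministic frame in place of the random one:

```python
import numpy as np
import pandas as pd

s = pd.Series(range(5))                         # integer labels 0..4
df = pd.DataFrame(np.arange(20).reshape(5, 4))  # integer labels 0..4

# Looking up a label that does not exist raises KeyError.
try:
    s.loc[-1]
    missing_label_raised = False
except KeyError:
    missing_label_raised = True

# A label slice on a sorted integer index does not raise:
# -2 sorts before every label, so all rows come back.
label_slice = df.loc[-2:]

# Counting from the end is positional.
last_two = df.iloc[-2:]
```
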
This deliberate decision was made to prevent ambiguities and subtle bugs (many
users reported finding bugs when the API change was made to stop "falling back"
@@ -305,15 +305,15 @@ index can be somewhat complicated. For example, the following does not work:

::

-s.ix['c':'e'+1]
+s.loc['c':'e'+1]

A very common use case is to limit a time series to start and end at two
specific dates. To enable this, we made the design decision to make label-based
slicing include both endpoints:

.. ipython:: python

-s.ix['c':'e']
+s.loc['c':'e']

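Label slices include both endpoints, while ``.iloc`` keeps Python's half-open convention. When the Python-style "from ``'c'`` up to the successor of ``'e'``" is what you need, converting labels to positions with ``Index.get_loc`` avoids the ``'e'+1`` problem. A sketch, assuming the six-letter index used in this section:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(6), index=list('abcdef'))

by_label = s.loc['c':'e']      # c, d, e (end included)
by_position = s.iloc[2:4]      # c, d (end excluded)

# Python-style "c up to one past e": find positions, then slice with .iloc.
start = s.index.get_loc('c')   # 2
stop = s.index.get_loc('e') + 1  # 5, the successor of 'e'
same_as_label = s.iloc[start:stop]
```
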
This is most definitely a "practicality beats purity" sort of thing, but it is
something to watch out for if you expect label-based slicing to behave exactly
@@ -322,58 +322,6 @@ in the way that standard Python integer slicing works.
Miscellaneous indexing gotchas
------------------------------

-Reindex versus ix gotchas
-~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Many users will find themselves using the ``ix`` indexing capabilities as a
-concise means of selecting data from a pandas object:
-
-.. ipython:: python
-
-df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three', 'four'],
-index=list('abcdef'))
-df
-df.ix[['b', 'c', 'e']]
-
-This is, of course, completely equivalent *in this case* to using the
-``reindex`` method:
-
-.. ipython:: python
-
-df.reindex(['b', 'c', 'e'])
-
-Some might conclude that ``ix`` and ``reindex`` are 100% equivalent based on
-this. This is indeed true **except in the case of integer indexing**. For
-example, the above operation could alternately have been expressed as:
-
-.. ipython:: python
-
-df.ix[[1, 2, 4]]
-
-If you pass ``[1, 2, 4]`` to ``reindex`` you will get another thing entirely:
-
-.. ipython:: python
-
-df.reindex([1, 2, 4])
-
-So it's important to remember that ``reindex`` is **strict label indexing
-only**. This can lead to some potentially surprising results in pathological
-cases where an index contains, say, both integers and strings:
-
-.. ipython:: python
-
-s = pd.Series([1, 2, 3], index=['a', 0, 1])
-s
-s.ix[[0, 1]]
-s.reindex([0, 1])
-
-Because the index in this case does not contain solely integers, ``ix`` falls
-back on integer indexing. By contrast, ``reindex`` only looks for the values
-passed in the index, thus finding the integers ``0`` and ``1``. While it would
-be possible to insert some logic to check whether a passed sequence is all
-contained in the index, that logic would exact a very high cost in large data
-sets.
-
Reindex potentially changes underlying Series dtype
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
