Skip to content

.iloc[:] and .loc[:] return the original object #13873

Closed
@shoyer

Description

@shoyer
Member

Each of these asserts should fail:

# on pandas 0.18.1
import pandas as pd

df = pd.DataFrame({'x': range(3)})
assert df.iloc[:] is df
assert df.loc[:] is df

series = df.x
assert series.iloc[:] is series
assert series.loc[:] is series

Instead, for the sake of predictability, these should always return a new DataFrame/Series, even if they reuse the same data. See here for an example of how this resulted in a user facing bug within pandas.

Thanks @mborysow for pointing this out.

Activity

added this to the Next Major Release milestone on Aug 1, 2016
mborysow

mborysow commented on Aug 1, 2016

@mborysow

Hurrah. My first remotely important contribution to an open source project. ;)

jreback

jreback commented on Aug 1, 2016

@jreback
Contributor

@mborysow even better is a PR!

contributing guide is here

note that you have selected a pretty non-trivial area :)

mborysow

mborysow commented on Aug 1, 2016

@mborysow

@jreback I'll take a stab at it tonight. =)

mborysow

mborysow commented on Aug 1, 2016

@mborysow

@jreback, @shoyer
Do I care about all these cases?

All except the last one evaluates True.

df2.loc[:] is df2
df2.loc[:, :] is df2
df2.loc[pd.IndexSlice[:, :]] is df2
df2.loc[pd.IndexSlice[:, :, :], :] is df2
shoyer

shoyer commented on Aug 1, 2016

@shoyer
MemberAuthor

@mborysow Indexing should always return a new object. So all of those should be False.

mborysow

mborysow commented on Aug 1, 2016

@mborysow

@shoyer
Barring pep8 and other such issues, is this a terrible way to approach the problem? First if and elif statements in getitem are mine. I can refactor if necessary. Not sure about overhead yet, they are all pretty cheap comparisons. Will try to actually put this in a pull request and do all the other stuff requested in the contrib guidelines. Just wanted to check if what I was doing was hideous. ;)

class _LocationIndexer(_NDFrameIndexer):
    _exception = Exception
    def __getitem__(self, key):
        # if being called as .loc[:]/.iloc[:] return a shallow copy
        if isinstance(key, slice) and all([key.start is None, key.stop is None, key.step is None]):
            return self.obj.copy(deep=False)
        # if being called with a key that is a tuple containing only any number of slice(None, None, None) objects
        elif (isinstance(key, tuple) and all([isinstance(subkey, slice) for subkey in key]) and
             all([all([subkey.start is None, subkey.stop is None, subkey.step is None]) for subkey in key])):
            return self.obj.copy(deep=False)

        if isinstance(key, tuple):
            key = tuple(com._apply_if_callable(x, self.obj) for x in key)
        else:
            # scalar callable may return tuple
            key = com._apply_if_callable(key, self.obj)

        if type(key) is tuple:
            return self._getitem_tuple(key)
        else:
            return self._getitem_axis(key, axis=0)

It gives the correct behavior for the assertion mentioned earlier as well as fixing my side effect issue. It does cause one existing unit test to fail, but it's possible it's a bad test so I'll take a look (pasted output below).

Had to explicitly test the slice start, stop, and step since a unittest failed trying to compare Timestamp to NoneType when doing key is slice(None, None, None). This covers the case of a single slice(None, None, None) as well as a tuple of any length of them.

Single failing test:
FAIL: test_sort_values (pandas.tests.series.test_sorting.TestSeriesSorting)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/net/brainy/gobs/repositories/pandas/pandas/tests/series/test_sorting.py", line 82, in test_sort_values
    self.assertRaises(ValueError, f)
  File "/net/brainy/gobs/repositories/pandas/pandas/util/testing.py", line 2326, in assertRaises
    _callable(*args, **kwargs)
  File "/net/brainy/gobs/repositories/pandas/pandas/util/testing.py", line 2396, in __exit__
    raise AssertionError("{0} not raised.".format(name))
AssertionError: ValueError not raised.
shoyer

shoyer commented on Aug 1, 2016

@shoyer
MemberAuthor

@mborysow I think this is closer to the source of the problem. Take note of the two other locations in that file calling need_slice as well.

The right solution here would probably be either removing that special case or replacing the return value with a shallow copy instead.

margaret

margaret commented on May 22, 2017

@margaret
Contributor

Going to take a stab at this (PyCon sprints)

modified the milestones: 0.21.0, Next Major Release on Jun 1, 2017
added a commit that references this issue on Aug 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugIndexingRelated to indexing on series/frames, not to indexes themselves

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

      Development

      Participants

      @jreback@shoyer@margaret@mborysow

      Issue actions

        .iloc[:] and .loc[:] return the original object · Issue #13873 · pandas-dev/pandas