Closed
Description
Each of these asserts should fail:
# on pandas 0.18.1
import pandas as pd
df = pd.DataFrame({'x': range(3)})
assert df.iloc[:] is df
assert df.loc[:] is df
series = df.x
assert series.iloc[:] is series
assert series.loc[:] is series
Instead, for the sake of predictability, these should always return a new DataFrame/Series, even if they reuse the same data. See here for an example of how this resulted in a user-facing bug within pandas.
Thanks @mborysow for pointing this out.
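To make the requested behavior concrete, here is a minimal sketch of what "a new object, even if it reuses the same data" looks like from the user's side. It uses `copy(deep=False)` as a stand-in for the desired result of `.iloc[:]` / `.loc[:]`; that substitution is an assumption for illustration, not a statement about how pandas implements indexing.

```python
import pandas as pd

df = pd.DataFrame({"x": range(3)})

# A shallow copy is a distinct Python object ...
df_view = df.copy(deep=False)
assert df_view is not df
# ... that still holds the same values and index.
assert df_view.equals(df)

# The same holds for a Series.
series = df["x"]
series_view = series.copy(deep=False)
assert series_view is not series
assert series_view.equals(series)
```

With this behavior, setting an attribute such as `name` on the result would not be expected to leak back onto the object that was indexed.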
Activity
mborysow commented on Aug 1, 2016
Hurrah. My first remotely important contribution to an open source project. ;)
jreback commented on Aug 1, 2016
@mborysow even better is a PR!
contributing guide is here
note that you have selected a pretty non-trivial area :)
mborysow commented on Aug 1, 2016
@jreback I'll take a stab at it tonight. =)
mborysow commented on Aug 1, 2016
@jreback, @shoyer
Do I care about all these cases?
All except the last one evaluate to True.
shoyer commented on Aug 1, 2016
@mborysow Indexing should always return a new object. So all of those should be False.
mborysow commented on Aug 1, 2016
@shoyer
Barring PEP 8 and other such issues, is this a terrible way to approach the problem? The first if and elif statements in __getitem__ are mine. I can refactor if necessary. I'm not sure about the overhead yet, but they are all pretty cheap comparisons. I will try to put this in an actual pull request and do all the other stuff requested in the contributing guidelines. I just wanted to check whether what I was doing was hideous. ;)
It gives the correct behavior for the assertion mentioned earlier as well as fixing my side effect issue. It does cause one existing unit test to fail, but it's possible it's a bad test so I'll take a look (pasted output below).
I had to explicitly test the slice start, stop, and step, since a unit test failed trying to compare Timestamp to NoneType when doing key is slice(None, None, None). This covers the case of a single slice(None, None, None) as well as a tuple of any length of them.
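The check described above can be sketched roughly as follows. The helper names `is_full_slice` and `is_all_full_slices` are hypothetical and are not taken from the pandas source or from the patch under discussion. Each slice attribute is tested with `is None`, which never invokes a comparison on the slice bounds, so bounds of any type (a Timestamp, for instance) cannot raise here.

```python
def is_full_slice(obj):
    """Return True if obj is slice(None, None, None). Hypothetical helper."""
    return (
        isinstance(obj, slice)
        and obj.start is None
        and obj.stop is None
        and obj.step is None
    )


def is_all_full_slices(key):
    """Return True for one full slice, or a non-empty tuple of only full slices."""
    if isinstance(key, tuple):
        return len(key) > 0 and all(is_full_slice(k) for k in key)
    return is_full_slice(key)
```

For example, `is_all_full_slices(slice(None))` and `is_all_full_slices((slice(None), slice(None)))` are both true, while `is_all_full_slices((slice(None), 0))` is false.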
shoyer commented on Aug 1, 2016
@mborysow I think this is closer to the source of the problem. Take note of the two other locations in that file calling need_slice as well. The right solution here would probably be either removing that special case or replacing the return value with a shallow copy instead.
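The difference between the special case and the shallow-copy alternative can be illustrated with a toy container. `ToyFrame` and both of its methods are hypothetical names invented for this sketch; none of this is pandas code, and it only mirrors the shape of the two options named above.

```python
import copy


class ToyFrame:
    """Hypothetical stand-in for a DataFrame/Series; not pandas code."""

    def __init__(self, data):
        self.data = data  # underlying buffer, shared by shallow copies

    def getitem_special_case(self, key):
        # Shortcut that hands back the very same object for a full slice.
        if key == slice(None):
            return self
        return ToyFrame(self.data[key])

    def getitem_shallow_copy(self, key):
        # Same shortcut, but wrapped in a new object that reuses the data.
        if key == slice(None):
            return copy.copy(self)
        return ToyFrame(self.data[key])


frame = ToyFrame([0, 1, 2])
same = frame.getitem_special_case(slice(None))
fresh = frame.getitem_shallow_copy(slice(None))

assert same is frame            # the behavior this issue reports
assert fresh is not frame       # a new wrapper object ...
assert fresh.data is frame.data  # ... around the same underlying data
```

The shallow-copy variant keeps the shortcut cheap, since no data is duplicated, while still giving callers an object they can modify without touching the original wrapper.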
margaret commented on May 22, 2017
Going to take a stab at this (PyCon sprints)
BUG: .iloc[:] and .loc[:] return copy of original object (pandas-dev#…
BUG: .iloc[:] and .loc[:] return a copy of the original object #13873 (…
Rebase from master (#1)