Is there any reason that whenever I'm doing inplace=True I always get self returned? Obviously not a huge deal, but this is kind of wart-y IMO. I wouldn't expect inplace to return anything.
Activity
changhiskhan commented on Sep 11, 2012
Good idea actually. Useful to make the user more explicit about invoking side-effects.
wesm commented on Sep 13, 2012
I go back and forth on it. At some point I'd decided it was better to have a consistent API w.r.t. return values (vs. None in the inplace=True case), e.g. whether or not inplace=True, you can always count on getting a reference back to the modified object.
jseabold commented on Sep 16, 2012
Sure, but this is different from the in-place operations in python and in numpy. When I was first using pandas, I was really thrown by the default of making copies everywhere and returning an object, especially when python and numpy lead me to expect in-place operation (e.g., sort, assignment to a view). I was often bitten by not catching this (kind of like mpl returns things that you don't really need). Then I discovered the inplace keyword. Great, because I almost always want to do things in-place and avoid all the extra typing of assignment, though I now have to use the keyword everywhere. It just seems unnecessary to return the object, since I explicitly asked for inplace and I already have the reference to it. It's just noise at the interpreter when working interactively.
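For reference, the Python convention mentioned here can be checked with built-in lists. This is a minimal sketch using only the standard library, not pandas: a method that mutates in place returns None, while a function that builds a new object returns it. NumPy's ndarray.sort() follows the same pattern.

```python
# Built-in lists illustrate the convention:
# mutate in place -> return None, build a new object -> return it.
data = [3, 1, 2]

in_place_result = data.sort()      # sorts `data` itself
copy_result = sorted([3, 1, 2])    # leaves its argument alone, returns a new list

print(in_place_result)  # None
print(data)             # [1, 2, 3]
print(copy_result)      # [1, 2, 3]
```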
jseabold commented on Oct 18, 2012
Just as a follow-up, when writing notebooks I have to put semi-colons after every line where I do in-place operations so it doesn't barf the returned self.
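The echo being worked around with semicolons comes from the interactive display hook, which prints any expression value that is not None. The sketch below reproduces that with the standard library only; the helper name echoed is made up for illustration, and list methods stand in for the pandas calls.

```python
import contextlib
import io


def echoed(source, namespace):
    """Run one statement the way an interactive prompt would; return what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # "single" mode routes expression values through sys.displayhook,
        # which skips None and prints the repr of anything else.
        exec(compile(source, "<demo>", "single"), namespace)
    return buffer.getvalue()


namespace = {"items": [3, 1, 2]}
echo_from_copy = echoed("sorted(items)", namespace)    # expression returns a new list
echo_from_inplace = echoed("items.sort()", namespace)  # expression returns None

print(repr(echo_from_copy))     # '[1, 2, 3]\n'
print(repr(echo_from_inplace))  # ''
```

The default hook also stores the last echoed value in _, so a method that returns None on in-place use would neither print nor leave an extra reference behind at the prompt.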
michaelaye commented on Dec 3, 2012
Wes, your argument confuses me. Do you consider the inplace option a special case or not? Also, I just learned that ipython keeps unassigned memory objects alive for the history (the _ thingie). Is this true for notebooks as well and could this be an argument for really not returning an object when doing things inplace?
wesm commented on Dec 3, 2012
I think the proposal on the table is to always return None when using inplace=True. Moving this to 0.10
ghost commented on Dec 3, 2012
Note that not returning self breaks the fluent interface: a.opA().opB().opC()
jseabold commented on Dec 3, 2012
Well, inplace is optional, so the easy solution is don't use inplace if you want to in turn do an operation on the returned object.
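The trade-off in the last two comments can be sketched with a toy class. ToyFrame and its methods are hypothetical stand-ins, not the pandas API, and the sketch assumes the behaviour proposed in this thread (in-place calls return None).

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not the pandas API."""

    def __init__(self, values):
        self.values = list(values)

    def dropna(self, inplace=False):
        kept = [v for v in self.values if v is not None]
        if inplace:
            self.values = kept
            return None          # the behaviour proposed in this thread
        return ToyFrame(kept)    # copy semantics: safe to chain

    def double(self, inplace=False):
        doubled = [v * 2 for v in self.values]
        if inplace:
            self.values = doubled
            return None
        return ToyFrame(doubled)


frame = ToyFrame([1, None, 2])

# Chaining works when every step returns a new object.
chained = frame.dropna().double()
print(chained.values)   # [2, 4]
print(frame.values)     # [1, None, 2]  (original untouched)

# Chaining off an in-place call fails, because the call returns None.
try:
    frame.dropna(inplace=True).double()
except AttributeError as exc:
    print("chain broke:", exc)

print(frame.values)     # [1, 2]  (mutated before the chain failed)
```

Under this assumption the fluent style stays available for the copying calls, and an attempt to chain after inplace=True fails immediately rather than silently.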
wesm commented on Dec 3, 2012
Another place where Python's eager evaluation can be a weakness
ghost commented on Dec 3, 2012
Those are two orthogonal considerations. One (arguably) should not force the other.
jseabold commented on Dec 3, 2012
Fair enough, but pandas is still an outlier in this respect compared to many of the methods in numpy and python itself. Admittedly, my argument for a (default) inplace not returning self is because I want to save myself typing and improve readability of output at the interpreter for teaching, presenting, or demonstrating. I don't often have serious concerns about memory use and readability of scripts.
What's the alternative? Having options where inplace can be 'true', 'false', or 'return'?
wesm commented on Dec 3, 2012
Or you could provide some kind of chainable, inplace interface. I'm thinking a lot lately about building a DSL layer around pandas so you could do things like:
frame do {
  .dropna axis=1
  ab_diff = a - b
} group by key1 key2 {
  max(ab_diff)
  std(a)
}
and have that be as fast and memory-efficient as possible. And then you could easily chain "in place operations" and get what you expect.
michaelaye commented on Dec 3, 2012
I fail to see the 'orthogonality' (maybe because 'orthogonal' is linguistically overrated, IMHO). Your claim that these design questions would be independent (i.e. 'orthogonal') supports the use of something like obj.opA().opB(inplace=True).opC(). I don't even want to start thinking about what I just did there and which of all the objects flying around has what content now. The cleanest interface for me is: when I do inplace, it affects my original object; if not, it is safe.
ghost commented on Dec 3, 2012
is not that bad IMHO.
because it's clear to you that you might be misusing the API.