Closed
Labels: Performance (memory or execution speed performance); Reshaping (concat, merge/join, stack/unstack, explode)
Description
Quick implementation of dataframe shift
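import numpy as np
from pandas import DataFrame

def quick_shift(df, N=1):
    '''Quick implementation of dataframe shift'''
    # np.roll wraps the last N rows around to the top; they are blanked below
    shift_value = np.roll(df.values, N, axis=0)
    shift_value[0:N] = np.nan  # assumes N >= 0 and float data
    return DataFrame(shift_value, index=df.index, columns=df.columns)

DataFrame.quick_shift = quick_shift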
Perf benchmark:
df = DataFrame(np.random.rand(10000,500))
%timeit df1 = df.quick_shift(1)
10 loops, best of 3: 55.3 ms per loop
%timeit df2 = df.shift(1)
1 loops, best of 3: 238 ms per loop
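For anyone who wants to rerun the comparison outside IPython, here is a minimal sketch using the standard `timeit` module. The helper name `time_shifts` and its parameters are made up for illustration; it also checks that both approaches agree before timing them. Absolute numbers will differ from the ones above depending on machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def quick_shift(df, n=1):
    """np.roll-based shift from the description; assumes n >= 0 and float data."""
    values = np.roll(df.values, n, axis=0)
    values[:n] = np.nan
    return pd.DataFrame(values, index=df.index, columns=df.columns)


def time_shifts(n_rows=10000, n_cols=500, periods=1, number=5):
    """Hypothetical helper: best-of-3 seconds per call for each approach."""
    df = pd.DataFrame(np.random.rand(n_rows, n_cols))
    # A faster function is only useful if it gives the same answer.
    pd.testing.assert_frame_equal(quick_shift(df, periods), df.shift(periods))
    candidates = [
        ("quick_shift", lambda: quick_shift(df, periods)),
        ("DataFrame.shift", lambda: df.shift(periods)),
    ]
    results = {}
    for name, func in candidates:
        results[name] = min(timeit.repeat(func, number=number, repeat=3)) / number
    return results


for name, seconds in time_shifts().items():
    print(f"{name}: {seconds * 1e3:.1f} ms per call")
```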
Related to #4095
jreback commented on Nov 28, 2013
this would need to be done in core/internals.py
needs to handle non-trivial shifts (e.g. cases that are not as straightforward as this one) and multiple dtypes
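To make the two concerns concrete, here is a rough sketch. It is not the pandas internals code; `roll_shift_2d` is a hypothetical helper that extends the `np.roll` idea to negative shifts and to shifts longer than the frame, for float data only. The last lines show why a single roll over `df.values` does not work for mixed dtypes.

```python
import numpy as np
import pandas as pd


def roll_shift_2d(values, periods):
    """Shift rows of a 2-D float array by `periods`, filling vacated rows with NaN."""
    n_rows = values.shape[0]
    if abs(periods) >= n_rows:
        # Shifting past either end leaves nothing but missing values.
        return np.full(values.shape, np.nan)
    out = np.roll(values, periods, axis=0)
    if periods > 0:
        out[:periods] = np.nan   # rows that wrapped around to the top
    elif periods < 0:
        out[periods:] = np.nan   # rows that wrapped around to the bottom
    return out


floats = pd.DataFrame(np.arange(12, dtype="float64").reshape(4, 3))
for periods in (-5, -2, 0, 1, 4):
    expected = floats.shift(periods).to_numpy()
    np.testing.assert_array_equal(roll_shift_2d(floats.to_numpy(), periods), expected)

# Mixed dtypes: .values has to pick one common dtype for the whole frame,
# which here is object, so rolling it would lose the per-column dtypes.
mixed = pd.DataFrame({"a": [1.0, 2.0], "b": ["x", "y"], "c": [1, 2]})
print(mixed.values.dtype)
```

Integer columns are a further wrinkle: they cannot hold NaN, so a shift has to upcast them before filling.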
jreback commented on Jan 25, 2014
@halleygithub putting on the plate for 0.14. Care to do a PR for this?
halleygithub commented on Jan 28, 2014
@jreback, I am not familiar with GitHub, so I am not sure what I can do. But you are free to handle the code as you want. :)
jreback commented on Jan 28, 2014
it's a nice excuse to get familiar
see here: http://pandas.pydata.org/developers.html
gouthambs commented on Feb 15, 2014
@jreback I was wondering if I can take a stab at this?
jreback commented on Feb 15, 2014
of course
lmk if u need help
gouthambs commented on Feb 15, 2014
@jreback I see shift function in core/internals.py in the Block class and in core/generic.py in the NDFrame class. Can you point me to a place where I can understand some basic code flow? Or can you give me some quick pointers?
jreback commented on Feb 15, 2014
core/generic/shift sets up what to do, e.g. how many periods to shift and such, and translates it to move the data up 3 or whatever; this then calls the internals shift
core/internals/Block/shift is where to make this change
simply swap out the implementation for the new one
make sure tests pass
write a vbench for this (pandas/vb_suite) - find a file with similar methods and stick it there
u can even use the above benchmarks
gouthambs commented on Feb 15, 2014
@jreback Thanks. I will do that.
jreback commented on Feb 16, 2014
FYI, I just fixed a bug in shift, see here: #6373
The soln proposed above is still valid in any case... make the change in core/internals/Block/shift
gouthambs commented on Feb 17, 2014
Sure. I will take your changes and merge with mine.
gouthambs commented on Feb 18, 2014
@jreback I swapped the current shift for a roll to see if I would get a speedup. I tried the swap in core/internals.py, but this did not give any performance improvement. On the other hand, when I use a roll in core/generic.py (just as a test), I see a good speedup. The two implementations are shown below:
Do you have any ideas why np.roll speeds up the NDFrame but not the Block class?
The timings for the above mentioned example are as follows:
jreback commented on Feb 18, 2014
you can't do it in generic.py because that only handles a single dtype. The orientation needs to be changed as the blocks store values 'flipped' from what you think they are. You need to adjust the roll (which IIRC is just an axis swap) to account for this.
In internals you are effectively using axis 1 for the shift (on a row shift), that's what the block_axis is.
transpose the values before roll, then transpose again after the setting
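A small NumPy-only sketch of the orientation point, assuming (as the comment says) that a block stores its values flipped, one array row per frame column. The function name `shift_block_rows` is hypothetical and this is not the actual Block.shift code; it only shows that transpose, roll on axis 0, fill, transpose back gives the same result as rolling along axis 1 directly.

```python
import numpy as np

# Frame-oriented data: 4 rows x 3 columns, as a user sees it.
frame_values = np.arange(12, dtype="float64").reshape(4, 3)

# Assumed block orientation: one array row per frame column,
# so frame rows run along axis 1.
block_values = frame_values.T.copy()


def shift_block_rows(block_values, periods):
    """Shift frame rows by `periods` (>= 0) on block-oriented float values."""
    flipped = block_values.T                    # back to rows x columns
    rolled = np.roll(flipped, periods, axis=0)  # roll along frame rows
    rolled[:periods] = np.nan                   # blank the rows that wrapped
    return rolled.T                             # restore block orientation


shifted = shift_block_rows(block_values, 1)

# Same result as rolling along axis 1 directly on the block values.
direct = np.roll(block_values, 1, axis=1)
direct[:, :1] = np.nan
np.testing.assert_array_equal(shifted, direct)
```

Rolling along axis 1 directly avoids the two transposes, which is presumably what the "axis swap" remark refers to.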
jreback commented on Feb 18, 2014
[goat-jreback-~/pandas] git diff
jreback commented on Feb 18, 2014
you can also remove _shift_indexer from core/common.py, as I think that's the only thing that uses it