Closed
Labels: Performance (memory or execution speed performance)
Description
These are fairly reproducible on my machine:

        before          after   ratio
    [eac4d3f7]     [92db5c9c]
+        396μs          746μs    1.88  inference.DtypeInfer.time_uint32
+  2.63±0.02ms     3.93±0.1ms    1.49  indexing.IndexingMethods.time_take_dtindex
+  2.64±0.06ms    3.83±0.06ms    1.45  indexing.IndexingMethods.time_take_intindex

        before          after   ratio
    [eac4d3f7]     [92db5c9c]
+  1.43±0.01ms    1.74±0.07ms    1.22  frame_ctor.FrameConstructorDTIndexFromOffsets.time_frame_ctor('BDay', 1)
+  1.56±0.02ms    1.74±0.07ms    1.12  frame_ctor.FrameConstructorDTIndexFromOffsets.time_frame_ctor('MonthBegin', 2)
+       56.0ms         61.8ms    1.10  frame_ctor.FrameConstructorDTIndexFromOffsets.time_frame_ctor('FY5253Quarter_2', 2)
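
Editor's sketch: the ratio column above is the "after" time divided by the "before" time. The snippet below recomputes it from the reported cells. The helper names (`to_seconds`, `ratio`) and the 1.10 flagging threshold are assumptions made for illustration; they are not taken from asv itself.

```python
# Recompute the "ratio" column from the before/after cells reported above.
# Helper names and the 1.10 threshold are illustrative assumptions.

UNITS = (("μs", 1e-6), ("ms", 1e-3), ("s", 1.0))  # check "s" last: "ms" also ends in "s"


def to_seconds(cell: str) -> float:
    """Convert a cell such as '2.63±0.02ms' or '396μs' to seconds (mean only)."""
    for unit, scale in UNITS:
        if cell.endswith(unit):
            mean = cell[: -len(unit)].split("±")[0]
            return float(mean) * scale
    raise ValueError(f"unrecognised unit in {cell!r}")


def ratio(before: str, after: str) -> float:
    """Return after / before, rounded to two decimals like the table."""
    return round(to_seconds(after) / to_seconds(before), 2)


rows = [
    ("396μs", "746μs", "inference.DtypeInfer.time_uint32"),
    ("2.63±0.02ms", "3.93±0.1ms", "indexing.IndexingMethods.time_take_dtindex"),
    ("56.0ms", "61.8ms", "frame_ctor ... time_frame_ctor('FY5253Quarter_2', 2)"),
]

THRESHOLD = 1.10  # assumed cut-off for calling something a regression

for before, after, name in rows:
    r = ratio(before, after)
    flag = "+" if r >= THRESHOLD else " "
    print(f"{flag} {before:>12} {after:>12} {r:5.2f}  {name}")
```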
jreback commented on Oct 14, 2017
cc @jbrockmendel if you have any ideas here
jbrockmendel commented on Oct 15, 2017
I'll take a look at this.
jbrockmendel commented on Oct 15, 2017
Best guess is the change in _to_i8 (previously in _libs.index, now in tslib).
Old:
New:
jbrockmendel commented on Oct 15, 2017
Hmm, in %timeit, reverting this makes a pretty big difference for the ind.get_loc(0) bench, but pretty small (though at least in the right direction) differences for time_take_dtindex and time_take_intindex.
jbrockmendel commented on Oct 16, 2017
Does asv have something analogous to git bisect?
jorisvandenbossche commented on Oct 16, 2017
Maybe https://asv.readthedocs.io/en/latest/commands.html#asv-find ?
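
Editor's sketch: the idea behind a bisect-style search for a performance regression can be illustrated as below. This is not asv's implementation; `find_regression`, the commit names, and the fake timings are all hypothetical, and the sketch assumes a single step change in timing somewhere in the commit range.

```python
from typing import Callable, Optional, Sequence


def find_regression(
    commits: Sequence[str],
    measure: Callable[[str], float],
    factor: float = 1.10,
) -> Optional[str]:
    """Return the first commit whose timing exceeds baseline * factor.

    `commits` is ordered oldest to newest; `measure` returns a timing in
    seconds for a commit. Returns None if the newest commit is not slower.
    """
    threshold = measure(commits[0]) * factor
    lo, hi = 0, len(commits) - 1
    if measure(commits[hi]) <= threshold:
        return None
    # Invariant: commits[lo] is fast, commits[hi] is slow.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if measure(commits[mid]) > threshold:
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history: timings jump from 2.6 ms to 3.9 ms at commit "c5".
commits = [f"c{i}" for i in range(8)]
fake_timings = {c: (2.6e-3 if i < 5 else 3.9e-3) for i, c in enumerate(commits)}

culprit = find_regression(commits, fake_timings.__getitem__)
print(culprit)  # c5
```

In a real run, `measure` would build the project at that commit and time the benchmark, which is the expensive part that bisection keeps to a logarithmic number of steps.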
TomAugspurger commented on Oct 25, 2017
Sorry, those commit hashes in the original post look entirely incorrect... Let me rerun them.
TomAugspurger commented on Oct 25, 2017
Updated:
Verifying for correctness, and then I'll start digging into them.
TomAugspurger commented on Oct 25, 2017
are real.
_maybe_box_datetimelike looks suspicious (this is the RC):
http://pandas.pydata.org/speed/pandas/#inference.DtypeInfer.time_datetime64 points to ~ eef810ef (cc @jreback)
TomAugspurger commented on Oct 25, 2017
DtypeInfer:
SeriesArthmetic.time_add_offset*
These are handled by #17980
TomAugspurger commented on Oct 25, 2017
categoricals.Categoricals3.time_rank_string_cat_ordered
Seems like this is ranking the objects, instead of codes.
This is handled by #17982
TomAugspurger commented on Oct 25, 2017
packers.*stata.*
Fixed by #17980
jbrockmendel commented on Oct 25, 2017
For morale purposes: is there any good news?
TomAugspurger commented on Oct 25, 2017
:) Yeah there were quite a few:
jbrockmendel commented on Oct 25, 2017
These may be improved by #17830 and/or #17746.
TomAugspurger commented on Oct 25, 2017
For period.Algorithms.time_drop_duplicates_pseries, it seems like Period.freqstr is being called somewhere inside _ensure_data / ensure_object. Could that be avoided?
jbrockmendel commented on Oct 25, 2017
Are you sure it's freqstr in particular? It seems like drop_duplicates would make a bunch of calls to Period.__eq__, which then calls self.freq.__eq__(other.freq), which is very slow.
TomAugspurger commented on Oct 25, 2017
I think it's in Period.__hash__, but confirming now.
TomAugspurger commented on Oct 25, 2017
Hmm that may be it:
master:
with that diff:
Let's see if that breaks anything.
jbrockmendel commented on Oct 25, 2017
Can you try that with a handful of different frequencies? I'd be shocked if hash(freq) is faster than hash(freq.freqstr).
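
Editor's sketch: the question above is whether de-duplication spends its time in `__hash__` or in `__eq__`. The snippet below counts both calls while de-duplicating with a plain Python `set`. `FakePeriod` is a hypothetical stand-in, not pandas' `Period`, and pandas' `drop_duplicates` goes through its own hashtable code, so the counts there may differ; this only shows how hash-based de-duplication uses the two methods in general.

```python
class FakePeriod:
    """Hypothetical stand-in for a Period-like object; counts dunder calls."""

    hash_calls = 0
    eq_calls = 0

    def __init__(self, ordinal: int, freqstr: str) -> None:
        self.ordinal = ordinal
        self.freqstr = freqstr

    def __hash__(self) -> int:
        FakePeriod.hash_calls += 1
        return hash((self.ordinal, self.freqstr))

    def __eq__(self, other: object) -> bool:
        FakePeriod.eq_calls += 1
        if not isinstance(other, FakePeriod):
            return NotImplemented
        return (self.ordinal, self.freqstr) == (other.ordinal, other.freqstr)


# 1000 objects, but only 100 distinct values.
values = [FakePeriod(i % 100, "D") for i in range(1000)]
unique = set(values)

print("unique values:", len(unique))
print("__hash__ calls:", FakePeriod.hash_calls)
print("__eq__ calls:", FakePeriod.eq_calls)
```

Every element is hashed once, while `__eq__` only runs when an element lands on an entry with the same hash, which here means roughly once per duplicate. Both methods therefore sit on the hot path when the data contains many repeats, so either could plausibly account for a slowdown.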
TomAugspurger commented on Oct 25, 2017
FWIW, it doesn't seem promising w.r.t. the benchmark (see the 2nd and 4th lines). I rebuilt the C extensions between these two. My branch with the new hash is on top, master is on bottom.
TomAugspurger commented on Oct 25, 2017
@jreback if you have time to look at the indexing ones, I'd appreciate it: #17861 (comment) and #17861 (comment)
I'm not sure there's a whole lot to be done though. The first is because we have to check for missing values with list-like indexers now, to raise the warning.
TomAugspurger commented on Oct 25, 2017
Oh, I'm dumb. Mixed up nanoseconds and microseconds in #17861 (comment)
TomAugspurger commented on Oct 25, 2017
I think we decided in #17574 that the sparse changes were OK?
jorisvandenbossche commented on Oct 25, 2017
The series getitem ones (s[np.arange(10000)]) are due to changing reindex to loc, and apparently loc is slower.
TomAugspurger commented on Oct 27, 2017
I think everything has been addressed.
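
Editor's sketch: the reindex versus loc difference mentioned for the Series getitem benchmarks can be checked locally along these lines. The Series contents (10000 random floats on a default integer index) and the repeat count are assumptions; the actual asv benchmark setup may differ, and the relative timings depend on the pandas version installed.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed setup: 10000 floats with the default integer index.
s = pd.Series(np.random.randn(10000))
idx = np.arange(10000)

# Both paths select the same labels, so the results should match.
via_loc = s.loc[idx]
via_reindex = s.reindex(idx)
assert via_loc.equals(via_reindex)

# Time each path; absolute numbers vary by machine and pandas version.
t_loc = timeit.timeit(lambda: s.loc[idx], number=100)
t_reindex = timeit.timeit(lambda: s.reindex(idx), number=100)

print(f"loc:     {t_loc / 100 * 1e6:.1f} μs per call")
print(f"reindex: {t_reindex / 100 * 1e6:.1f} μs per call")
```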