I think this happened between nightly-2016-07-08 and nightly-2016-07-09:
~/d/pest master> rustup run nightly-2016-07-08 cargo bench
Running target/release/json-bfd0131a8850b71b
running 1 test
test data ... bench: 10,872 ns/iter (+/- 823)
~/d/pest master> rustup run nightly-2016-07-09 cargo bench
Running target/release/json-bfd0131a8850b71b
running 1 test
test data ... bench: 17,845 ns/iter (+/- 1,472)
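To put the two measurements in perspective, the relative slowdown can be worked out from the reported ns/iter figures. The sketch below is only illustrative arithmetic on the numbers quoted above; the function names are made up for this example and are not part of pest or the bench harness.

```rust
/// Relative slowdown of `after_ns` versus `before_ns`, as a percentage.
/// (Hypothetical helper, introduced only for this illustration.)
fn slowdown_percent(before_ns: f64, after_ns: f64) -> f64 {
    (after_ns - before_ns) / before_ns * 100.0
}

/// Rough sanity check: is the difference larger than the two reported
/// "+/-" spreads added together? This is a crude heuristic, not a
/// statistical test.
fn exceeds_noise(before_ns: f64, before_spread: f64, after_ns: f64, after_spread: f64) -> bool {
    (after_ns - before_ns).abs() > before_spread + after_spread
}

fn main() {
    // Figures copied from the two `cargo bench` runs above.
    let (before, before_spread) = (10_872.0, 823.0);
    let (after, after_spread) = (17_845.0, 1_472.0);

    println!("slowdown: {:.1}%", slowdown_percent(before, after));
    println!(
        "outside the reported spread: {}",
        exceeds_noise(before, before_spread, after, after_spread)
    );
}
```

The gap between the two runs is several times larger than the reported spreads combined, so measurement noise alone is an unlikely explanation.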
Activity
TimNN commented on Jul 17, 2016
I can reproduce this (also on OS X). As another data point, nightly-2016-07-07 is still good.
jonas-schievink commented on Jul 17, 2016
Can confirm on x86_64 Linux: nightly-2016-07-12 is already bad (why the hell did I test 12?).
TimNN commented on Jul 17, 2016
Enabling LTO has no effect on 07-07, but 07-10 is now only ~2000 ns worse.
TimNN commented on Jul 17, 2016
Also with optimisations completely disabled no performance difference is noticeable.
jonas-schievink commented on Jul 17, 2016
I think this happened between nightly-2016-07-08 and nightly-2016-07-09.
TimNN commented on Jul 17, 2016
So, I'm by no means an llvm expert, but from staring at some ir I believe that one of the differences (maybe even the key difference) is that before the regression the benchmarking closure gets inlined and afterwards it does not.
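As a rough illustration of why inlining the benchmarked closure can matter, here is a minimal sketch. It is not the pest benchmark: `run_iters` is a hypothetical stand-in for a bench harness, and `#[inline(never)]` is used only to simulate the "closure body no longer gets inlined" situation described above.

```rust
use std::hint::black_box;

/// Hypothetical stand-in for a benchmark harness: calls `body` `iters`
/// times and accumulates the results so the work cannot be discarded.
fn run_iters<F: FnMut() -> u64>(iters: u32, mut body: F) -> u64 {
    let mut acc = 0u64;
    for _ in 0..iters {
        acc = acc.wrapping_add(body());
    }
    acc
}

/// The optimiser is free (and here explicitly asked) to inline this.
#[inline(always)]
fn work_inlined(x: u64) -> u64 {
    x * 2 + 1
}

/// Same computation, but forced to stay an out-of-line call, which
/// mimics the missed inlining opportunity.
#[inline(never)]
fn work_outlined(x: u64) -> u64 {
    x * 2 + 1
}

fn main() {
    // `black_box` keeps the input opaque so the loop is not constant-folded.
    let a = run_iters(1_000, || work_inlined(black_box(20)));
    let b = run_iters(1_000, || work_outlined(black_box(20)));

    // Both variants compute the same value; only the generated code differs.
    assert_eq!(a, b);
    println!("{} {}", a, b);
}
```

Both variants return identical results; any difference would only show up in the emitted code and in timings, which is why it takes reading the IR (or benchmarking) to notice it.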
nagisa commented on Jul 17, 2016
Similar regression range to #34831.
As far as optimisations go, they usually have some sort of fuel supply in order to not take forever. It is likely this particular inlining/optimisation (lack of which caused the regression) simply didn’t get the consideration.
Either way, I’m not aware of any translator changes that occurred in that date range (OTOH, I haven’t been tracking the PRs during then closely either), nor can I see any LLVM-ups having been done, so I’m very stumped.
jonas-schievink commented on Jul 17, 2016
I'm having a hard time using git, but #33890 seems to be in nightly-2016-07-09. I originally suspected #34728, but I don't think that landed in time. At least git isn't listing it in my search. 🐳
nagisa commented on Jul 17, 2016
That’s a viable candidate, yes. It is quite a big change in how we translate our functions.
Is codegen-units ≠ 1 used by anybody here?
cc @michaelwoerister I wonder whether weak_odr linkage could prevent inlining or other cross-function optimisations for some reason (other than the function definition being in another unit, of course).
TimNN commented on Jul 17, 2016
@jonas-schievink: I came to the same conclusion (#33890 is the likeliest suspect, #34728 is not included in 2016-07-09).
@nagisa: I'm not setting codegen-units explicitly anywhere so I assume the cargo default of 1 is used.
nagisa commented on Jul 18, 2016
Yes, it seems like weak_odr prevents inlining opportunities. All of the closures ended up not being inlined due to having weak_odr; changing them to internal linkage promptly inlined all of them.
michaelwoerister commented on Jul 18, 2016
I haven't had time to look into this yet, but I'll do so shortly. If weak_odr is that bad for inlining, we should avoid it (note that there is #34830, which does so on MinGW only, but could easily be adapted to do so on all platforms).
However, we had been using weak_odr for generics for months without anybody noticing a marked performance drop, and there is the internalize_symbols pass that switches symbols to internal linkage where possible (which should be the case for most closures).
michaelwoerister commented on Jul 18, 2016
I can confirm that current master is still slow compared to the nightly mentioned above (on Linux x86_64). However, comparing the LLVM IR of both versions, it does not seem likely that weak_odr linkage has something to do with the slowdown: the slower version contains only one weak_odr function, and that isn't called anywhere.
michaelwoerister commented on Jul 18, 2016
Please ignore that last comment; I was accidentally looking at the IR of the library, not of the benchmark executable.
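For readers wondering why closures and generics come up together in the linkage discussion: every closure has its own anonymous type, so a generic function called with a closure is instantiated separately for each one. The sketch below only demonstrates that language-level fact; `apply` and `type_id_of` are hypothetical helpers, and the code says nothing about which linkage rustc actually assigns to the resulting instances.

```rust
use std::any::TypeId;

/// Generic over the callable: each distinct `F` gets its own
/// monomorphized copy of `apply`. (Hypothetical helper.)
fn apply<F: Fn(u32) -> u32>(f: F, x: u32) -> u32 {
    f(x)
}

/// Returns the `TypeId` of the value's type, used here to show that
/// two closures have different types. (Hypothetical helper.)
fn type_id_of<T: 'static>(_: &T) -> TypeId {
    TypeId::of::<T>()
}

fn main() {
    let add_one = |x: u32| x + 1;
    let add_two = |x: u32| x + 2;

    // Distinct closure expressions have distinct anonymous types...
    assert_ne!(type_id_of(&add_one), type_id_of(&add_two));

    // ...so these two calls use two separate instances of `apply`.
    println!("{} {}", apply(add_one, 1), apply(add_two, 1));
}
```

Because such instances are generated on demand, the linkage the compiler gives them decides whether the optimiser may treat them as private to the current module, which is the property the weak_odr versus internal discussion above turns on.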
Auto merge of #34899 - michaelwoerister:always_internalize_symbols, r…
pmarcelll commented on Jul 26, 2016
It seems like the new nightly fixed this.
eddyb commented on Jul 26, 2016
@pmarcelll What nightly? https://static.rust-lang.org/dist/index.html doesn't list anything newer than 2016-07-21.
hanna-kruppe commented on Jul 26, 2016
That's strange. rustup update gave me a nightly with a commit-date of June 24 (9316ae5) and the dashboard picked up a new nightly as well.
eddyb commented on Jul 26, 2016
@rkruppe Ah, @alexcrichton informed me that it was a manual build and not enough was updated.