Skip to content

Performance regression in nightly #34891

Closed
@dragostis

Description

@dragostis

Running cargo bench in pest on nightly-2016-07-02-x86_64-apple-darwin:

test data ... bench:      15,338 ns/iter (+/- 5,197)

On current nightly or nightly-2016-07-10-x86_64-apple-darwin:

test data ... bench:      28,605 ns/iter (+/- 7,901)

Activity

TimNN

TimNN commented on Jul 17, 2016

@TimNN
Contributor

I can reproduce this (also on os x), just as another data point,nightly-2016-07-07 is still good

jonas-schievink

jonas-schievink commented on Jul 17, 2016

@jonas-schievink
Contributor

Can confirm on x86_64 Linux, nightly-2016-07-12 is already bad why the hell did i test 12

TimNN

TimNN commented on Jul 17, 2016

@TimNN
Contributor

Enabling LTO has no effect on 07-07 but 07-10 are now only ~2000ns worse.

TimNN

TimNN commented on Jul 17, 2016

@TimNN
Contributor

Also with optimisations completely disabled no performance difference is noticeable.

jonas-schievink

jonas-schievink commented on Jul 17, 2016

@jonas-schievink
Contributor

I think this happened between nightly-2016-07-08 and nightly-2016-07-09:

~/d/pest master> rustup run nightly-2016-07-08 cargo bench
     Running target/release/json-bfd0131a8850b71b

running 1 test
test data ... bench:      10,872 ns/iter (+/- 823)

~/d/pest master> rustup run nightly-2016-07-09 cargo bench
     Running target/release/json-bfd0131a8850b71b

running 1 test
test data ... bench:      17,845 ns/iter (+/- 1,472)
TimNN

TimNN commented on Jul 17, 2016

@TimNN
Contributor

So, I'm by no means an llvm expert, but from staring at some ir I believe that one of the differences (maybe even the key difference) is that before the regression the benchmarking closure gets inlined and afterwards it does not.

nagisa

nagisa commented on Jul 17, 2016

@nagisa
Member

Similar regression range to #34831.

As far as optimisations go, they usually have some sort of fuel supply in order to not take forever. It is likely this particular inlining/optimisation (lack of which caused the regression) simply didn’t get the consideration.

Either way, I’m not aware of any translator changes that occurred in that date range (OTOH, I haven’t been tracking the PRs during then closely either), nor can I see any LLVM-ups having been done, so I’m very stumped.

jonas-schievink

jonas-schievink commented on Jul 17, 2016

@jonas-schievink
Contributor

I'm having a hard time using git, but #33890 seems to be in nightly-2016-07-09. I originally suspected #34728, but I don't think that landed in time. At least git isn't listing it in my search. 🐳

nagisa

nagisa commented on Jul 17, 2016

@nagisa
Member

That’s a viable candidate, yes. It is quite a big change in how we translate our functions.

Is codegen-units ≠ 1 used by anybody here?

cc @michaelwoerister I wonder whether weak_odr linkage could prevent inlining or other cross-function optimisations for some reason (other than the function definition being in another unit, of course).

TimNN

TimNN commented on Jul 17, 2016

@TimNN
Contributor

@jonas-schievink: I came to the same conclusion (#33890 is the likeliest suspect, #34728 is not included in 2016-07-09).

@nagisa: I'm not setting codegen-units explicitly anywhere so I assume the cargo default of 1 is used.

nagisa

nagisa commented on Jul 18, 2016

@nagisa
Member

Yes, it seems like weak_odr prevents inlining opportunities. All of the closures ended up not being inlined due to having weak_odr – changing them to internal linkage promptly inlined all of them.

michaelwoerister

michaelwoerister commented on Jul 18, 2016

@michaelwoerister
Member

I haven't had time to look into this yet, but I'll do so shortly. If weak_odr is that bad for inlining, we should avoid it (note that there is #34830, which does so on MingW only, but could easily be adapted to do so on all platforms).

However, we had been using weak_odr for generics for months without anybody noticing that marked a performance drop and there is the internalize_symbols pass that switches symbols to internal linkage where possible (which should be the case for most closures).

michaelwoerister

michaelwoerister commented on Jul 18, 2016

@michaelwoerister
Member

I can confirm that current master is still slow compared to the nightly mentioned above (on Linux x86_64). However, comparing the LLVM IR of both versions, it does not seem likely that weak_odr linkage has something to do with the slowdown: The slower version contains only one weak_odr function -- and that isn't called anywhere.

added
I-slowIssue: Problems and improvements with respect to performance of generated code.
on Jul 18, 2016
michaelwoerister

michaelwoerister commented on Jul 18, 2016

@michaelwoerister
Member

I ignore that last comment, I was accidentally looking at the IR of the library, not of the benchmark-executable.

added a commit that references this issue on Jul 18, 2016

Auto merge of #34899 - michaelwoerister:always_internalize_symbols, r…

pmarcelll

pmarcelll commented on Jul 26, 2016

@pmarcelll
Contributor

It seems like the new nightly fixed this.

eddyb

eddyb commented on Jul 26, 2016

@eddyb
Member

@pmarcelll What nightly? https://static.rust-lang.org/dist/index.html doesn't list anything newer than 2016-07-21.

hanna-kruppe

hanna-kruppe commented on Jul 26, 2016

@hanna-kruppe
Contributor

That's strange. rustup update gave me a nightly with a commit-date of June 24 (9316ae5) and the dashboard picked up a new nightly as well.

eddyb

eddyb commented on Jul 26, 2016

@eddyb
Member

@rkruppe Ah, @alexcrichton informed me that it was a manual build and not enough was updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    I-slowIssue: Problems and improvements with respect to performance of generated code.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @sanxiyn@eddyb@nagisa@bstrie@TimNN

        Issue actions

          Performance regression in nightly · Issue #34891 · rust-lang/rust