
Noticeable performance regression since the last LLVM update. #8665

Closed

Description

sebcrozet (Contributor)

I get a huge performance regression since #8328 landed (revision a8c3fe4) on all my projects: things are 50 to 75% slower. I'm fairly sure #8328 is the cause, since when I revert the compiler to the revision right before it (67c954e), performance goes back to normal.

For what it's worth, the affected projects are 100% generic and rely heavily on cross-crate inlining. They do a lot of numeric computation and array indexing. Sorry if I am a bit vague, but I cannot run my projects under valgrind because valgrind started to segfault a few days ago (perhaps since jemalloc was re-enabled)…

I tried to come up with a small benchmark exhibiting the problem. It is not that significant on its own, but the following already shows a noticeable performance regression:

// 2013-era Rust: `extern mod` and `do`-block closures predate Rust 1.0.
extern mod extra;

use std::hashmap;
use extra::test::BenchHarness;

#[bench]
fn bench_insert_std(bh: &mut BenchHarness) {
    let mut m = hashmap::HashMap::with_capacity(32);

    do bh.iter {
        for i in range(0u, 500) {
            m.insert((i, i), i);
        }
    }
}

#[bench]
fn bench_insert_find_remove_std(bh: &mut BenchHarness) {
    let mut m = hashmap::HashMap::with_capacity(32);

    do bh.iter {
        for i in range(0u, 200) {
            m.insert((i, i), i);
        }

        for i in range(0u, 200) {
            assert!(*m.find(&(i, i)).unwrap() == i)
        }

        for i in range(100u, 200) {
            m.remove(&(i, i));
        }

        for i in range(100u, 200) {
            assert!(m.find(&(i, i)).is_none())
        }

        for i in range(0u, 100) {
            m.insert((i, i), i * 2);
        }

        for i in range(0u, 100) {
            assert!(*m.find(&(i, i)).unwrap() == i * 2)
        }

        for i in range(0u, 100) {
            m.remove(&(i, i));
        }

        for i in range(0u, 100) {
            assert!(m.find(&(i, i)).is_none())
        }
    }
}

fn main() {
} 

Compiled with --opt-level=3.
With the (new) compiler a8c3fe4, I get:

test bench_insert_find_remove_std ... bench: 89242 ns/iter (+/- 3605)
test bench_insert_std ... bench: 46177 ns/iter (+/- 1555)

With the (old) compiler 67c954e, I get something more than 10% faster. The asm dump is smaller too:

test bench_insert_find_remove_std ... bench: 73939 ns/iter (+/- 2872)
test bench_insert_std ... bench: 38482 ns/iter (+/- 1005)

Activity

alexcrichton (Member) commented on Aug 21, 2013

LLVM may have updated their optimization passes, and we may just need to update what order we're optimizing things in. @thestinger, you mentioned loop vectorization as being a big thing at lower optimization levels, do you know of perhaps any analysis passes that we're missing by default?

alexcrichton (Member) commented on Aug 21, 2013

After talking with @thestinger on IRC, we've reached the conclusion that one major change we can make is reworking how the LLVM passes are run. I'm not certain that this is the source of the regression you're seeing, but it's the best one that I can think of related to updating LLVM.

After looking into this, I found that we do a large number of things differently than clang does:

  1. We have only one pass manager (a module one); clang has three: it runs a function, a module, and then a codegen pass manager. From what I can tell, this doesn't necessarily mean we're skipping optimizations, but running them in clang's order may let optimizations finish more quickly.
  2. We're not populating our pass managers with any target-specific analysis passes. Clang populates all of its pass managers with target-specific analyses; we do nothing of the sort.
  3. Our hard-coded list of O1/O2/O3 optimizations falls out of date with LLVM updates, so we shouldn't hard-code it at all; the list is just a duplicate of what's already in LLVM itself anyway.
  4. There's no way for us to debug which passes are run. We need an option to see exactly what's happening.
  5. Right now we're running two pass managers, but they possibly overlap with one another, and it's not clear which one is responsible for which work.

Long story short, investigating how we run LLVM passes shows that it probably needs to be reworked and simplified to use more of the existing LLVM infrastructure and to match clang more closely. I'm still not certain this is the cause of the regression, but I'll attempt to determine whether it is by making these changes.
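As a rough sketch of the clang-style setup being described (pseudocode, not actual clang or rustc source; all names here are illustrative):

```
// Pseudocode sketch of the clang-style pipeline described above.

builder = PassManagerBuilder(opt_level)       // O1/O2/O3 lists come from LLVM itself (point 3)

fpm = FunctionPassManager(module)             // pass manager 1 of 3
mpm = ModulePassManager()                     // pass manager 2 of 3
cpm = CodegenPassManager()                    // pass manager 3 of 3

for pm in [fpm, mpm, cpm]:
    add_target_specific_analyses(pm, target)  // point 2: target-aware analyses

builder.populate(fpm)                         // let LLVM fill in the pass lists
builder.populate(mpm)

run(fpm, each_function_in(module))            // function passes first,
run(mpm, module)                              // then module passes,
run(cpm, module)                              // then codegen
```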

cc @Aatch, you were the one who recently added all the PassManager stuff to passes.rs, just wanted to make sure that none of this is alarming to you.

sebcrozet (Contributor, Author) commented on Aug 21, 2013

Thanks for all those investigations!

I merged together a few files from my project to give you a better stand-alone example of the regression. The example is big (298 lines) and mostly unreadable, but it exhibits a slowdown greater than 85% after the LLVM update. I thought it might be useful for you to have an example of a big slowdown for debugging.

Here is the example: https://gist.github.com/sebcrozet/6300848

Again, compiled with --opt-level=3.
With the compiler after LLVM update (revision a8c3fe4) I get:

test test::bench_johnson_simplex ... bench: 191270 ns/iter (+/- 4553) 

With the compiler before LLVM update (revision 67c954e) I get:

test test::bench_johnson_simplex ... bench: 23890 ns/iter (+/- 94) 

alexcrichton (Member) commented on Aug 23, 2013

Sadly my suspicions were not correct. Your example is fantastic though, and I'm very perplexed as to what's going on. I did profile your code, and I see the same 10x slowness with the new LLVM upgrade. What's very suspicious to me are the top functions in the profile of that code. Sorry, but I haven't yet figured out how to get profiles on OS X outside of Instruments...

Before the LLVM upgrade (stage0 today):
[screenshot: profile before the upgrade]

After #8700 (with the LLVM upgrade and refactored pass handling):
[screenshot: profile after the upgrade]

The exact test I ran was the one above, but with the iteration counts turned up to 1000 to make it take a bit longer. I'm very confused as to why timing information shows up so heavily in one of the profiles, so I decided to run different code.

With this code, I get the following profiles:

Before:
[screenshot: profile before the upgrade]
After:
[screenshot: profile after the upgrade]

I'm disturbed by the fact that there are transmutes in the profile, but there shouldn't be any timing information weirdness going on here. I'll continue investigating later, but I wanted to post my findings here before I forgot them.

graydon (Contributor) commented on Aug 23, 2013

Transmute showing up means a change in inlining.

thestinger (Contributor) commented on Aug 23, 2013

I identified the root cause of this issue, so I'm closing this in favour of #8720.
