Skip to content

Collapsing two if-statements to a single if statement can result in a large performance decrease #111583

@ClementTsang

Description

@ClementTsang
Contributor

Apologies if this has already been reported.


Let's say I have some code that looks like this (this is a simplified version of some code a friend was writing):

pub fn fast(mut ret: u64) -> u64 {
    let mask = (1 << 38) - 1;

    for _ in 0..100_000 {
        let mut speed = 0.0;
        let mut z: f64 = speed;
        speed += 0.200000001;

        for _ in 2..14 {
            z += speed;

            if (z.to_bits() >> 8) & mask == 0 {
                if z % 0.0625 < 1e-13 {
                    println!("{}", z % 0.0625);
                    ret += 1;
                }
            }
        }
    }

    eprintln!("ret: {ret}");
    ret
}

I might be tempted to collapse the if-statement in the middle, since it shouldn't change anything - in fact, clippy will even recommend that I change it to this:

pub fn slow(mut ret: u64) -> u64 {
    let mask = (1 << 38) - 1;

    for _ in 0..100_000 {
        let mut speed = 0.0;
        let mut z: f64 = speed;
        speed += 0.200000001;

        for _ in 2..14 {
            z += speed;

            if (z.to_bits() >> 8) & mask == 0 && z % 0.0625 < 1e-13 {
                println!("{}", z % 0.0625);
                ret += 1;
            }
        }
    }

    eprintln!("ret: {ret}");
    ret
}

However, if I pit these two against each other using criterion, then when I run a bench (on 1.69.0):

➜ cargo bench 2> out.txt

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s


running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

slow                    time:   [7.5115 ms 7.5313 ms 7.5583 ms]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

fast                    time:   [577.02 µs 578.91 µs 581.29 µs]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

For some reason, collapsing the if branch leads to a massive performance regression! This is surprising as well since from my testing, where I set z = 0, the if branch should never run. Putting the two bits of code on Godbolt seems to also show that there's a bit of a difference in terms of assembly generation (fast, slow).

Furthermore, from some testing, commenting out either the eprintln or the println on both would result in them having similar performance.

I can set up a repo with my exact setup if that will be helpful. Repo with code and benchmark: https://github.com/ClementTsang/collapse_if_slowdown

Activity

added
I-slowIssue: Problems and improvements with respect to performance of generated code.
on May 15, 2023
workingjubilee

workingjubilee commented on May 15, 2023

@workingjubilee
Member

The usual suspect for this codegen difference, the distinction between & and &&, does not seem to be the culprit, as the following branch still gets different codegen than the original:

            if ((z.to_bits() >> 8) & mask == 0) & (z % 0.0625 < 1e-13) {
azizghuloum

azizghuloum commented on May 15, 2023

@azizghuloum
Contributor

Looks like rustc compiles the slow version as if you've written

pub fn slower(mut ret: u64) -> u64 {
    let mask = (1 << 38) - 1;

    for _ in 0..100_000 {
        let mut speed = 0.0;
        let mut z: f64 = speed;
        speed += 0.200000001;

        for _ in 2..14 {
            z += speed;

            let tmp = (z.to_bits() >> 8) & mask == 0 && z % 0.0625 < 1e-13;
            if tmp {
                println!("{}", z % 0.0625);
                ret += 1;
            }
        }
    }

    eprintln!("ret: {ret}");
    ret
}

I compared the mir graphs to confirm and the benchmark numbers also confirm.

That extra variable incurs an additional conditional check at runtime.

azizghuloum

azizghuloum commented on May 16, 2023

@azizghuloum
Contributor

It seems that during THIR -> MIR lowering, there is a case for handling && inside the if, but the && is obscured by a Use expression that is not handled (and gets deoptimized to a temporary variable).

azizghuloum

azizghuloum commented on May 17, 2023

@azizghuloum
Contributor

@ClementTsang with the PR that I opened, the bench output is

slow                    time:   [706.99 µs 716.80 µs 727.74 µs]
                        change: [-11.173% -8.6408% -6.1600%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  7 (7.00%) high mild
  7 (7.00%) high severe

fast                    time:   [700.31 µs 714.30 µs 732.70 µs]
                        change: [-5.7267% -2.7412% +0.0544%] (p = 0.07 > 0.05)
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) high mild
  12 (12.00%) high severe

which is what you'd expect.

ClementTsang

ClementTsang commented on May 17, 2023

@ClementTsang
ContributorAuthor

Nice!

added
C-optimizationCategory: An issue highlighting optimization opportunities or PRs implementing such
on Feb 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    C-optimizationCategory: An issue highlighting optimization opportunities or PRs implementing suchI-slowIssue: Problems and improvements with respect to performance of generated code.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Participants

      @azizghuloum@ClementTsang@workingjubilee

      Issue actions

        Collapsing two if-statements to a single if statement can result in a large performance decrease · Issue #111583 · rust-lang/rust