Description
Apologies if this has already been reported.
Let's say I have some code that looks like this (this is a simplified version of some code a friend was writing):
```rust
pub fn fast(mut ret: u64) -> u64 {
    let mask = (1 << 38) - 1;
    for _ in 0..100_000 {
        let mut speed = 0.0;
        let mut z: f64 = speed;
        speed += 0.200000001;
        for _ in 2..14 {
            z += speed;
            if (z.to_bits() >> 8) & mask == 0 {
                if z % 0.0625 < 1e-13 {
                    println!("{}", z % 0.0625);
                    ret += 1;
                }
            }
        }
    }
    eprintln!("ret: {ret}");
    ret
}
```
I might be tempted to collapse the nested `if` statements in the middle, since that shouldn't change anything. In fact, Clippy (`collapsible_if`) even recommends changing it to this:
```rust
pub fn slow(mut ret: u64) -> u64 {
    let mask = (1 << 38) - 1;
    for _ in 0..100_000 {
        let mut speed = 0.0;
        let mut z: f64 = speed;
        speed += 0.200000001;
        for _ in 2..14 {
            z += speed;
            if (z.to_bits() >> 8) & mask == 0 && z % 0.0625 < 1e-13 {
                println!("{}", z % 0.0625);
                ret += 1;
            }
        }
    }
    eprintln!("ret: {ret}");
    ret
}
```
However, if I pit these two against each other using Criterion and run a benchmark (on Rust 1.69.0), I get:
```
➜ cargo bench 2> out.txt

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

slow    time:   [7.5115 ms 7.5313 ms 7.5583 ms]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
fast    time:   [577.02 µs 578.91 µs 581.29 µs]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
```
For some reason, collapsing the `if` leads to a massive performance regression (roughly 13x slower)! This is all the more surprising since, from my testing where I set `z = 0`, the `if` branch should never run. Putting the two bits of code on Godbolt also shows a difference in the generated assembly (fast, slow).
Furthermore, in my testing, commenting out either the `eprintln!` or the `println!` in both functions results in them having similar performance.
Repo with my exact setup (code and benchmark): https://github.com/ClementTsang/collapse_if_slowdown
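For a quick check without Criterion or the linked repo, here is a rough, dependency-free timing sketch using `std::time::Instant` and `std::hint::black_box` (stable since Rust 1.66). `mean_time` and `RUNS` are hypothetical helpers, not part of the original setup. The sketch has none of Criterion's warm-up or outlier analysis, so its numbers are only indicative. Build it with `--release`, because the regression concerns optimized codegen.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// `fast` and `slow` are copied from the issue, only reformatted.
pub fn fast(mut ret: u64) -> u64 {
    let mask = (1 << 38) - 1;
    for _ in 0..100_000 {
        let mut speed = 0.0;
        let mut z: f64 = speed;
        speed += 0.200000001;
        for _ in 2..14 {
            z += speed;
            if (z.to_bits() >> 8) & mask == 0 {
                if z % 0.0625 < 1e-13 {
                    println!("{}", z % 0.0625);
                    ret += 1;
                }
            }
        }
    }
    eprintln!("ret: {ret}");
    ret
}

pub fn slow(mut ret: u64) -> u64 {
    let mask = (1 << 38) - 1;
    for _ in 0..100_000 {
        let mut speed = 0.0;
        let mut z: f64 = speed;
        speed += 0.200000001;
        for _ in 2..14 {
            z += speed;
            if (z.to_bits() >> 8) & mask == 0 && z % 0.0625 < 1e-13 {
                println!("{}", z % 0.0625);
                ret += 1;
            }
        }
    }
    eprintln!("ret: {ret}");
    ret
}

// Hypothetical helper: average wall-clock time of `runs` calls to `f`.
// `black_box` keeps the optimizer from constant-folding the argument
// or discarding the result.
fn mean_time(f: fn(u64) -> u64, runs: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..runs {
        black_box(f(black_box(0)));
    }
    start.elapsed() / runs
}

fn main() {
    // Hypothetical run count; each call also writes "ret: 0" to stderr.
    const RUNS: u32 = 20;
    println!("fast: {:?}", mean_time(fast, RUNS));
    println!("slow: {:?}", mean_time(slow, RUNS));
}
```

Both functions return their input unchanged when the branch never fires, so comparing `fast(0)` with `slow(0)` is a cheap way to confirm they still agree semantically before reading anything into the timings.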
Activity
workingjubilee commented on May 15, 2023
The usual suspect for this codegen difference, the distinction between `&` and `&&`, does not seem to be the culprit, as the following branch still gets different codegen than the original:
azizghuloum commented on May 15, 2023
Looks like rustc compiles the slow version as if you had written
I compared the MIR graphs to confirm, and the benchmark numbers agree.
That extra variable incurs an additional conditional check at runtime.
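To make the distinction concrete, here is a small self-contained sketch of three shapes: nested `if`s (like `fast`), a collapsed `&&` (like `slow`), and the `&&` result stored in a temporary `bool` before branching. The third shape is our reading of the "extra variable" described above. `cheap`, `expensive`, `nested`, `collapsed` and `via_temp` are hypothetical stand-ins, not code from the issue. A counter shows that all three evaluate the conditions the same number of times and return the same result, so the difference lies only in how the branch is lowered, not in what the code means.

```rust
use std::cell::Cell;

// Hypothetical stand-ins: `cheap` plays the bit-mask test, `expensive` the `%` test.
// Each call bumps a shared counter so we can see which operands actually run.
fn cheap(x: u32, calls: &Cell<u32>) -> bool {
    calls.set(calls.get() + 1);
    x % 4 == 0
}

fn expensive(x: u32, calls: &Cell<u32>) -> bool {
    calls.set(calls.get() + 1);
    x % 3 == 0
}

// Shape 1: nested `if`s, as in `fast`.
fn nested(xs: &[u32], calls: &Cell<u32>) -> u32 {
    let mut hits = 0;
    for &x in xs {
        if cheap(x, calls) {
            if expensive(x, calls) {
                hits += 1;
            }
        }
    }
    hits
}

// Shape 2: collapsed with `&&`, as in `slow` (what `collapsible_if` suggests).
fn collapsed(xs: &[u32], calls: &Cell<u32>) -> u32 {
    let mut hits = 0;
    for &x in xs {
        if cheap(x, calls) && expensive(x, calls) {
            hits += 1;
        }
    }
    hits
}

// Shape 3: the `&&` result materialized into a temporary, then branched on.
// It still short-circuits, but the `if` now tests the temporary instead of
// jumping directly out of the `&&` evaluation.
fn via_temp(xs: &[u32], calls: &Cell<u32>) -> u32 {
    let mut hits = 0;
    for &x in xs {
        let cond = cheap(x, calls) && expensive(x, calls);
        if cond {
            hits += 1;
        }
    }
    hits
}

type Shape = fn(&[u32], &Cell<u32>) -> u32;

fn main() {
    let shapes: [(&str, Shape); 3] = [
        ("nested", nested),
        ("collapsed", collapsed),
        ("via_temp", via_temp),
    ];
    let xs: Vec<u32> = (0..24).collect();
    for (name, shape) in shapes {
        let calls = Cell::new(0);
        let hits = shape(&xs, &calls);
        // Every shape prints: hits = 2, condition calls = 30
        println!("{name}: hits = {hits}, condition calls = {}", calls.get());
    }
}
```

All three shapes are semantically equivalent. Going by the comments in this thread, the problem is that rustc's MIR lowering handled the collapsed form like the third shape rather than like the first.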
azizghuloum commented on May 16, 2023
It seems that during THIR -> MIR lowering, there is a case for handling `&&` inside the `if`, but the `&&` is obscured by a `Use` expression that is not handled (and gets deoptimized to a temporary variable).
azizghuloum commented on May 17, 2023
@ClementTsang with the PR that I opened, the bench output is:
which is what you'd expect.
ClementTsang commented on May 17, 2023
Nice!
`Or` pattern without allocating place #111752
added test case from issue rust-lang#111583