I noticed that Rust does not apply some size optimizations when opt-level=z is supplied, whereas for C they are applied.
See here: https://godbolt.org/z/1955WjcT8
I tried this code:
#[no_mangle]
fn iterate() -> i32 {
    let mut result = 0;
    for i in 0..=100 {
        result += i;
    }
    result
}
With opt-level=3
iterate:
        mov     eax, 5050
        ret
With opt-level=z
iterate:
        xor     ecx, ecx
        xor     edx, edx
        xor     eax, eax
.LBB0_1:
        test    dl, dl
        jne     .LBB0_3
        lea     esi, [rcx + 1]
        cmp     ecx, 100
        sete    dl
        cmove   esi, ecx
        add     eax, ecx
        mov     ecx, esi
        jmp     .LBB0_1
.LBB0_3:
        ret
I would expect opt-level=z and opt-level=3 to have the same output for this fairly simple case.
In contrast, clang 15.0.0 does this:
int something() {
    int result = 0;
    for (int i = 0; i <= 100; i++) {
        result += i;
    }
    return result;
}
With -O3
something:                              # @something
        mov     eax, 5050
        ret
With -Oz
something:                              # @something
        mov     eax, 5050
        ret
Meta
rustc --version --verbose: 1.64.0 (godbolt.org), I assume that's a55dd71d5
I understand that the C code is far easier to optimize, but nevertheless the Rust-produced assembly is about 7x as long.
Activity
Rageking8 commented on Sep 26, 2022
@rustbot label +T-compiler +A-codegen +I-slow
the8472 commented on Sep 26, 2022
This is a known issue with RangeInclusive. Either use a regular Range or iterate via iter.for_each() instead of a for _ in iter loop. #45222
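For illustration, the two suggested workarounds could look roughly like the sketch below. The function names are made up for this example, and the half-open bound 0..101 is only equivalent to 0..=100 because the upper bound is nowhere near i32::MAX. Whether a given rustc version folds each variant to a constant at opt-level=z is not checked here and should be confirmed on godbolt.

```rust
/// Original form from the report: a `for` loop over a `RangeInclusive`.
pub fn iterate_inclusive() -> i32 {
    let mut result = 0;
    for i in 0..=100 {
        result += i;
    }
    result
}

/// Workaround 1 (hypothetical name): a regular half-open `Range`,
/// with the upper bound raised by one so the same values are visited.
pub fn iterate_range() -> i32 {
    let mut result = 0;
    for i in 0..101 {
        result += i;
    }
    result
}

/// Workaround 2 (hypothetical name): keep the `RangeInclusive`, but use
/// internal iteration via `for_each` instead of a `for` loop.
pub fn iterate_for_each() -> i32 {
    let mut result = 0;
    (0..=100).for_each(|i| result += i);
    result
}
```

All three functions compute the same sum; the difference is only in which iteration path the optimizer has to see through.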
the8472 commentedon Sep 26, 2022
It's a bit surprising that 1.64 did manage to optimize it on O3 (but not O2) and then nightly and beta again even on O3.