Incomplete optimization with opt-level=z compared to clang for possible pre-compiled expressions #102312

Open
@arctic-penguin

Description

I noticed that Rust does not apply some size optimizations when opt-level=z is supplied, whereas clang does apply them to the equivalent C code.

See here: https://godbolt.org/z/1955WjcT8

I tried this code:

#[no_mangle]
fn iterate() -> i32 {
    let mut result = 0;
    for i in 0..=100 {
        result += i;
    }
    result
}

With opt-level=3

iterate:
        mov     eax, 5050
        ret

With opt-level=z

iterate:
        xor     ecx, ecx
        xor     edx, edx
        xor     eax, eax
.LBB0_1:
        test    dl, dl
        jne     .LBB0_3
        lea     esi, [rcx + 1]
        cmp     ecx, 100
        sete    dl
        cmove   esi, ecx
        add     eax, ecx
        mov     ecx, esi
        jmp     .LBB0_1
.LBB0_3:
        ret

I would expect opt-level=z and opt-level=3 to have the same output for this fairly simple case.

In contrast, clang 15.0.0 does this:

int something() {
    int result = 0;
    for (int i=0; i<=100; i++) {
        result += i;
    }
    return result;
}

with -O3

something:                              # @something
        mov     eax, 5050
        ret

with -Oz

something:                              # @something
        mov     eax, 5050
        ret

Meta

rustc --version --verbose:

1.64.0 (godbolt.org), I assume that's a55dd71d5

I understand that the C code is far easier to optimize, but the Rust-produced assembly is nevertheless about seven times as long.

Activity

Rageking8 (Contributor) commented on Sep 26, 2022

@rustbot label +T-compiler +A-codegen +I-slow

added labels on Sep 26, 2022:
A-codegen (Area: Code generation)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)

the8472 (Member) commented on Sep 26, 2022

This is a known issue with RangeInclusive. Either use a regular Range or iterate via iter.for_each() instead of a for _ in iter loop.

#45222
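
As a minimal sketch of the two workarounds suggested above, the original function could be rewritten as follows. The function names `iterate_range` and `iterate_for_each` are made up for this illustration, and the `#[no_mangle]` attribute from the original is omitted to keep the snippet self-contained. Whether either variant actually folds to a constant at opt-level=z depends on the compiler version and has not been verified here; comparing the output on godbolt would be the way to check.

```rust
// Workaround 1 (sketch): use a half-open Range instead of RangeInclusive.
// 0..101 yields the same values as 0..=100.
pub fn iterate_range() -> i32 {
    let mut result = 0;
    for i in 0..101 {
        result += i;
    }
    result
}

// Workaround 2 (sketch): keep the inclusive range, but drive it with
// for_each() (internal iteration) instead of a `for` loop.
pub fn iterate_for_each() -> i32 {
    let mut result = 0;
    (0..=100).for_each(|i| result += i);
    result
}
```

Both variants compute the same sum as the original `iterate()`, so they can be swapped in without changing behavior.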

the8472 (Member) commented on Sep 26, 2022

It's a bit surprising that 1.64 did manage to optimize it on O3 (but not O2), and that nightly and beta then again fail to do so even on O3.

Labels

A-codegen (Area: Code generation)
C-bug (Category: This is a bug.)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)

          Incomplete optimization with opt-level=z compared to clang for possible pre-compiled expressions · Issue #102312 · rust-lang/rust