
Poor optimization of iter().skip() #101814

Closed
@Tearth

Description

@Tearth

Using iter().skip() in a for loop optimizes poorly compared to a hand-written loop over a range.
https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=b7ed8bf9e4fc3341a92f301fa5185cc5

pub fn test_1(a: [i32; 10]) -> i32 {
    let mut sum = 0;
    for v in a.iter().skip(8) {
        sum += v;
    }
    
    sum
}

pub fn test_2(a: [i32; 10]) -> i32 {
    let mut sum = 0;
    for index in 8..10 {
        sum += a[index];
    }
    
    sum
}

This produces the following asm output:

playground::test_1:
	movq	%rdi, %r8
	addq	$40, %r8
	xorl	%esi, %esi
	movl	$8, %edx
	xorl	%eax, %eax
	testb	$1, %sil
	jne	.LBB0_2

.LBB0_5:
	leaq	-1(%rdx), %rsi
	movq	%r8, %rcx
	subq	%rdi, %rcx
	shrq	$2, %rcx
	cmpq	%rsi, %rcx
	jbe	.LBB0_4
	leaq	(%rdi,%rdx,4), %rdi

.LBB0_2:
	cmpq	%r8, %rdi
	je	.LBB0_4
	testq	%rdi, %rdi
	je	.LBB0_4
	addl	(%rdi), %eax
	addq	$4, %rdi
	movb	$1, %sil
	xorl	%edx, %edx
	testb	$1, %sil
	je	.LBB0_5
	jmp	.LBB0_2

.LBB0_4:
	retq

playground::test_2:
	movl	36(%rdi), %eax
	addl	32(%rdi), %eax
	retq

Given the zero-cost abstraction principle and the fact that the compiler knows the size of the array, it should optimize test_1 to at least the same form as test_2, where it correctly detected that only two values need to be summed. Instead, it emits a sizable chunk of asm with many branches.

The issue is present on stable (1.63.0) as well as on the beta and nightly channels.

Activity

Labels added on Sep 14, 2022:
A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)

MatiF100 commented on Sep 14, 2022

Rewriting the function as follows produces the same assembly as the better-optimized variant. It seems the issue occurs when iterators are combined with a for loop.
https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=bcb4551c22ef765a682c1d6c41eb285f

pub fn test_3(a: [i32; 10]) -> i32 {
    a.iter().skip(8).fold(0, |sum, v| sum + v)
}

nikic (Contributor) commented on Sep 14, 2022

This general class of problem is well known -- optimization of exterior iteration in Rust is very challenging. Using interior iteration (as in the previous comment) will generally optimize much better.
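As a rough sketch of what interior iteration looks like for this example, the variants below let the iterator drive the loop itself instead of calling next() from a for loop. The function names are hypothetical and not from the thread, and the generated assembly has not been checked here; they only illustrate the style.

```rust
// Hypothetical variants (not from the thread) that use interior iteration.

/// Sums the last two elements via Iterator::sum, which consumes the
/// iterator internally rather than through a `for` loop.
pub fn sum_tail_with_sum(a: [i32; 10]) -> i32 {
    a.iter().skip(8).sum()
}

/// Same computation via for_each, another interior-iteration method.
pub fn sum_tail_with_for_each(a: [i32; 10]) -> i32 {
    let mut sum = 0;
    a.iter().skip(8).for_each(|v| sum += v);
    sum
}

fn main() {
    let a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    // Both variants add a[8] and a[9].
    println!("{}", sum_tail_with_sum(a));
    println!("{}", sum_tail_with_for_each(a));
}
```

Both compute the same value as test_1 and test_2; whether they compile to the two-instruction form depends on the compiler and LLVM version.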

That said, in this case optimization is likely feasible. Looking at the IR (https://rust.godbolt.org/z/cevdKWcTn) there is a clear opportunity for peeling based on phi invariance here, which should allow follow-on optimization. Would have to investigate closer to find out why it does not trigger.
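To illustrate what peeling would buy here, the following hand-written sketch splits off the first iteration, which is the only one where the skip count is non-zero. It assumes that the skip adapter behaves roughly like "call nth(n) on the first step, then plain next() afterwards"; the function name is hypothetical, and this is a source-level analogy rather than the actual LLVM transform.

```rust
/// Hypothetical hand-peeled version of test_1: the first iteration does the
/// skipping, every later iteration is a plain `next()`.
pub fn sum_skip_peeled(a: [i32; 10]) -> i32 {
    let mut it = a.iter();
    let mut sum = 0;

    // Peeled first iteration: nth(8) discards eight elements and yields a[8].
    if let Some(v) = it.nth(8) {
        sum += v;

        // Remaining iterations: no skip state left to check.
        for v in it {
            sum += v;
        }
    }

    sum
}

fn main() {
    let a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    println!("{}", sum_skip_peeled(a));
}
```

Once the first iteration is separated like this, the loop body no longer carries the "have we skipped yet" flag, which is the kind of follow-on simplification the comment refers to.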

Tearth (Author) commented on Sep 14, 2022

Thanks, I wasn't aware that the compiler can have this kind of trouble with exterior iteration, but it's understandable. I will leave this issue open if you're saying that this case has the potential to improve.

nikic (Contributor) commented on Sep 27, 2022

I took a closer look, and the reason this doesn't peel is that it fails multiple checks in canPeel() (https://github.com/llvm/llvm-project/blob/2769ceb0e7a4b4f11c2bf5bd21fd69c154c17ff8/llvm/lib/Transforms/Utils/LoopPeel.cpp#L88). We have a non-exiting latch here, and because of that the non-latch exits are also not terminated by unreachable. It should be possible to relax these requirements, but it would take some effort to support branch weight updates.

self-assigned this
on Sep 27, 2022
nikic (Contributor) commented on Sep 28, 2022

Label added on Apr 3, 2023:
E-needs-test (Call for participation: An issue has been fixed and does not reproduce, but no test has been added.)

nikic (Contributor) commented on Apr 3, 2023

Fixed by the LLVM 16 upgrade.

added a commit that references this issue on Apr 3, 2023
73f40d4
Label added on Apr 5, 2023:
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
added a commit that references this issue on Apr 11, 2023

Rollup merge of rust-lang#109895 - nikic:llvm-16-tests, r=cuviper

efb96af

Labels

A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
C-bug (Category: This is a bug.)
E-needs-test (Call for participation: An issue has been fixed and does not reproduce, but no test has been added.)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)

Participants

@nikic, @Tearth, @MatiF100, @Noratrieb

Poor optimization of iter().skip() · Issue #101814 · rust-lang/rust