Worse codegen with mem::take(vec) than on stable #103840

Closed

@clubby789 (Contributor)

With this code

pub fn foo(t: &mut Vec<usize>) {
    let mut taken = std::mem::take(t);
    taken.pop();
    *t = taken;
}
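
For reference, `std::mem::take(t)` replaces `*t` with its `Default` value and returns the old one, so the function above should behave the same as calling `t.pop()` and discarding the result. A minimal sketch of that comparison; `foo_direct` is a hypothetical name used only as a baseline and does not appear in the report:

```rust
/// The function from the report, unchanged.
pub fn foo(t: &mut Vec<usize>) {
    let mut taken = std::mem::take(t);
    taken.pop();
    *t = taken;
}

/// Hypothetical baseline: pop in place without moving the Vec out first.
pub fn foo_direct(t: &mut Vec<usize>) {
    t.pop();
}
```

Since both functions leave the vector in the same state, the comparison between compiler versions is purely about the quality of the generated code.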

Stable produces

playground::foo:
	sub	rsp, 24
	movups	xmm0, xmmword ptr [rdi]
	movaps	xmmword ptr [rsp], xmm0
	mov	rax, qword ptr [rdi + 16]
	xor	ecx, ecx
	sub	rax, 1
	cmovae	rcx, rax
	mov	qword ptr [rdi + 16], rcx
	add	rsp, 24
	ret

Whereas beta/nightly produces

playground::foo:
	push	r15
	push	r14
	push	rbx
	mov	rbx, rdi
	mov	r14, qword ptr [rdi + 8]
	mov	r15, qword ptr [rdi + 16]
	xorps	xmm0, xmm0
	movups	xmmword ptr [rdi + 8], xmm0
	mov	rsi, qword ptr [rdi + 8]
	test	rsi, rsi
	je	.LBB0_2
	shl	rsi, 3
	mov	edi, 8
	mov	edx, 8
	call	qword ptr [rip + __rust_dealloc@GOTPCREL]

.LBB0_2:
	xor	eax, eax
	sub	r15, 1
	cmovae	rax, r15
	mov	qword ptr [rbx + 8], r14
	mov	qword ptr [rbx + 16], rax
	pop	rbx
	pop	r14
	pop	r15
	ret

searched nightlies: from nightly-2022-07-02 to nightly-2022-07-03
regressed nightly: nightly-2022-07-03
searched commit range: 46b8c23...f2d9393
regressed commit: 0075bb4

bisected with cargo-bisect-rustc v0.6.4

Host triple: x86_64-unknown-linux-gnu

@rustbot label +regression-from-stable-to-nightly +A-mir-opt-inlining

Activity

added I-prioritize (Issue: Indicates that prioritization has been requested for this issue.) on Nov 1, 2022
added A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.) and I-slow (Issue: Problems and improvements with respect to performance of generated code.) on Nov 1, 2022
nikic (Contributor) commented on Nov 1, 2022

Godbolt: https://rust.godbolt.org/z/4GTrh1EGx

The resulting IR can be further optimized by GVN, so this might be addressable on the LLVM side.

nikic (Contributor) commented on Nov 2, 2022

Looks like this got a bit worse on LLVM main because an additional assume is being preserved: https://llvm.godbolt.org/z/95eMe6j7q

Anyway, there is a phase ordering problem here. MemCpyOpt runs after GVN, and only at that point do we convert the memcpy into a memset, which makes the following load from it easy to fold.

An easy fix would probably be to support memset in InstCombine load store forwarding. But this is no longer going to fix this issue due to the aforementioned assume issue. Ugh.
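
In source terms, the extra `__rust_dealloc` call in the beta/nightly output appears to come from dropping the empty placeholder `Vec` that `mem::take` leaves in `*t`, which happens when `*t = taken` overwrites it. A rough sketch of those steps written out by hand; `foo_expanded` is a hypothetical name, and this illustrates the semantics only, not the actual MIR or IR:

```rust
use std::mem;

pub fn foo_expanded(t: &mut Vec<usize>) {
    // mem::take(t) is equivalent to mem::replace(t, Default::default()).
    let mut taken = mem::replace(t, Vec::new());
    taken.pop();
    // `*t = taken` drops the previous value of `*t`, i.e. the empty placeholder.
    let placeholder = mem::replace(t, taken);
    // Vec::new() does not allocate, so the placeholder's capacity is 0
    // and dropping it has nothing to free.
    debug_assert_eq!(placeholder.capacity(), 0);
    drop(placeholder);
}
```

Once the optimizer can see that the reloaded capacity is zero, the branch guarding the deallocation is dead and can be removed.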

nikic (Contributor) commented on Nov 3, 2022

Upstream patch for InstCombine: https://reviews.llvm.org/D137323

An alternative solution would be to move MemCpyOpt prior to GVN, but I'm not sure whether that would cause other issues.

apiraino (Contributor) commented on Nov 3, 2022

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

added and removed I-prioritize (Issue: Indicates that prioritization has been requested for this issue.) on Nov 3, 2022
nikic (Contributor) commented on Nov 3, 2022

Upstream patch for SimplifyCFG: https://reviews.llvm.org/D137339

Together these produce the following final IR:

define void @_ZN7example3foo17h9f11ae7042742a8dE(ptr noalias nocapture noundef align 8 dereferenceable(24) %t) unnamed_addr #0 personality ptr @rust_eh_personality {
start:
  %taken.sroa.6.0.t.sroa_idx = getelementptr inbounds i8, ptr %t, i64 8
  %taken.sroa.6.0.copyload5 = load i64, ptr %taken.sroa.6.0.t.sroa_idx, align 8, !alias.scope !2, !noalias !6
  %taken.sroa.7.0.t.sroa_idx = getelementptr inbounds i8, ptr %t, i64 16
  %taken.sroa.7.0.copyload6 = load i64, ptr %taken.sroa.7.0.t.sroa_idx, align 8, !alias.scope !2, !noalias !6
  tail call void @llvm.memset.p0.i64(ptr noundef nonnull align 8 dereferenceable(16) %taken.sroa.6.0.t.sroa_idx, i8 0, i64 16, i1 false)
  %0 = icmp eq i64 %taken.sroa.7.0.copyload6, 0
  %1 = add i64 %taken.sroa.7.0.copyload6, -1
  %spec.select = select i1 %0, i64 0, i64 %1
  store i64 %taken.sroa.6.0.copyload5, ptr %taken.sroa.6.0.t.sroa_idx, align 8
  store i64 %spec.select, ptr %taken.sroa.7.0.t.sroa_idx, align 8
  ret void
}

Ignoring the opportunity to form a usub.sat, this is optimal.
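
The `icmp`/`add`/`select` sequence in the IR above computes the new length as `len - 1` clamped at zero, which is what an unsigned saturating subtraction (`llvm.usub.sat`) would express directly. A small Rust sketch of that equivalence; `new_len_select` and `new_len_saturating` are hypothetical helper names introduced only for illustration:

```rust
/// Mirrors the shape of the IR: select(len == 0, 0, len + (-1)).
pub fn new_len_select(len: usize) -> usize {
    // wrapping_sub matches the plain `add i64 %len, -1`, which may wrap
    // when len == 0; the select then discards that wrapped value.
    let decremented = len.wrapping_sub(1);
    if len == 0 { 0 } else { decremented }
}

/// The same computation expressed as a saturating subtraction.
pub fn new_len_saturating(len: usize) -> usize {
    len.saturating_sub(1)
}
```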

self-assigned this
on Nov 3, 2022
clubby789 (Contributor, Author) commented on Dec 29, 2022

Nightly now compiles to

example::foo:
        mov     rax, qword ptr [rdi + 16]
        xor     ecx, ecx
        sub     rax, 1
        cmovae  rcx, rax
        mov     qword ptr [rdi + 16], rcx
        ret

nikic (Contributor) commented on Dec 29, 2022

Needs codegen test.

clubby789 (Contributor, Author) commented on Dec 29, 2022

Would just // CHECK-NOT: __rust_dealloc work?

nikic (Contributor) commented on Dec 29, 2022

> Would just // CHECK-NOT: __rust_dealloc work?

Sounds reasonable.
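
A sketch of what such a test could look like. The `compile-flags` header and the placement of the `CHECK-NOT` directive are assumptions based on common FileCheck-style codegen test conventions; this is not a copy of the test that was eventually merged:

```rust
// compile-flags: -O
// (Assumed header. A real codegen test would likely also declare
// `#![crate_type = "lib"]` at the top of the file.)

// Assumption: FileCheck scans the emitted LLVM IR, so any mention of
// `__rust_dealloc` in the output for this file makes the test fail.
// CHECK-NOT: __rust_dealloc
pub fn foo(t: &mut Vec<usize>) {
    let mut taken = std::mem::take(t);
    taken.pop();
    *t = taken;
}
```

A negative check like this only guards against the deallocation call reappearing; it does not pin down the exact instruction sequence.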

added a commit that references this issue on Jan 6, 2023

Auto merge of rust-lang#106272 - clubby789:codegen-test-103840, r=nikic

3cf246c
the8472 (Member) commented on Feb 16, 2023

Reopening because the fact that this works on nightly is not really reliable behavior. #106790 and #108106 both change the Vec field order, and in each case that breaks the test.

removed E-needs-test (Call for participation: An issue has been fixed and does not reproduce, but no test has been added.) on Feb 21, 2023
added T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.) on Apr 5, 2023
the8472 (Member) commented on Apr 29, 2023

I'm no longer having issues with the codegen test; the LLVM 16 upgrade seems to have made it more reliable.

Metadata

Assignees: No one assigned

Labels: A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.), A-mir-opt-inlining (Area: MIR inlining), I-slow (Issue: Problems and improvements with respect to performance of generated code.), P-medium (Medium priority), T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.), regression-from-stable-to-nightly (Performance or correctness regression from stable to nightly.)

Participants: @nikic, @the8472, @apiraino, @clubby789, @JohnTitor
