LLVM17 performing faulty outlining for cortex-m targets causes program crash #118867

Closed
@jamesmunns

Description

@jamesmunns
Member

I tried this code:

https://github.com/peter9477/none-fault

(additional repro notes in the README there). I have verified this causes the code to crash as it jumps into a random RAM location unexpectedly.

This code does a fairly benign call to what boils down to:

format!("{:?}", None::<usize>);

However, things get sorta bad.

; This is deep inside <Option as core::fmt::Debug>::fmt
;
; Here, the vtable lookup has been outlined as OUTLINED_FUNCTION_14
;
                     LAB_000024ca                       
000024ca        00 f0 73 fb     bl         OUTLINED_FUNCTION_14

; calls into OUTLINED_FUNCTION_14
; at the time of call, LR = 24CF, which has clobbered the LR of the parent
;

    00002bb4    d1 e9 05 01     ldrd       r0,r1,[r1,#0x14]
    00002bb8    04 22           movs       r2,#0x4
    00002bba    cb 68           ldr        r3,[r1,#0xc]
    00002bbc    70 47           bx         lr

; The outlined function returns, using LR
;
; We now call the vtable method, however we don't restore LR prior to calling

000024ce        03 49           ldr        r1,[DAT_000024dc]   
000024d0        18 47           bx         r3

; This looks like it is SUPPOSED to be a tail-call function,
; and return to the caller of <Option as core::fmt::Debug>::fmt,
; but instead returns TO <Option as core::fmt::Debug>::fmt

    00000b74    80 b5           push       {r7,lr}
    00000b76    6f 46           mov        r7,sp
    00000b78    00 68           ldr        r0,[r0,#0x0]
    00000b7a    0a 44           add        r2,r1
    00000b7c    00 f0 54 f8     bl         _ZN132_$LT$alloc..vec..Vec$LT$T$C$A$GT$$u20$as 
    00000b80    00 20           movs       r0,#0x0
    00000b82    80 bd           pop        {r7,pc}

; since LR was clobbered, we go BACK to 24CE. However, since
; r3 is clobbered at this point (by memcpy), we jump into program memory
; and hard fault
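
The link-register bookkeeping in the trace above can be replayed with a small model. This is only a sketch: the `Cpu` type and its helpers are hypothetical, the addresses are copied from the listing, and it assumes the function at 0x0b74 is the vtable method reached through `bx r3`. It models nothing but PC, LR, and the saved LR on the stack.

```rust
/// Bit 0 of a branch target selects Thumb state on Cortex-M.
const THUMB_BIT: u32 = 1;

/// Hypothetical, minimal model of the registers that matter for this trace.
struct Cpu {
    pc: u32,
    lr: u32,
    stack: Vec<u32>,
}

impl Cpu {
    /// `bl target`: LR receives the return address (Thumb bit set), PC jumps.
    fn bl(&mut self, return_addr: u32, target: u32) {
        self.lr = return_addr | THUMB_BIT;
        self.pc = target;
    }

    /// `bx reg`: plain branch, LR is left untouched.
    fn bx(&mut self, target: u32) {
        self.pc = target & !THUMB_BIT;
    }

    /// `bx lr`: return to whatever LR currently holds.
    fn bx_lr(&mut self) {
        let lr = self.lr;
        self.bx(lr);
    }

    /// `push {r7, lr}`: only the LR half is modelled.
    fn push_lr(&mut self) {
        self.stack.push(self.lr);
    }

    /// `pop {r7, pc}`: return through the LR value saved on entry.
    fn pop_pc(&mut self) {
        let saved = self.stack.pop().expect("push_lr must run before pop_pc");
        self.bx(saved);
    }
}

/// Replays the trace and reports where the vtable method returns to.
/// `caller_return` stands in for the unknown return address of the
/// caller of `<Option as Debug>::fmt`.
fn trace_return_address(caller_return: u32) -> u32 {
    let mut cpu = Cpu { pc: 0x24ca, lr: caller_return | THUMB_BIT, stack: Vec::new() };
    cpu.bl(0x24ce, 0x2bb4); // bl OUTLINED_FUNCTION_14, LR becomes 0x24cf
    cpu.bx_lr();            // outlined function returns to 0x24ce
    cpu.bx(0x0b74);         // tail call through r3, LR is NOT restored
    cpu.push_lr();          // callee saves the clobbered LR
    cpu.pop_pc();           // callee returns through it
    cpu.pc
}
```

With any placeholder such as `trace_return_address(0x1000)`, the model ends at 0x24ce rather than at the caller's return address, which matches the behaviour described in the trace.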

The decompiled version from Ghidra looks like this:

void _ZN66_$LT$core..option..Option$LT$T$GT$$u20$as$u20$core..fmt..Debug$GT$3fmt17h6606eac464c97c06E
               (int *param_1,undefined4 param_2,undefined4 param_3,code *UNRECOVERED_JUMPTABLE)

{
  undefined4 uVar1; // this is r3
  
  if (*param_1 != 0) {
    OUTLINED_FUNCTION_10(0x1be9,param_2,"SomeBusyNone    ",&stack0xfffffff4,0x1be9);
    return;
  }
  uVar1 = OUTLINED_FUNCTION_14();
                    /* WARNING: Could not recover jumptable at 0x000024d0. Too many branches */
                    /* WARNING: Treating indirect jump as call */

  // NOTE(jamesmunns): We call this function, but then return back and call it again!
  (*UNRECOVERED_JUMPTABLE)(uVar1,"None    "); 
  return;
}
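
For comparison, here is a host-side sketch of what a correct build should do: formatting `None::<usize>` with `{:?}` through a `dyn Write` vtable should leave exactly the text `None` in the writer. The `RecordingWriter` type and `debug_none_via_dyn` function are hypothetical names for illustration; the repro itself appears to write into a `String`/`Vec` instead. A second invocation of the vtable method, as in the decompiled code above, would show up here as duplicated output.

```rust
use std::fmt::{self, Write};

/// Hypothetical writer that records every chunk passed to `write_str`.
#[derive(Default)]
struct RecordingWriter {
    chunks: Vec<String>,
}

impl Write for RecordingWriter {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.chunks.push(s.to_owned());
        Ok(())
    }
}

/// Formats `None::<usize>` through a `&mut dyn Write`, so `write_str`
/// is reached via a vtable lookup, and returns everything written.
fn debug_none_via_dyn() -> String {
    let mut writer = RecordingWriter::default();
    {
        let sink: &mut dyn Write = &mut writer;
        write!(sink, "{:?}", None::<usize>).expect("RecordingWriter never fails");
    }
    writer.chunks.concat()
}
```

On a host target `debug_none_via_dyn()` is expected to return `"None"`; this does not reproduce the fault, which per the report needs the Cortex-M target and the specific optimization settings.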

As far as I can tell for THIS reproduction, it:

  • DOES happen on nightly-2023-08-09 and all that I've tried later than this
  • DOES NOT happen on nightly-2023-08-08 and before
  • The project updated from LLVM16 to LLVM17 on 2023-08-08; nightly-2023-08-09 seems to be the first nightly with LLVM17
  • ONLY happens with lto = "fat", thinlto and no lto do not reproduce
    • Edit: I only observed it happening with fatlto, but Peter previously saw it with thinlto
  • ONLY happens with -Oz (edit: the repro uses -Oz for debug builds, switching to -O3 in release does not repro)

I've attached my specific elf file, so you can look at the same memory locations referenced in my issue

none-fault.elf.zip

This does seem temperamental, and tweaking unrelated pieces of the repro code causes it to disappear.

CC @peter9477 @Dirbaio

Activity

added the needs-triage label (This issue may need triage. Remove it if it has been sufficiently triaged.) on Dec 12, 2023

jamesmunns (Member, Author) commented on Dec 12, 2023

Also pushed jamesmunns/none-fault@cc96c01 which is the exact version used to produce the attached elf file above.

jamesmunns (Member, Author) commented on Dec 12, 2023

Tagging in @fhahn, who suggested that llvm/llvm-project#73553 may address this, if we can update the LLVM17 tip.

I've not done a "rebuild rustc with different LLVM submodule" before, but I'll ask around the embedded-rust folks to see if anyone can give me a hand or try it out.

peter9477 commented on Dec 12, 2023

FYI, with my original code base where this first showed up, I can confirm that in fact this happens with lto = "fat" or lto = "thin", but not with lto = "off".

changed the title from "LTO performing faulty outlining for cortex-m targets in LLVM17" to "LLVM17 performing faulty outlining for cortex-m targets causes program crash" on Dec 12, 2023

jamesmunns (Member, Author) commented on Dec 12, 2023

Yep, updated the title to point to the updated assumption that this is more likely due to LR clobbering of the outliner, rather than LTO. In my attempts, lto="thin" didn't repro, but that could have been chance more than science.

saethlin (Member) commented on Dec 12, 2023

> ONLY happens with -Oz (edit: the repro uses -Oz for debug builds, switching to -O3 in release does not repro)

What happens with -Copt-level=s? What happens with -Copt-level=z + -Zshare-generics=no?

added the labels
  • A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
  • O-Arm (Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state)
  • I-unsound (Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness)
and removed needs-triage on Dec 12, 2023

added the label I-prioritize (Issue: Indicates that prioritization has been requested for this issue.) on Dec 12, 2023

jamesmunns (Member, Author) commented on Dec 13, 2023

  • does not repro with -Copt-level=s (well, I changed the profile)
  • does repro with RUSTFLAGS='-Zshare-generics=no' cargo build (edit: with opt-level='z')

I think @peter9477 might have mentioned the original code having problems in opt='s', though this repro is pretty sensitive to sometimes working with relatively minor changes.

Labels

  • A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
  • C-bug (Category: This is a bug.)
  • I-unsound (Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness)
  • O-Arm (Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state)
  • P-high (High priority)
  • llvm-fixed-upstream (Issue expected to be fixed by the next major LLVM upgrade, or backported fixes)
  • regression-from-stable-to-stable (Performance or correctness regression from one stable version to another.)

Participants

@nikic, @benma, @Dirbaio, @Emilgardis, @peter9477

      LLVM17 performing faulty outlining for cortex-m targets causes program crash · Issue #118867 · rust-lang/rust