
rustc emits calls to __sync_ when using load/store with +forced-atomics #101300

Status: Open

Description

Lokathor (Contributor) opened this issue:

I went to start a new branch for the gba crate and things were going well until LLVM demanded that I add a bunch of definitions for atomic intrinsics:

  = note: arm-none-eabi-ld: D:\dev\gba\target\thumbv4t-none-eabi\debug\deps\libgba-e42d0bb0e1acbf2b.rlib(gba-e42d0bb0e1acbf2b.8ymripsqk83bpm.rcgu.o): in function `core::sync::atomic::atomic_store':
          C:\Users\Daniel\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\sync/atomic.rs:2948: undefined reference to `__sync_lock_test_and_set_4'
          arm-none-eabi-ld: D:\dev\gba\target\thumbv4t-none-eabi\debug\examples\hello-4b3a62c592de993e.5c84v82b0il7lif5.rcgu.o: in function `core::sync::atomic::atomic_store':
          C:\Users\Daniel\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\sync/atomic.rs:2948: undefined reference to `__sync_lock_test_and_set_2'
          arm-none-eabi-ld: D:\dev\gba\target\thumbv4t-none-eabi\debug\examples\hello-4b3a62c592de993e.5c84v82b0il7lif5.rcgu.o: in function `core::sync::atomic::atomic_load':
          C:\Users\Daniel\.rustup\toolchains\nightly-x86_64-pc-windows-msvc\lib\rustlib\src\rust\library\core\src\sync/atomic.rs:2963: undefined reference to `__sync_val_compare_and_swap_2'

These are all part of LLVM's __sync function family, and they are probably caused by the atomic accesses I'm using.
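A minimal illustration (not the exact code from the crate) of the kind of accesses that produce those three symbols:

    use core::sync::atomic::{AtomicU16, AtomicU32, Ordering};

    static FLAG32: AtomicU32 = AtomicU32::new(0);
    static FLAG16: AtomicU16 = AtomicU16::new(0);

    fn poke() {
        FLAG32.store(1, Ordering::SeqCst);     // -> __sync_lock_test_and_set_4
        FLAG16.store(1, Ordering::SeqCst);     // -> __sync_lock_test_and_set_2
        let _ = FLAG16.load(Ordering::SeqCst); // -> __sync_val_compare_and_swap_2
    }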

Now, the GBA uses the ARMv4T architecture, which has no atomic instructions at the CPU level, but the +forced-atomics feature was recently added to many target definitions for older targets. My understanding of how this was supposed to work is that an aligned atomic load/store would just use the normal load/store instruction, and the "advanced" atomic operations like fetch_add would call a function. However, it seems to be the case that (using the thumbv4t-none-eabi target) LLVM still wants to call functions for plain load/store.

Activity

thomcc (Member) commented on Sep 3, 2022

Related to (possibly dupe of) #99668.

added the A-LLVM label (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.) on Sep 3, 2022
nikic (Contributor) commented on Sep 3, 2022

Yeah, this shouldn't be happening. Based on the symbol names, the atomic load or store seems to get lowered to an atomic CAS at some point.
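As a rough semantic model of that lowering (a sketch, not LLVM's actual expansion code):

    use core::sync::atomic::{AtomicU32, Ordering};

    // A store lowered via __sync_lock_test_and_set_4 behaves like an atomic
    // exchange whose old value is discarded...
    fn store_as_rmw(a: &AtomicU32, v: u32) {
        let _old = a.swap(v, Ordering::SeqCst);
    }

    // ...and a load lowered via __sync_val_compare_and_swap_4 behaves like a
    // CAS with dummy arguments that returns the current value either way.
    fn load_as_cas(a: &AtomicU32) -> u32 {
        match a.compare_exchange(0, 0, Ordering::SeqCst, Ordering::SeqCst) {
            Ok(v) | Err(v) => v,
        }
    }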

thomcc (Member) commented on Sep 3, 2022

There's no __sync equivalent to a plain load/store, so I believe this is to force going through libcalls on targets where you'd need to take a global lock around all accesses to support RMWs (which I guess LLVM assumes we might want to support).
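For illustration, a lock-based fallback of the kind described might look like this; the critical-section helpers are hypothetical stand-ins (on a single-core machine they would mask and restore interrupts):

    // Hypothetical helpers, not a real API.
    unsafe fn enter_critical() -> u32 {
        /* mask interrupts, return the previous state */
        0
    }
    unsafe fn exit_critical(_state: u32) {
        /* restore the previous interrupt state */
    }

    // Sketch of a lock-based CAS libcall with the GCC/LLVM __sync signature.
    #[no_mangle]
    unsafe extern "C" fn __sync_val_compare_and_swap_4(
        ptr: *mut u32,
        expected: u32,
        new: u32,
    ) -> u32 {
        let state = enter_critical();
        let old = *ptr;
        if old == expected {
            *ptr = new;
        }
        exit_critical(state);
        old
    }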

thomcc (Member) commented on Sep 3, 2022

This is a lot like the current situation for compiler_fence -- on some embedded targets LLVM seems to consider __sync to be a good fallback: https://github.com/llvm/llvm-project/blob/9905dae5e18cd55ee6bb8678c95fed940ded1ef9/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp#L3937-L3939

nikic (Contributor) commented on Sep 3, 2022

Ah, I see the problem. These are seq_cst atomic loads/stores, but apparently thumbv4t has no dmb instruction, so it's not possible to natively lower an atomic load/store with an ordering stronger than monotonic. Thus we get libcalls even with the +atomics-32 feature.

This does look correct to me, and it doesn't look like new LLVM 15 behavior either. So to clarify: did this code (using these atomic orderings) work previously?

I think this means that we do need to declare atomics as completely unsupported for thumbv4t targets.

Lokathor (Contributor, Author) commented on Sep 3, 2022

Ah, I should be more clear: This code is all new code, and I have never before even attempted to use Atomic* types on this target.

However, I was told by both Ralf and m-ou-se that plain loads and stores are "effectively atomic".

So when the Forced Atomic stuff was put in, I went to try it out, and it did not work as expected. What I expected, given that it's an ARMv4T, is that all atomic orderings would lower to plain ldr and str. There's no need for dmb because there's only one CPU and it's not capable of executing operations out of order. The only thing that needs to respect the atomic orderings is LLVM itself when it's optimizing; other than that, the CPU will only perform operations in exactly their order in the assembly.
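A sketch of that expectation (the codegen in the comment is what I'd expect, not verified output):

    use core::sync::atomic::{AtomicU16, Ordering};

    // On a single in-order core, every ordering should compile to a plain
    // load/store; the ordering only has to constrain compiler reordering.
    fn read_flag(flag: &AtomicU16) -> u16 {
        flag.load(Ordering::SeqCst) // expected: one `ldrh`, no fence, no libcall
    }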

What I am really after is a way for the main program (just one thread) to communicate with the interrupt handler.

  • In the past I've used volatile accesses for this.
  • Doing that was called unsound in the presence of optimizations that reorder standard accesses before or after volatile accesses.
  • Accordingly, I tried this atomic stuff, and it's not working right.

So, if you think that everything here is working as you expect, then I might have to tell Ralf and m-ou-se that there are no "effective atomics" after all and they need to go back to the drawing board.
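For reference, a minimal sketch of the pattern I'm after (the handler wiring and names are hypothetical, not actual gba crate API):

    use core::sync::atomic::{AtomicU16, Ordering};

    // Shared between the main program and the interrupt handler.
    static VBLANK_COUNT: AtomicU16 = AtomicU16::new(0);

    // Hypothetical: installed as the hardware interrupt handler.
    extern "C" fn irq_handler() {
        // An RMW is expected to become a libcall under the forced-atomics
        // scheme; that part is fine.
        VBLANK_COUNT.fetch_add(1, Ordering::Relaxed);
    }

    fn wait_for_vblank() {
        let start = VBLANK_COUNT.load(Ordering::Acquire); // should be a plain ldrh
        while VBLANK_COUNT.load(Ordering::Acquire) == start {}
    }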

Lokathor (Contributor, Author) commented on Sep 3, 2022

On the advice of thomcc, I tried other orderings:

  • With load(Acquire) and store(Release), it still calls functions.
  • With all operations Relaxed, it does not call functions.
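
As one snippet, the observed behaviour:

    use core::sync::atomic::{AtomicU16, Ordering};

    fn observed(a: &AtomicU16) {
        let _ = a.load(Ordering::SeqCst);  // libcall: __sync_val_compare_and_swap_2
        a.store(1, Ordering::SeqCst);      // libcall: __sync_lock_test_and_set_2
        let _ = a.load(Ordering::Acquire); // still a libcall
        a.store(1, Ordering::Release);     // still a libcall
        let _ = a.load(Ordering::Relaxed); // plain ldrh, no libcall
        a.store(1, Ordering::Relaxed);     // plain strh, no libcall
    }
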
Lokathor (Contributor, Author) commented on Sep 3, 2022

  • Using load(Acquire)/store(Release) would be correct for this code, but still emits the library calls.
  • Relaxed doesn't emit library calls, but might be reordered incorrectly (non-atomic accesses can move past the atomic ones).
  • Adding calls to compiler_fence to prevent that should be fine here, except that it hits #62256 and becomes a library call to __sync_synchronize, which means a 2-cycle (load) or 3-cycle (store) operation gets 6 cycles of overhead added just to jump to an empty function and immediately return. (Sketched below.)
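
The fence-based pattern from the last bullet, for concreteness (a sketch; per #62256 the fences currently become __sync_synchronize libcalls on this target instead of compiling to nothing):

    use core::sync::atomic::{compiler_fence, AtomicU16, Ordering};

    fn release_store(a: &AtomicU16, v: u16) {
        compiler_fence(Ordering::Release); // should be zero instructions
        a.store(v, Ordering::Relaxed);     // plain strh
    }

    fn acquire_load(a: &AtomicU16) -> u16 {
        let v = a.load(Ordering::Relaxed); // plain ldrh
        compiler_fence(Ordering::Acquire); // should be zero instructions
        v
    }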

So yeah, atomics are completely broken on this target at the moment.

In the meantime it can be worked around with inline asm, but then I either have to make every access a macro call (to get good register allocation) or wrap it in a method that gets inlined, in which case every access will always be fighting for r0.
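
A sketch of that inline asm workaround (illustrative; Thumb-1 register-class details may need adjusting):

    use core::arch::asm;

    // inline(always) so each access compiles down to one instruction; the
    // non-inlined form would funnel every access through r0, as noted above.
    #[inline(always)]
    unsafe fn asm_load_u16(ptr: *const u16) -> u16 {
        let out: u16;
        // No `nomem` option, so the compiler treats this as touching memory
        // and will not reorder other memory accesses across it.
        asm!(
            "ldrh {out}, [{addr}]",
            addr = in(reg) ptr,
            out = out(reg) out,
            options(nostack, preserves_flags),
        );
        out
    }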

taiki-e (Member) commented on Sep 4, 2022

  • Relaxed doesn't emit library calls

FWIW (when you use core::sync::atomic's atomics), whether this works or not actually seems to depend on compiler optimizations. (In cases where Atomic*::{load,store} cannot be inlined enough, libcalls will still be generated: taiki-e/portable-atomic#36 (comment))

Lokathor (Contributor, Author) commented on Sep 4, 2022

Ah yes, I hadn't considered that. For this platform, I always use opt-level = 3 for all builds.
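
For completeness, that looks like this in Cargo.toml (release already defaults to opt-level 3):

    [profile.dev]
    opt-level = 3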

added the A-target-feature label (Area: Enabling/disabling target features like AVX, Neon, etc.) and the A-atomic label (Area: Atomics, barriers, and sync primitives) on Mar 4, 2023

added the T-compiler label (Relevant to the compiler team, which will review and decide on the PR/issue.) on Apr 5, 2023