AArch64 support status and issues #3234

Open
@yuyichao

  • rr status

    Testing on my M1 MBA, there are currently < 30 test failures out of 1311 (40 with syscallbuf, see below).
    The main missing piece within rr is the syscallbuf. It's actually quite tricky to implement in a way that satisfies all the requirements we have on x86 (in particular, that it should work without a valid stack...). I have a write-up and a WIP implementation of this and I'll post a draft PR here after some more cleanup.
    All tests pass on apple-m1, neoverse-n1/v1, cortex-a77. Syscallbuf is implemented.

  • Supported hardware

    Currently, we support arm-neoverse-n1 and apple-m1. It seems that most of the recent ARM cores up to cortex-a78 should also be supported without much issue (a55, a65, a65ae, a75-a78). I assume the upcoming apple-m2 should work fine as well, provided it's apple-a15 based.

  • Kernel features required

    x86 currently implements three features (that I can tell) that aren't generally implementable on aarch64 without additional kernel support.

    1. Unbound CPU. This should work on aarch64 if there's a single PMU type (or if we bind the tracee to the cores with the same PMU type). Supporting migration between PMU types would likely require kernel support due to the need for interrupts. I kind of doubt the kernel developers are willing to add this, but someone with more kernel experience should bring this up...
    2. CPUID. The traditional way on aarch64 to figure out processor features and IDs is AUXV and procfs/sysfs. These should all be handled well by rr since they are normal kernel software interfaces. Recent kernel versions, however, support emulating the mrs instructions that read the EL1 cpuid registers, and AFAICT there isn't yet a way for a ptracer to catch them.
    3. Time register. Like RDTSC on x86, aarch64 has system registers like CNTVCT_EL0 that can be used as counters. (There are a few other related ones as well.) There doesn't seem to be a way to trap on these from userspace ATM, but according to the architecture manual the kernel should at least be able to trap them.
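To illustrate the "bind to the cores with the same PMU type" idea from item 1, here is a minimal sketch. It is not how rr does it. It assumes the kernel exposes each core PMU as a directory under `/sys/bus/event_source/devices/` with a `cpus` file holding a CPU list such as `0-3,6`; that layout, and the helper names `parse_cpu_list`, `cpus_by_pmu` and `pin_to_one_pmu`, are assumptions made for illustration.

```python
import glob
import os


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-3,6" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpus_by_pmu(root="/sys/bus/event_source/devices"):
    """Map PMU device name -> CPUs it covers (assumed sysfs layout)."""
    groups = {}
    for path in glob.glob(os.path.join(root, "*", "cpus")):
        name = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                groups[name] = parse_cpu_list(f.read())
        except OSError:
            continue  # unreadable entry: skip it rather than fail
    return groups


def pin_to_one_pmu(groups, pid=0):
    """Restrict pid to the CPUs of the largest PMU group; return that set."""
    if not groups:
        return None
    name = max(groups, key=lambda n: len(groups[n]))
    cpus = set(groups[name])
    os.sched_setaffinity(pid, cpus)  # Linux-only call
    return cpus
```

Calling `pin_to_one_pmu(cpus_by_pmu())` before starting the tracee would keep it on cores that share one PMU type, at the cost of not using the other cores. Picking the largest group is an arbitrary choice for this sketch.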
  • SVE/armv9-a

    SVE has a feature that has worried me regarding predictability ever since it came out. To make it easier to vectorize code with complex loop termination conditions, SVE introduced first-fault (FF) and non-fault (NF) versions of the load instructions. When accessing invalid memory, instead of producing a fault, these simply set a mask indicating which elements faulted. Clever use of this allows vectorization of string functions (e.g. strlen), since one can perform out-of-bounds reads without any visible consequences.

    The issue I see with this is that it depends on OS paging. Even if a page is mapped from the userspace point of view, it may not actually be mapped in the page tables, depending on how lazy the kernel decides to be. This was previously completely transparent to userspace, but with these SVE instructions one can in principle observe it, and it is therefore something that rr has to keep track of/manage.
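A toy model may make the concern concrete. The sketch below is not real SVE semantics in detail: `PAGE`, `VL`, `ldff1b` and `strlen_ff` are hypothetical names, pages are 16 bytes, and "resident" stands for pages the kernel has actually populated. It only shows that a first-fault strlen returns the same length either way, while the fault masks it observes differ with page residency.

```python
PAGE = 16   # toy page size in bytes (real pages are 4 KiB or larger)
VL = 8      # toy vector length in byte lanes


def ldff1b(mem, resident, addr):
    """Toy first-fault byte load: return (lanes, ffr).

    The first lane behaves like a normal load: a non-resident page is
    paged in (modelling a fault the kernel handles transparently).
    Later lanes that touch a non-resident page do not fault; they and
    all following lanes are cleared in the returned mask.
    """
    if addr >= len(mem):
        raise MemoryError("first lane is outside the mapping")
    resident.add(addr // PAGE)
    lanes, ffr = [], []
    ok = True
    for i in range(VL):
        a = addr + i
        ok = ok and a < len(mem) and (a // PAGE) in resident
        ffr.append(ok)
        lanes.append(mem[a] if ok else 0)
    return lanes, ffr


def strlen_ff(mem, resident, start):
    """Vectorised strlen on the toy model; also returns every mask seen."""
    seen = []
    addr = start
    while True:
        lanes, ffr = ldff1b(mem, resident, addr)
        seen.append(tuple(ffr))
        valid = ffr.index(False) if False in ffr else VL
        for i in range(valid):
            if lanes[i] == 0:
                return addr + i - start, seen
        addr += valid  # retry from the first lane that was not loaded


# 20 characters, a NUL terminator, then padding: two toy pages in total.
mem = b"abcdefghijklmnopqrst" + b"\0" + b"x" * 11
len_all, masks_all = strlen_ff(mem, {0, 1}, 12)   # both pages populated
len_lazy, masks_lazy = strlen_ff(mem, {0}, 12)    # second page not yet populated
print(len_all == len_lazy, masks_all == masks_lazy)
```

A program that only uses the final length behaves identically in both runs, but anything that reads the mask directly (or whose timing or iteration count matters) can diverge between record and replay unless residency is reproduced.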

    It also seems that this could be worse. While we can in principle track and record what the kernel does, the ARM ISA document says that

    Implementation may suppress NF load for any reason
    

    (Search for MemSingleNF.) The exact behavior here is of course implementation dependent, and it's quite possible that the vendors are reasonable here. However, that's something that at least needs to be tested.

    This is relevant for any processor with SVE. The fujitsu-a64fx is probably the one with the best chance of being able to run rr at the moment. (Their PMU document doesn't mention the counter we use, but the numbering of the rest agrees with the ARM PMU document, so I think one just needs to check whether the ones we use are implemented...) This is likely going to matter more in the future since SVE and SVE2 are part of the armv9-a requirement and all future ARM processors starting from a510/a710, including neoverse-n2 and neoverse-v1, will have them (neoverse-n2 and v1 are not armv9 but n2 has SVE and v1 has SVE/SVE2). It's also conceivable that a distro would release a new version for armv9-a, and binaries in it could be compiled with SVE turned on at compile time, so masking off the feature may or may not work at that point...
