Rust should use registers more aggressively #26494

Closed
@reinerp

Description

Rust should pass more structs in registers. Consider these examples: ideally, both functions would execute entirely in registers and wouldn't touch memory:

// Passing small structs by value.
pub fn parameters_by_value(v: (u64, u64)) -> u64 {
  v.0 + v.1
}

// Returning small structs by value.
pub fn return_by_value() -> (u64, u64) {
  (3, 4)
}

Rust, as of a recent 1.2.0-dev nightly, is unable to pass either of these in registers (see LLVM IR and ASM below). It would be pretty safe to pass and return small structs (ones that fit into <=2 registers) in registers, and doing so is likely to improve performance on average. This is what the System V ABI does.
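For comparison, here is a minimal sketch of the same two functions written against the C ABI. The names `Pair`, `sum_pair` and `make_pair` are made up for this illustration. On x86-64 System V targets, a 16-byte struct of two integers is classified as INTEGER, so it is passed and returned in a pair of general-purpose registers; other targets and ABIs may behave differently.

```rust
// Hypothetical illustration: `Pair`, `sum_pair` and `make_pair` are names
// invented for this sketch, not part of the original report.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Pair {
    pub a: u64,
    pub b: u64,
}

// With `extern "C"` the platform C ABI applies. On x86-64 System V the two
// fields of `p` are expected to arrive in two integer registers.
#[inline(never)]
pub extern "C" fn sum_pair(p: Pair) -> u64 {
    p.a + p.b
}

// Likewise, the 16-byte result is expected to come back in two integer
// registers instead of through a hidden out-pointer.
#[inline(never)]
pub extern "C" fn make_pair() -> Pair {
    Pair { a: 3, b: 4 }
}
```

Inspecting the generated assembly for a System V target (for example with `--emit asm`) is the way to confirm the register assignment on a given toolchain.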

It would also be nice to exploit Rust's control over aliasing, and where possible also promote reference arguments to registers, i.e. put the u64 values in registers for the following functions:

// Passing small structs by reference.
pub fn parameters_by_ref(v: &(u64, u64)) -> u64 {
  v.0 + v.1
}

// Passing small structs by *mutable* reference.
pub fn mutable_parameters_by_ref(v: &mut (u64, u64)) {
  v.0 += 1;
  v.1 += 2;
}

In the &mut case, this would mean passing two u64 values in registers as function parameters and returning the two updated u64 values in registers (ideally we'd arrange for the parameter registers to match the return registers). Uniqueness of &mut makes this optimization valid, although we may have to give it up in some cases, such as when raw pointers are present.
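Written out by hand at the source level, the proposed transformation for the &mut case might look roughly like the sketch below. The worker name `mutable_parameters_worker` is invented here, and whether the worker's arguments and results actually end up in registers depends on the compiler version and target.

```rust
// Hypothetical worker: takes the two fields by value and returns the updated
// pair, so no memory access is needed inside the function body.
#[inline(never)]
fn mutable_parameters_worker(a: u64, b: u64) -> (u64, u64) {
    (a + 1, b + 2)
}

// Wrapper with the original signature: loads the fields, calls the worker,
// and stores the results back through the unique reference.
#[inline(always)]
pub fn mutable_parameters_by_ref(v: &mut (u64, u64)) {
    let (a, b) = mutable_parameters_worker(v.0, v.1);
    v.0 = a;
    v.1 = b;
}
```

Because the wrapper is inlined, the loads and stores happen at the call site, where the caller may already hold both values in registers.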

Here's a more realistic example where I've wanted Rust to do this:

pub fn skip_whitespace(iter: &mut std::str::Chars) -> u64 {
  // Reads as much whitespace as possible from the front of iter, then returns the number of
  // characters read.
  ...
}

This function is too large to justify inlining. I'd like the begin and end pointers of iter to be kept in registers across the function call.

Probably not surprising to the compiler team, but for completeness here is the LLVM IR of the above snippets, as of today's Rust (1.2.0-dev), compiled in release mode / opt-level=3:

define i64 @_ZN19parameters_by_value20h3d287104250c57c4eaaE({ i64, i64 }* noalias nocapture dereferenceable(16)) unnamed_addr #0 {
entry-block:
  %1 = getelementptr inbounds { i64, i64 }, { i64, i64 }* %0, i64 0, i32 0
  %2 = getelementptr inbounds { i64, i64 }, { i64, i64 }* %0, i64 0, i32 1
  %3 = load i64, i64* %1, align 8
  %4 = load i64, i64* %2, align 8
  %5 = add i64 %4, %3
  %6 = bitcast { i64, i64 }* %0 to i8*
  tail call void @llvm.lifetime.end(i64 16, i8* %6)
  ret i64 %5
}

define void @_ZN15return_by_value20h703d16a2e5f298d6saaE({ i64, i64 }* noalias nocapture sret dereferenceable(16)) unnamed_addr #0 {
entry-block:
  %1 = bitcast { i64, i64 }* %0 to i8*
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* bitcast ({ i64, i64 }* @const1285 to i8*), i64 16, i32 8, i1 false)
  ret void
}

define i64 @_ZN17parameters_by_ref20hc9c548b23d173a1aBaaE({ i64, i64 }* noalias nocapture readonly dereferenceable(16)) unnamed_addr #2 {
entry-block:
  %1 = getelementptr inbounds { i64, i64 }, { i64, i64 }* %0, i64 0, i32 0
  %2 = getelementptr inbounds { i64, i64 }, { i64, i64 }* %0, i64 0, i32 1
  %3 = load i64, i64* %1, align 8
  %4 = load i64, i64* %2, align 8
  %5 = add i64 %4, %3
  ret i64 %5
}

define void @_ZN25mutable_parameters_by_ref20h736bc2daba227c43QaaE({ i64, i64 }* noalias nocapture dereferenceable(16)) unnamed_addr #0 {
entry-block:
  %1 = bitcast { i64, i64 }* %0 to <2 x i64>*
  %2 = load <2 x i64>, <2 x i64>* %1, align 8
  %3 = add <2 x i64> %2, <i64 1, i64 2>
  %4 = bitcast { i64, i64 }* %0 to <2 x i64>*
  store <2 x i64> %3, <2 x i64>* %4, align 8
  ret void
}

and here is the ASM:

_ZN19parameters_by_value20h3d287104250c57c4eaaE:
        .cfi_startproc
        movq    8(%rdi), %rax
        addq    (%rdi), %rax
        retq

_ZN15return_by_value20h703d16a2e5f298d6saaE:
        .cfi_startproc
        movups  const1285(%rip), %xmm0
        movups  %xmm0, (%rdi)
        movq    %rdi, %rax
        retq

_ZN17parameters_by_ref20hc9c548b23d173a1aBaaE:
        .cfi_startproc
        movq    8(%rdi), %rax
        addq    (%rdi), %rax
        retq

_ZN25mutable_parameters_by_ref20h736bc2daba227c43QaaE:
        .cfi_startproc
        movdqu  (%rdi), %xmm0
        paddq   .LCPI3_0(%rip), %xmm0
        movdqu  %xmm0, (%rdi)
        retq

Activity

dotdash

dotdash commented on Jun 22, 2015

@dotdash
Contributor

For fat pointers this has been fixed in #26411

arielb1

arielb1 commented on Jun 22, 2015

@arielb1
Contributor

For skip_whitespace and similar functions, this is currently impossible because the address of a reference is significant (you can, e.g., print it with println!("{:?}", iter as *const _)).
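A small sketch of why the address matters: a callee can turn its reference argument into a raw pointer and expose the address, so the caller's value must really live in memory at that location. The helper name `observed_address` is made up for this example.

```rust
// Hypothetical helper: returns the address that the reference points at.
// If the tuple were passed purely in registers, there would be no such
// address to report.
#[inline(never)]
pub fn observed_address(v: &(u64, u64)) -> usize {
    v as *const (u64, u64) as usize
}
```

Calling `observed_address(&pair)` yields the same value as `&pair as *const (u64, u64) as usize` computed in the caller, which is the property a register-promoting optimization would have to preserve or prove unused.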

self-assigned this on Jun 22, 2015

reinerp

reinerp commented on Jun 22, 2015

@reinerp
Author

@arielb1 - yes, the ability to take addresses prevents doing this for all functions. But it could be done as an optimization: make the compiler check whether a function uses the address of a reference, and if it does not, change its calling convention. The compiler can use the worker/wrapper technique (as in GHC) to support call sites that aren't aware of this change in calling convention, by splitting the function into two:

// Non-inlined worker, uses fast in-register calling convention
pub fn skip_whitespace_worker(start: *const u8, end: *const u8) -> (*const u8, *const u8, u64) {
  ...
}

// Inlined wrapper, moves reference argument to registers and then calls the worker.
pub fn skip_whitespace(v: &mut std::str::Chars) -> u64 {
  // Approximately (pseudocode: the fields of Chars are private):
  let (start, end, len) = skip_whitespace_worker(v.start, v.end);
  v.start = start;
  v.end = end;
  len
}

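The worker/wrapper split can be approximated by hand in safe Rust by going through `Chars::as_str`. This is only a sketch under assumptions: the loop body is one possible implementation of the elided function, and the `(&str, u64)` result is three words wide, so it may still be returned through memory on some targets.

```rust
use std::str::Chars;

// Worker: receives the unread input as a `&str` (a pointer/length pair) and
// returns the remaining tail together with the number of characters skipped.
#[inline(never)]
fn skip_whitespace_worker(input: &str) -> (&str, u64) {
    let mut count = 0u64;
    let mut rest = input;
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        if !c.is_whitespace() {
            break;
        }
        count += 1;
        // `as_str` yields the part of the input not yet consumed.
        rest = chars.as_str();
    }
    (rest, count)
}

// Wrapper: keeps the original signature, hands the worker plain values, and
// writes the advanced iterator back through the `&mut` reference.
#[inline(always)]
pub fn skip_whitespace(iter: &mut Chars<'_>) -> u64 {
    let (rest, count) = skip_whitespace_worker(iter.as_str());
    *iter = rest.chars();
    count
}
```

Call sites keep using `skip_whitespace(&mut iter)` unchanged; only the inlined wrapper touches the iterator in memory.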
pcwalton

pcwalton commented on Jul 20, 2015

@pcwalton
Contributor

Don't do this. You will break FastISel. Fix LLVM if it's not doing the optimizations you want.

dotdash

dotdash commented on Jul 20, 2015

@dotdash
Contributor

@pcwalton what exactly would break FastISel?

removed their assignment on Mar 4, 2016

added the C-enhancement (Category: An issue proposing an enhancement or a PR with one.) and I-slow (Issue: Problems and improvements with respect to performance of generated code.) labels on Jul 22, 2017

nox

nox commented on Apr 2, 2018

@nox
Contributor

Pretty sure that since @eddyb's work on niche-filling optimisation, parameters_by_value and return_by_value use a pair of registers for the argument and the return value, respectively.

Cc @rust-lang/wg-codegen

gnzlbg

gnzlbg commented on Sep 14, 2019

@gnzlbg
Contributor

@eddyb didn't Abi::ScalarPair solve this?

gnzlbg

gnzlbg commented on Sep 14, 2019

@gnzlbg
Contributor

Yes, this appears to be fixed. The following Rust code (https://rust.godbolt.org/z/LhQGOM):

// Passing small structs by value.
pub fn parameters_by_value(v: (u64, u64)) -> u64 {
  v.0 + v.1
}

// Returning small structs by value.
pub fn return_by_value() -> (u64, u64) {
  (3, 4)
}

generates

example::parameters_by_value:
        lea     rax, [rdi + rsi]
        ret

example::return_by_value:
        mov     eax, 3
        mov     edx, 4
        ret

and

define i64 @parameters_by_value(i64 %v.0, i64 %v.1) unnamed_addr #0 {
start:
  %0 = add i64 %v.1, %v.0
  ret i64 %0
}

define { i64, i64 } @return_by_value() unnamed_addr #0 {
start:
  ret { i64, i64 } { i64 3, i64 4 }
}


        Rust should use registers more aggressively · Issue #26494 · rust-lang/rust