Skip to content

memset()/memclr() emitted for Vec::new() #305

Closed
@jordens

Description

@jordens

memclr()/memset() calls can be emitted, e.g. if Vec is contained in another structure (even if its position is last):

use heapless::Vec;

pub struct S {
    x: u32,
    v: Vec<u8, 28>,
}

pub fn s2() -> S {
    S { x: 0, v: Vec::new() }
}
example::s2:
        push    {r7, lr}
        mov     r7, sp
        movs    r1, #36
        bl      __aeabi_memclr4
        pop     {r7, pc}

Godbolt

Complete code to reproduce with a couple variations
#![no_std]
#![allow(unused)]

// Vec taken (minus comments) from
// https://github.com/japaric/heapless/blob/14639156894125d75b908405cc228eb4b550c85d/src/vec.rs#L36-L72
use core::mem::MaybeUninit;

pub struct Vec<T, const N: usize> {
    len: usize,
    buffer: [MaybeUninit<T>; N],
}

impl<T, const N: usize> Vec<T, N> {
    const ELEM: MaybeUninit<T> = MaybeUninit::uninit();
    const INIT: [MaybeUninit<T>; N] = [Self::ELEM; N];

    pub const fn new() -> Self {
        Self {
            len: 0,
            buffer: Self::INIT,
        }
    }

}

pub struct S {
    x: u32,
    v: Vec<u8, 28>,
}

pub struct P {
    x: u8,
    v: Vec<u8, 32>,
}

/// This is the only case (of the four shown below) that does not clear/set the entire Vec
pub fn s1() -> S {
    S { x: 1, v: Vec::new() }
}

pub fn s2() -> S {
    S { x: 0, v: Vec::new() }
}

pub fn p1() -> P {
    P { x: 1, v: Vec::new() }
}

pub fn p2() -> P {
    P { x: 0, v: Vec::new() }
}

Result:

example::s1:
        movs    r1, #0
        movs    r2, #1
        strd    r2, r1, [r0]
        bx      lr

example::s2:
        push    {r7, lr}
        mov     r7, sp
        movs    r1, #36
        bl      __aeabi_memclr4
        pop     {r7, pc}

example::p1:
        push    {r4, r6, r7, lr}
        add     r7, sp, #8
        mov     r4, r0
        adds    r0, #4
        movs    r1, #33
        movs    r2, #1
        bl      __aeabi_memset4
        movs    r0, #0
        str     r0, [r4]
        pop     {r4, r6, r7, pc}

example::p2:
        push    {r7, lr}
        mov     r7, sp
        movs    r1, #37
        bl      __aeabi_memclr4
        pop     {r7, pc}

When hunting down a few unexpected and expensive memclr()/memset() I noticed that they are generated for Vec::new().
This is even though Vec almost exclusively consists of MaybeUninit::uninit() and also specifically tunes its layout so that the only initialized element (the length) is early. The optimization mentioned in the code comment to #290 does appear to be very desirable but also very easy to miss out on.

From a cursory search the result is the same for all rust versions between current nightly and 1.56. With 1.55 all four cases receive memset()/memclr() treatment. This is -Copt-level=3 --target=thumbv7em-none-eabihf. I didn't try other targets or optimization levels.

This is probably not a bug in heapless, and there may not even be a viable way to improve the situation even from the rust/llvm perspective. But it looks like a surprising data point and was certainly a pitfall in my case.

Under what circumstances can the compiler be expected to actually forego the initialization of MaybeUninit::uninit()?
Or are we using it (Vec or MaybeUninit) wrong?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions