Description
memclr()
/memset()
calls can be emitted, e.g. if Vec
is contained in another structure (even if its position is last):
use heapless::Vec;
pub struct S {
x: u32,
v: Vec<u8, 28>,
}
pub fn s2() -> S {
S { x: 0, v: Vec::new() }
}
example::s2:
push {r7, lr}
mov r7, sp
movs r1, #36
bl __aeabi_memclr4
pop {r7, pc}
Complete code to reproduce with a couple variations
#![no_std]
#![allow(unused)]
// Vec taken (minus comments) from
// https://github.com/japaric/heapless/blob/14639156894125d75b908405cc228eb4b550c85d/src/vec.rs#L36-L72
use core::mem::MaybeUninit;
pub struct Vec<T, const N: usize> {
len: usize,
buffer: [MaybeUninit<T>; N],
}
impl<T, const N: usize> Vec<T, N> {
const ELEM: MaybeUninit<T> = MaybeUninit::uninit();
const INIT: [MaybeUninit<T>; N] = [Self::ELEM; N];
pub const fn new() -> Self {
Self {
len: 0,
buffer: Self::INIT,
}
}
}
pub struct S {
x: u32,
v: Vec<u8, 28>,
}
pub struct P {
x: u8,
v: Vec<u8, 32>,
}
/// This is the only case (of the four shown below) that does not clear/set the entire Vec
pub fn s1() -> S {
S { x: 1, v: Vec::new() }
}
pub fn s2() -> S {
S { x: 0, v: Vec::new() }
}
pub fn p1() -> P {
P { x: 1, v: Vec::new() }
}
pub fn p2() -> P {
P { x: 0, v: Vec::new() }
}
Result:
example::s1:
movs r1, #0
movs r2, #1
strd r2, r1, [r0]
bx lr
example::s2:
push {r7, lr}
mov r7, sp
movs r1, #36
bl __aeabi_memclr4
pop {r7, pc}
example::p1:
push {r4, r6, r7, lr}
add r7, sp, #8
mov r4, r0
adds r0, #4
movs r1, #33
movs r2, #1
bl __aeabi_memset4
movs r0, #0
str r0, [r4]
pop {r4, r6, r7, pc}
example::p2:
push {r7, lr}
mov r7, sp
movs r1, #37
bl __aeabi_memclr4
pop {r7, pc}
When hunting down a few unexpected and expensive memclr()
/memset()
I noticed that they are generated for Vec::new()
.
This is even though Vec
almost exclusively consists of MaybeUninit::uninit()
and also specifically tunes its layout so that the only initialized element (the length) is early. The optimization mentioned in the code comment to #290 does appear to be very desirable but also very easy to miss out on.
From a cursory search the result is the same for all rust versions between current nightly and 1.56. With 1.55 all four cases receive memset()
/memclr()
treatment. This is -Copt-level=3 --target=thumbv7em-none-eabihf
. I didn't try other targets or optimization levels.
This is probably not a bug in heapless
, and there may not even be a viable way to improve the situation even from the rust/llvm perspective. But it looks like a surprising data point and was certainly a pitfall in my case.
Under what circumstances can the compiler be expected to actually forego the initialization of MaybeUninit::uninit()
?
Or are we using it (Vec
or MaybeUninit
) wrong?