Description
Hi folks. I chatted a bit on IRC, and people there seemed to think this wasn't obviously a dup, so I'm reporting it here.
I'm using `write_all` to push some bytes into a buffer. If I do this inline, all goes well performance-wise (memcpy speeds; about 50GB/s on my machine). If I put it in a method, even with an `#[inline(always)]` attribute, it drops to about 1GB/s (and the assembly looks like a loop doing something).
The problem goes away if I don't push the leading 24 bytes using `write_all`. That is, if I don't push them at all, great! If I call `push(0u8);` 24 times, also great! Something about the preceding `write_all` seems to tank the performance of the second `write_all` (the big one). If I push 32 bytes (i.e. use a `&[0u8; 32]`), the problem goes away as well (quadword alignment?).
But there never seems to be a problem with the manually inlined code; it always goes nice and fast.
```rust
// Requires the `time` crate (0.1.x), which provides `precise_time_ns`.
extern crate time;
use std::io::Write;

fn main() {
    let dataz = vec![0u8; 1 << 20];
    let mut bytes = Vec::new();
    let rounds = 1_000;
    let start = time::precise_time_ns();
    for _ in 0..rounds {
        bytes.clear();
        // these two: "average time: 81135"
        // bytes.write_all(&[0u8; 24]).unwrap();
        // bytes.write_all(&dataz[..]).unwrap();
        // this one: "average time: 530736"
        test(&dataz, &mut bytes)
    }
    println!("average time: {:?}", (time::precise_time_ns() - start) / rounds);
}

#[inline(always)]
fn test(typed: &Vec<u8>, bytes: &mut Vec<u8>) {
    // comment first line out to go fast!
    // weirdly, to me: if you replace the first line with 24x `bytes.push(0u8)` you get good performance.
    bytes.write_all(&[0u8; 24]).unwrap();
    bytes.write_all(&typed[..]).unwrap();
}
```
Edit: this reproduces on stable, beta, and nightly.
Activity
eefriedman commented on May 9, 2016
It looks like this is running into serious aliasing problems in the optimizer. This becomes a lot more obvious if you change the `inline(always)` to `inline(never)` in your testcase. rustc could be a bit smarter about inserting aliasing hints into IR; #31681 in particular would be helpful here. LLVM could also be smarter about handling the IR for a testcase like this; alias analysis is doing a terrible job, and loop unrolling is making things worse rather than better.
I think with specialization and rust-lang/rfcs#1521, `extend_from_slice` could be fixed to explicitly call memcpy, which would make testcases like this much less sensitive to the optimizer.
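A minimal sketch of what "explicitly call memcpy" could look like for `Copy` elements, assuming you are willing to write a little `unsafe`: reserve the capacity, do one `ptr::copy_nonoverlapping` (Rust's counterpart of memcpy), and set the length once. `extend_by_memcpy` is a hypothetical helper for illustration, not a standard library API and not the actual `extend_from_slice` code.

```rust
use std::ptr;

// Hypothetical helper (not a std API): append `src` to `dst` with one bulk
// copy instead of an element-by-element loop.
fn extend_by_memcpy<T: Copy>(dst: &mut Vec<T>, src: &[T]) {
    // Ensure room for `src.len()` more elements; this may reallocate.
    dst.reserve(src.len());
    let len = dst.len();
    unsafe {
        // SAFETY: `reserve` guarantees capacity for `len + src.len()`
        // elements, `src` cannot overlap `dst`'s buffer because we hold
        // `&mut dst`, and `T: Copy` makes a bitwise copy a valid copy.
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr().add(len), src.len());
        // All `len + src.len()` elements are now initialized.
        dst.set_len(len + src.len());
    }
}

fn main() {
    let dataz = vec![1u8; 1 << 20];
    let mut bytes = Vec::new();
    // Same shape as the test case: a 24-byte header, then the large payload.
    extend_by_memcpy(&mut bytes, &[0u8; 24]);
    extend_by_memcpy(&mut bytes, &dataz[..]);
    println!("{}", bytes.len()); // 1048600
}
```

Because the length is written once after the copy, there is no per-element loop left for alias analysis or loop unrolling to get wrong.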
frankmcsherry commented on May 9, 2016
Ah, I thought `write_all` did just call memcpy. I chose it over `extend` a while back because it was compiling down to that. At least, when I first did the benchmarking with it, it worked quite well. I guess this is what #31545 (which #31681 points at) means when it says that they didn't test ... copying lots of memory around. >.<
Is there a memcpy-positive form of `write`, or ... do I allocate enough memory and do `copy_nonoverlapping`... ? :D Or maybe just chill out and wait for LLVM to fix their noalias bug?
bluss commented on May 9, 2016
It reminds me of #32155, but I have not confirmed the loop optimization failure is of the exact same kind here as there. If it is, that loop optimization regression is worrying in general, and it's "not enough" to work around it with specialization.
eefriedman commented on May 9, 2016
It looks like the same thing to me: probably affects all tight loops of `ptr::write()` followed by `Vec::set_len()` where the optimizer can't figure out the aliasing.
frankmcsherry commented on May 10, 2016
I've updated my code to use `copy_nonoverlapping` and recovered the performance. I'm happy to either close this as a dup, or keep it open if the specific example is helpful for eventually testing perf recovery when a #31681 fix lands.
The title was changed from "Performance issue in `write_all`" to "Performance issue in `write_all` (`Vec::extend_from_slice`)".
arielb1 commented on Sep 8, 2016
This can be fixed by smarter handling of `len`: https://play.rust-lang.org/?gist=69bc48e8afc3750ef2e73fec178ccdb5&version=stable&backtrace=0
bluss commented on Sep 8, 2016
Yep, that's the gist I posted to show my WIP workaround for #32155.
Option from BufWriter #36336
Auto merge of #36355 - bluss:vec-extend-from-slice-aliasing-workaroun…
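The linked gist is not reproduced here, so the following is only a sketch of one way to read "smarter handling of `len`", not the code from the gist or from #36355. The running length lives in a local field, so the optimizer does not have to prove that each element store leaves the `Vec`'s length field untouched, and it is written back with a single `set_len` when a guard is dropped (which also keeps the `Vec` consistent if `clone` panics partway). `LocalLenGuard` and `extend_with_local_len` are hypothetical names; the standard library uses its own private guard for a similar purpose (`SetLenOnDrop`).

```rust
use std::ptr;

// Hypothetical guard (illustrative only): holds the running length in a
// local field and writes it back to the Vec exactly once, when dropped.
struct LocalLenGuard<'a, T> {
    vec: &'a mut Vec<T>,
    local_len: usize,
}

impl<T> Drop for LocalLenGuard<'_, T> {
    fn drop(&mut self) {
        // SAFETY: every slot below `local_len` is initialized, either before
        // the extend started or by the loop in `extend_with_local_len`.
        unsafe { self.vec.set_len(self.local_len) };
    }
}

// Hypothetical helper: append clones of `src`, bumping a local counter per
// element instead of storing to the Vec's length field on every iteration.
fn extend_with_local_len<T: Clone>(dst: &mut Vec<T>, src: &[T]) {
    dst.reserve(src.len());
    let base = dst.as_mut_ptr();
    let start = dst.len();
    let mut guard = LocalLenGuard { vec: dst, local_len: start };
    for item in src {
        // SAFETY: `reserve` guaranteed room for `src.len()` more elements.
        unsafe { ptr::write(base.add(guard.local_len), item.clone()) };
        guard.local_len += 1;
    }
    // `guard` is dropped here and calls `set_len` once.
}

fn main() {
    let mut bytes = vec![0u8; 24];
    extend_with_local_len(&mut bytes, &[1u8, 2, 3]);
    println!("{:?}", &bytes[24..]); // [1, 2, 3]
}
```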