Skip to content

Performance issue in write_all (Vec::extend_from_slice) #33518

Closed
@frankmcsherry

Description

@frankmcsherry
Contributor

Hi folks. Chatted a bit on IRC, seemed to think this wasn't obviously a dup, so reporting here.

I'm using write_all to push some bytes into a buffer. If I do this in-line, all goes well performance-wise (memcpy speeds; about 50GB/s on my machine). If I put it in a method, even with a #[inline(always)] attribute, it drops down to about 1GB/s (and assembly looks like a loop doing something).

The problem goes away if I don't push the leading 24 bytes on using write_all. Meaning, if I don't push them on, great! If I call push(0u8); 24 times, also great! Something about the existence of the preceding write_all seems to tank the perf of the second write_all (the big one). If I push 32 bytes (i.e. use a &[0u8; 32]) the problem goes away as well (quadword alignment?).

But there never seems to be a problem with the manually inlined code; it always goes nice and fast.

extern crate time;

use std::io::Write;

fn main() {

    let dataz = vec![0u8; 1 << 20];
    let mut bytes = Vec::new();

    let rounds = 1_000;
    let start = time::precise_time_ns();

    for _ in 0..rounds {
        bytes.clear();

        // these two: "average time: 81135"
        // bytes.write_all(&[0u8; 24]).unwrap();
        // bytes.write_all(&dataz[..]).unwrap();

        // this one: "average time: 530736"
        test(&dataz, &mut bytes)
    }

    println!("average time: {:?}", (time::precise_time_ns() - start) / rounds);

}

#[inline(always)]
fn test(typed: &Vec<u8>, bytes: &mut Vec<u8>) {
    // comment first line out to go fast!
    // weirdly, to me: if you replace the first line with 24x `bytes.push(0u8)` you get good performance.
    bytes.write_all(&[0u8; 24]).unwrap();
    bytes.write_all(&typed[..]).unwrap();
}

Edit: stable, beta, and nightly.

Activity

eefriedman

eefriedman commented on May 9, 2016

@eefriedman
Contributor

It looks like this is running into serious aliasing problems in the optimizer. This becomes a lot more obvious if you change the inline(always) to inline(never) in your testcase. rustc could be a bit smarter about inserting aliasing hints into IR; #31681 in particular would be helpful here. LLVM could also be smarter about handling the IR for a testcase like this; alias analysis is doing a terrible job, and loop unrolling is making things worse rather than better.

I think with specialization and rust-lang/rfcs#1521, extend_from_slice could be fixed to explicitly call memcpy, which would make testcases like this much less sensitive to the optimizer.

frankmcsherry

frankmcsherry commented on May 9, 2016

@frankmcsherry
ContributorAuthor

Ah, I thought write_all did just call memcpy. I chose it a while back over extend because it was compiling down to that. At least, when I first did the benchmarking with it, it worked quite well. I guess when #31545, which #31681 points at, says that

Benchmarks suggest that the performance loss by this change is very small.

they didn't test ... copying lots of memory around. >.<

Is there a memcpy-positive form of write, or ... do I allocate enough memory and do copy_nonoverlapping ... ? :D Or maybe just chill out and wait for LLVM to fix their noalias bug?

bluss

bluss commented on May 9, 2016

@bluss
Member

It reminds me of #32155, but I have not confirmed the loop optimization failure is of the exact same kind here as there. If it is, that loop optimization regression is worrying in general, and it's "not enough" to work around it with specialization.

eefriedman

eefriedman commented on May 9, 2016

@eefriedman
Contributor

It looks like the same thing to me: probably affects all tight loops of ptr::write() followed by Vec::set_len() where the optimizer can't figure out the aliasing.

frankmcsherry

frankmcsherry commented on May 10, 2016

@frankmcsherry
ContributorAuthor

I've updated my code to use copy_nonoverlapping and recovered the performance. I'm happy to either close this as a dup, or keep it open if the specific example is helpful for eventually testing perf recovery when a #31681 fix lands.

changed the title [-]Performance issue in `write_all`[/-] [+]Performance issue in `write_all` (`Vec::extend_from_slice`)[/+] on May 27, 2016
added
T-libs-apiRelevant to the library API team, which will review and decide on the PR/issue.
on Sep 8, 2016
arielb1

arielb1 commented on Sep 8, 2016

@arielb1
Contributor
bluss

bluss commented on Sep 8, 2016

@bluss
Member

Yep, that's the gist I posted to show my WIP workaround for #32155

added a commit that references this issue on Sep 12, 2016

Auto merge of #36355 - bluss:vec-extend-from-slice-aliasing-workaroun…

4d91323

1 remaining item

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    T-libs-apiRelevant to the library API team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @arielb1@bluss@frankmcsherry@apasel422@eefriedman

        Issue actions

          Performance issue in `write_all` (`Vec::extend_from_slice`) · Issue #33518 · rust-lang/rust