Tracking Issue for Iterator::next_chunk #98326

Status: Open

Description

rossmacarthur (Contributor)

Feature gate: #![feature(iter_next_chunk)]

This is a tracking issue for the .next_chunk() method on iterators, which advances the iterator by N elements and returns them as an array [T; N].

Public API

use core::array;

trait Iterator {
    type Item;

    fn next_chunk<const N: usize>(
        &mut self,
    ) -> Result<[Self::Item; N], array::IntoIter<Self::Item, N>>
    where
        Self: Sized;
}
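
For reference, a minimal usage sketch (nightly only, behind the feature gate), showing the two outcomes described above: Ok when N items are available, and Err carrying the items that were consumed when the iterator runs short:

#![feature(iter_next_chunk)]

fn main() {
    let mut iter = "lorem".chars();

    // enough items left: returns Ok with an array of exactly N elements
    let first: [char; 2] = iter.next_chunk().unwrap();
    assert_eq!(first, ['l', 'o']);

    // fewer than N items left: returns Err with an IntoIter over the remainder
    let rest = iter.next_chunk::<4>().unwrap_err();
    assert_eq!(rest.as_slice(), &['r', 'e', 'm']);
}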

Steps / History

Unresolved Questions

  • Naming: other options include next_array() or next_array_chunk().
  • Should we also add next_chunk_back to DoubleEndedIterator?
  • How should we handle N = 0?

Activity

Labels added on Jun 21, 2022:

  • C-tracking-issue: An issue tracking the progress of sth. like the implementation of an RFC
  • T-libs-api: Relevant to the library API team, which will review and decide on the PR/issue
ChayimFriedman2 (Contributor) commented on Jun 21, 2022

Should this method support N = 0? itertools does not, which sounds reasonable; however, if we do the same, we will have the same concern as <[T]>::array_chunks(), which cannot be stabilized until generic_const_exprs (or some subset of it) is stabilized.
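
For context, the kind of compile-time restriction being referred to looks roughly like this. This is a hypothetical free function, not the std API, and the Assert/IsTrue helpers are made up for illustration; generic_const_exprs is an incomplete feature:

#![feature(generic_const_exprs)]

// helper types for expressing "N must be non-zero" as a trait bound
struct Assert<const COND: bool>;
trait IsTrue {}
impl IsTrue for Assert<true> {}

// hypothetical stand-in for next_chunk that rejects N = 0 at compile time
fn next_chunk_nonzero<I: Iterator, const N: usize>(iter: &mut I) -> Option<[I::Item; N]>
where
    Assert<{ N > 0 }>: IsTrue,
{
    let mut buf = Vec::with_capacity(N);
    for _ in 0..N {
        buf.push(iter.next()?);
    }
    // buf holds exactly N items here, so the conversion cannot fail
    buf.try_into().ok()
}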

the8472 (Member) commented on Jun 25, 2022

I'm wondering about the intended uses of this API.

If it is for performance (autovectorization) then it seems fairly brittle. I just tried doing this benchmark:

// requires #![feature(test, iter_next_chunk)] at the crate root
extern crate test;
use test::{black_box, Bencher};

#[bench]
fn bench_next_chunk(b: &mut Bencher) {
    let v = vec![13u8; 2048];

    b.iter(|| {
        const CHUNK: usize = 8;

        let mut sum = [0u32; CHUNK];
        let mut iter = black_box(v.clone()).into_iter();

        // sum the bytes column-wise, one chunk of CHUNK elements at a time
        while let Ok(chunk) = iter.next_chunk::<CHUNK>() {
            for i in 0..CHUNK {
                sum[i] += chunk[i] as u32;
            }
        }

        sum
    })
}

and it took 17,272 ns/iter when compiled with target-cpu=znver2. With a handrolled implementation I got 211 ns/iter. I encountered some 10x slowdowns in a few other combinations. Sometimes codegen-units=1 produced even worse assembly because the vectorizer did silly things, such as loading one byte at a time into a SIMD register with vpinsrb.
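
For comparison, a handrolled loop along these lines (a hedged sketch, not necessarily the exact code behind the 211 ns/iter figure) keeps the data as a slice and gives the autovectorizer a plain counted loop:

// hedged sketch of a handrolled equivalent: take fixed-size chunks straight
// from the slice instead of pulling elements through the iterator one by one
fn sum_chunks(v: &[u8]) -> [u32; 8] {
    let mut sum = [0u32; 8];
    for chunk in v.chunks_exact(8) {
        for i in 0..8 {
            sum[i] += chunk[i] as u32;
        }
    }
    sum
}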

And that's with direct, linear memory access on a Vec::IntoIter, which should be relatively straightforward to turn into masked SIMD loads. Many iterators yield references. A [&T; N] would need a gather load, which is expensive and only available on newer CPUs; this is unlike slice::array_chunks, which yields &[T; N]. We could try specializing slice.iter().next_chunk(), but then you might as well use array_chunks. And of course it will get more complex with additional adapters.

Another limitation is that it can't be used in a for-in loop or chained further with other iterator adapters, an issue that #92393 didn't have.

Lokathor (Contributor) commented on Sep 7, 2022

This is probably an example of a case where you should first use copied(), then chunk it, then turn that array into a SIMD type for the work, before possibly turning it back at the end of your loop.

And the change into and out of SIMD form will have a cost, so you'll want to make sure you're doing enough work inside each loop pass, or the conversion might cost more than the scalar-only code.
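
Something along these lines, a hedged sketch assuming the unstable portable_simd API (whose module layout has shifted across nightlies) and a made-up non-contiguous source (every third element of a slice):

#![feature(iter_next_chunk, portable_simd)]
use std::simd::{prelude::*, u32x8};

fn sum_every_third(data: &[u32]) -> u32 {
    // copied() turns &u32 items into u32, so the chunk is [u32; 8] rather than [&u32; 8]
    let mut iter = data.iter().step_by(3).copied();
    let mut acc = u32x8::splat(0);
    let mut tail = 0u32;
    loop {
        match iter.next_chunk::<8>() {
            // full chunk: move it into a SIMD vector and accumulate
            Ok(chunk) => acc += u32x8::from_array(chunk),
            // fewer than 8 items left: finish them in scalar code
            Err(rest) => {
                tail = rest.sum::<u32>();
                break;
            }
        }
    }
    acc.reduce_sum() + tail
}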

the8472 (Member) commented on Sep 7, 2022

Yeah, that was just an example to illustrate that the default implementation is suboptimal; we'll need optimized implementations on most linear-memory sources and length-preserving adapters.
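
For illustration, the kind of fast path a contiguous source could take; this is a hedged sketch, not the actual specialization machinery in std, and next_chunk_from_slice is a made-up helper:

// a contiguous source can fill a whole chunk with one bulk copy and a pointer
// bump instead of N separate next() calls
fn next_chunk_from_slice<'a, T: Copy, const N: usize>(rest: &mut &'a [T]) -> Option<[T; N]> {
    if rest.len() < N {
        return None;
    }
    let (head, tail) = rest.split_at(N);
    *rest = tail;
    // the length was already checked, so the conversion cannot fail
    head.try_into().ok()
}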

Lokathor (Contributor) commented on Sep 7, 2022

Mm, yeah.

Myself, I would imagine using this mostly on iterators that aren't linear-memory sources, and "the point" would be to have a clean and unified way to turn some crazy data source into simple SIMD chunks.

rossmacarthur (Contributor, Author) commented on Nov 1, 2022

Might be good to align the naming of this function with Iterator::array_chunks, i.e. either

  • Iterator::chunks
  • Iterator::next_chunk

Or

  • Iterator::array_chunks
  • Iterator::next_array_chunk

mark-i-m (Member) commented on Jan 24, 2023

So I'm finding myself caring less and less about the actual naming. I've wanted this feature at least three times in the last year and had to write my own version to work around its absence. Can we just pick something and get the feature stabilized?

ZhennanWu commented on Apr 23, 2023

I believe the current implementation also suffers from this compiler optimization issue: #110734

Godbolt: https://godbolt.org/z/Te5Wo9eh7

the8472 (Member) commented on Apr 23, 2023

I'm not sure what that issue has to do with next_chunk specifically. It seems to be mostly about the codegen for it.collect::<Vec<_>>(), which I think is easily disturbed by anything mutating the iterator.

(19 remaining comments not shown)
