Skip to content

Provide a means of turning iterators into fixed-size arrays #81615

Open
@bstrie

Description

@bstrie
Contributor

Now that min_const_generics is approaching there is a great deal of new support for working with fixed-sized arrays, such as std::array::IntoIter. But while converting from an array to an iterator is now well-supported, the reverse is lacking:

let mut a = std::array::IntoIter::new([1,2,3]);
let b: [i32; 3] = a.collect(); // the trait `FromIterator<{integer}>` is not implemented for `[i32; 3]`

It is roundaboutly possible to do the conversion by going through Vec:

let mut a = std::array::IntoIter::new([1,2,3]);
let b: [i32; 3] = a.collect::<Vec<_>>().try_into().unwrap();

But that isn't no_std compatible, and even with std there should be no need to allocate here.

The non-allocating way of converting the array presupposes a fair bit of familiarity with the stdlib, unsafe code, and unstable features:

#![feature(maybe_uninit_uninit_array)]
#![feature(maybe_uninit_array_assume_init)]
let mut array: [MaybeUninit<T>; N] = MaybeUninit::uninit_array();

for i in 0..N {
    array[i] = MaybeUninit::new(iter.next().unwrap());
}

let array = unsafe {
    MaybeUninit::array_assume_init(array)
};

...which suggests this is a prime candidate for stdlib inclusion.

The first problem is what the signature should be. There's no way to statically guarantee that an iterator is any given length, so (assuming that you don't want to panic) what is the return type when the iterator might be too short?

  1. -> [T; N]: straightforward, but you'd have to have T: Default, which limits its usefulness. Furthermore this uses in-band signaling to mask what is probably an error (passing in a too-small iterator), which feels user-hostile. This seems little better than panicking.
  2. -> [Option<T>; N]: the obvious solution to indicate potentially-missing data, but likely annoying to work with in the success case. And sadly the existing blanket impl that ordinarily allows you .collect from from a collection of options into an option of a collection can't be leveraged here because you still don't have an impl of FromIterator<T> for [T; N]. So if you actually want a [T; N] you're left with manually iterating over what was returned, which is to say, you're no better off than having the iterator you started out with!
  3. -> Option<[T; N]>: the simplest solution that doesn't totally ignore errors. This would be consistent with std::slice::array_windows for producing None when a function cannot construct a fixed-size array due to a too-small iterator. However, it unfortunately seems less quite a bit less recoverable in the failure case than the previous approach.
  4. -> Result<[T; N], U>: same as the previous, though it's possible you could pass something useful back in the error slot. However, as long as the iterator is by-value and unless it's limited to ExactSizeIterators, it might be tricky to pass the data back.
  5. -> ArrayVec<T, N>: this would be a new type designed to have infallible conversion from FromIterator. Actually extracting the fixed-size array would be done through APIs on this type, which avoids some problems in the next section.

IMO approaches 1 and 2 are non-starters.

The second problem is how to perform the conversion.

  1. FromIterator: the obvious approach, however, it cannot be used with return type 3 from the prior section. This is because of the aforementioned blanket impl for collecting a collection-of-options into an option-of-collections, which conflicts with any attempt to impl FromIterator<T> for Option<[T; N]>. I think even specialization can't solve this?
  2. TryFrom: theoretically you could forgo an impl of FromIterator<T> and instead impl TryFrom<I: IntoIterator<Item=T>>; hacky, but at least you're still using some standard conversion type. Sadly, Invalid collision with TryFrom implementation? #50133 makes it impossible to actually write this impl; people claim that specialization could theoretically address that, but I don't know enough about the current implementation to know if it's sufficient for this case.
  3. Introduce a new TryFromIterator trait. This would work, but this also implies a new TryIntoIterator trait and Iterator::try_collect. Likely the most principled approach.
  4. A new std::array::TryCollect trait, impl'd for I: IntoIterator. Less principled than the prior approach but less machinery for if you happen to think TryFromIterator and TryIntoIterator wouldn't be broadly useful.
  5. A new std::array::from_iter function. The simplest and least general approach. Less consistent with .collect than the previous approach, but broadly consistent with std::array::IntoIter (although that itself is considered a temporary stopgap until .into_iter works on arrays). Similarly, could be an inherent associated function impl'd on [T; N].
  6. Come up with some wacky const-generics shenanigans that would allow FromIterator<T> for [T; N] to Just Work, possibly via introducing a new const-generics aware version of ExactSizeIterator. If this could be done it would unquestionably be the most promising way to proceed, but I don't have the faintest idea where to begin.
  7. A new method Iterator::collect_array as a hardcoded alternative to collect. Similarly, an Iterator-specific equivalent of slice::array_chunks could fill the same role while also being potentially more flexible.

Any other suggestions?

Activity

added
T-libs-apiRelevant to the library API team, which will review and decide on the PR/issue.
on Feb 1, 2021
the8472

the8472 commented on Feb 1, 2021

@the8472
Member

Existing PRs #69985, #75644 and #79659. But It's great to have a place to consolidate discussion since there are so many ways to skin this cat.

Additional options.

  1. Limit to T: Default and fill the array when the iterator falls short. Or let the user provide a lambda to fill them.
  2. a new method on iterators (basically Add Iterator::collect_array method #79659 or variations thereof)
  3. wait for ArrayVec
bstrie

bstrie commented on Feb 1, 2021

@bstrie
ContributorAuthor

let the user provide a lambda to fill them.

This is an interesting alternative, and has precedence with Option::unwrap_or_default (which takes a value, not a lambda). However, it would preclude using any of the existing conversion traits.

Limit to T: Default and fill the array when the iterator falls short.

This alternative is mentioned above. The difficulty of handling the error case means that I wouldn't want this to be the only way to convert an iterator into an array, but I wouldn't be opposed to it existing alongside of another approach.

Thinking about it, I suppose this approach could technically be considered equivalent to the previously-mentioned "let the user provide a [value]" approach, assuming that the user does something like x.into_iter().chain(std::iter::repeat(FOO)).collect(); this would guarantee any would-be "empty" slots in the array would instead be filled in with FOO. A bit clunky, though.

wait for ArrayVec

Can you elaborate on what this is?

the8472

the8472 commented on Feb 1, 2021

@the8472
Member

Can you elaborate on what this is?

Something like

struct ArrayVec<T, N: const usize> {
  len: usize,
  data: [MaybeUninit<T>; N]
}

impl ArrayVec {

   /// may panic
   fn to_ary(self) -> [T; N]  {
      // ...
   }

}

This has been proposed in other conversations as it would help in several places in the standard library where an [T; N] is gradually initialized. And it would also help in this case since we could just return that instead of [T; N] directly. The user could then extract the array when len == N or handle the shorter than expected array in some other way.

It's similar to returning a tuple of a partially initialized array and a usize that tells you how many members have been initialized, except that it's safe to use.

With this we could do iter.collect::<ArrayVec<_, 16>>()

the8472

the8472 commented on Feb 1, 2021

@the8472
Member

#81382 (comment) suggests Iterator::array_windows() if that can be implemented (the storage requirements for the array were in question) then a similar Iterator::array_chunks() should be possible. With that one could call iter.array_chunks().next()

That is actually more general than using FromIterator or TryFrom because it does not consume the iterator and thus allows multiple arrays to be extracted if it has enough elements.

#79659 (comment) also suggests something similar, albeit as an iterator method instead of an iterator adapter.


Come up with some wacky const-generics shenanigans that would allow FromIterator for [T; N] to Just Work, possibly via introducing a new const-generics aware version of ExactSizeIterator. If this could be done it would unquestionably be the most promising way to proceed, but I don't have the faintest idea where to begin.

I'm not sure if that's possible. Iterators are stateful, they can be in a partially exhausted without changing their type. So the remaining number of elements can't be determined at compile time.

bstrie

bstrie commented on Feb 1, 2021

@bstrie
ContributorAuthor

Hm, I'm very intrigued by the suggestion of Iterator::array_chunks as a solution to this. It's usable in the simple case, generalizes to advanced cases, has symmetry with existing APIs, and could sidestep the return type discussion by making it natural to return a None when the iterator no longer has sufficient elements. However, would it actually be possible for this to support a remainder method like array_chunks has in order to handle the case where the iterator length isn't a multiple of the array length? What would the return type be? The hypothetical Iterator::array_windows avoids the problem of having a remainder, but for that you'd need T: Copy. I've added this to the list of suggestions.

I've also added ArrayVec to the list of potential return types.

amosonn

amosonn commented on Feb 1, 2021

@amosonn
Contributor

What about Iterator::next_n<N: const usize>? This allows to pull an array, and then keep pulling normally from the iterator. Iterator::array_chunks can be trivially implemented in terms of this.

As for the return type, it is worth noting: Whereas in an array, if some elements are "gone" (e.g. when a None is returned from array_windows), in an iterator this is more of a problem. Maybe something like Result<[T; N], (usize, [MaybeUninit<T>; N])>? (Or, of course, Result<[T; N], ArrayVec<T, N>>, though then it might be easier to just return the ArrayVec).

the8472

the8472 commented on Feb 1, 2021

@the8472
Member

@bstrie

What would the return type be?

Probably std::iter::ArrayChunks if it is an iterator adapter.

However, would it actually be possible for this to support a remainder method like array_chunks has in order to handle the case where the iterator length isn't a multiple of the array length?

Well, that's just the return type question again, but this time for the associated type, i.e. type Item = ???.

If Item = [T; N] then next() then there are two ways to implement this. Either next() just iterates until it can fill the array and
returns that and otherwise throws the partial result away... or it keeps internal state which can be retrieved through an additional method from the adapter. E.g. the number of elements it discarded. Or the elements it discarded inside a Vec<_>, which will never be allocated if your iterator is an exact multiple of N. Or an internal [MaybeUninit<T>; N] buffer to keep the remainder.

Or it could have any of the other discussed result types, e.g. type Item = ArrayVec

@amosonn

What about Iterator::next_n<N: const usize>? This allows to pull an array, and then keep pulling normally from the iterator. Iterator::array_chunks can be trivially implemented in terms of this.

They're actually a bit different, both having advantages and disadvantages.

If next_n() is implemented all the way down to the iterator source then it can be executed without discarding any elements. But that behavior would have to be optional, the default implementation would have to discard elements when there's not enough left to fill the array, at least for things that aren't TrustedLen/ExactSizeIterator/TrustedRandomAccess.

array_chunks() on the other hand could keep state in its adapter and thus make things recoverable. But it's potentially less efficient since it can't just pass entire arrays through the iterator pipeline for each step, and it would probably be larger.

So those should be considered as distinct options.

bstrie

bstrie commented on Feb 1, 2021

@bstrie
ContributorAuthor

I was mistaken about something in the original post; while it is impossible to impl FromIterator<T> for Result<[T; N], Foo> due to the conflicting impl, it's not impossible to impl FromIterator<T> for Result<[T; N], Foo<T>>. I can't say I understand why the latter doesn't conflict, but that does much lessen any need for TryFromIterator from a technical perspective.

amosonn

amosonn commented on Feb 2, 2021

@amosonn
Contributor

@the8472

array_chunks() on the other hand could keep state in its adapter and thus make things recoverable.

That's a good point! Getting the rest from the adapter definitely avoids the problem of the complicated signature. The downside is that you can forget to look at it, as opposed to a Result, and also that this pattern does not play well with a for loop.

33 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-arrayArea: `[T; N]`A-iteratorsArea: IteratorsC-feature-requestCategory: A feature request, i.e: not implemented / a PR.T-libs-apiRelevant to the library API team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @bstrie@the8472@amosonn@usbalbin@Aplet123

        Issue actions

          Provide a means of turning iterators into fixed-size arrays · Issue #81615 · rust-lang/rust