Rust: IntoIterator should be implemented for [T; N] not just references

Created on 23 May 2015  Â·  36Comments  Â·  Source: rust-lang/rust

Currently IntoIterator is implemented for arrays of &'a mut [T; N] and &'a [T; N] but not for [T; N] making it only possible to collect() arrays of references, not arrays of values:

// These are the same in spite of `iter()` changing to `into_iter()`.
let vec_ref1: Vec<&i32> = [1, 2].iter().collect();
let vec_ref2: Vec<&i32> = [1, 2].into_iter().collect();

// This is what `into_iter()` should do but it doesn't actually work.
let vec: Vec<i32> = [1, 2].into_iter().collect();

This is something users are likely to run headlong into (probably accidentally) when testing out examples from the iterator docs. So, IntoIterator should be implemented so that last option is possible. The API stabilization RFC: https://github.com/rust-lang/rfcs/pull/1105 states:

any breakage in a minor release ... it must always be possible to locally fix the problem through some kind of disambiguation _that could have been done in advance_

Meaning this would be allowed as a minor change in spite of [1, 2].into_iter() behavior changing because it could be changed beforehand to [1, 2].iter().

A-iterators C-feature-accepted T-libs

Most helpful comment

Just thought I'd give this a +1 and add that I come across this quite frequently. I even occasionally resort to once(a).chain(once(b)).chain(once(c)) during some calls to flat_map just so I can remain on the stack when the only other alternatives are vec![a, b, c] or having to import another crate like arrayvec.

This is tricky to implement for an arbitrary T as panics need to be handled where a moved-out element isn't dropped

At least 90% of my use cases for this are for arrays where the elements are Copy, which I think avoids this issue as a type cannot impl both Copy and Drop. Perhaps it could be useful to first provide an IntoIterator implementation that requires the Copy bound, which may be removed later down the track if we're able to come up with a solution for non-Copy types? Alternatively, perhaps a .copy_iter() that takes self where T: Copy might be useful until a more generalised .into_iter() can be implemented?

Edit: I just want to add that often, .iter().cloned() is not a possible solution when your array has a very short lifetime, i.e.

let (r, g, b) = rgb();
let rgbas: Vec<_> = alphas.into_iter()
    .flat_map(|a| [r, g, b, a].iter().cloned()) // This fails due to an insufficient lifetime
    // .flat_map(|a| [r, g, b, a]) // This would be sooo nice
    .collect();

All 36 comments

cc @bluss

This is tricky to implement for an arbitrary T as panics need to be handled where a moved-out element isn't dropped, and this also leads to a large number of iterators where the reference-based IntoIterator implementations just return a normal slice iterator.

For the record, crate arrayvec implements a by-value iterator using one type and a trait for a range of array sizes. I think it has solved all safety concerns, but I'll definitely look into what happens during panic again.

I take it this would/will be much easier once https://github.com/rust-lang/rfcs/issues/1038 is implemented someday. Then at a minimum, this would be fixable shortly after that issue is.

The lack of this shows up in the beginnings of the book, and it's really annoying. I've shown off arrays, I've shown off loops, I want to show for over arrays. But I haven't introduced method syntax yet, so for e in a.iter() is something new. But you won't often be calling iter() explicitly when you're using Vecs, but we haven't introduced Vec yet either.

Just thought I'd give this a +1 and add that I come across this quite frequently. I even occasionally resort to once(a).chain(once(b)).chain(once(c)) during some calls to flat_map just so I can remain on the stack when the only other alternatives are vec![a, b, c] or having to import another crate like arrayvec.

This is tricky to implement for an arbitrary T as panics need to be handled where a moved-out element isn't dropped

At least 90% of my use cases for this are for arrays where the elements are Copy, which I think avoids this issue as a type cannot impl both Copy and Drop. Perhaps it could be useful to first provide an IntoIterator implementation that requires the Copy bound, which may be removed later down the track if we're able to come up with a solution for non-Copy types? Alternatively, perhaps a .copy_iter() that takes self where T: Copy might be useful until a more generalised .into_iter() can be implemented?

Edit: I just want to add that often, .iter().cloned() is not a possible solution when your array has a very short lifetime, i.e.

let (r, g, b) = rgb();
let rgbas: Vec<_> = alphas.into_iter()
    .flat_map(|a| [r, g, b, a].iter().cloned()) // This fails due to an insufficient lifetime
    // .flat_map(|a| [r, g, b, a]) // This would be sooo nice
    .collect();

+1, I also ran against this problem yesterday with this snippet (the Vec variant compiles, array does not). This is very surprising, especially for a beginner.

As discussed in #32871, we either need:

  1. impl Trait to be stable, so we could secretly use the impl IntoIterator for [T; 0] to [T; 32] trick (probably also need rust-lang/rfcs#2071 to name the associated type), or
  2. Get rust-lang/rfcs#2000 merged + implemented first (assuming stable code can use it without stabilizing const-generic, otherwise we need to count in stabilization time as well)

@kennytm impl Trait is now stable, so can this be implemented now?

@orlp I think you could give it a try. @rust-lang/libs WDYT?

(We'll also need the corresponding clippy lint rust-lang-nursery/rust-clippy#1565 to reduce the damage of inference breakage after stabilization.)

@kennytm Actually, IntoIterator needs member type IntoIter: Iterator, so we need existential types first, as you said :(

@kennytm I did come up with a way to do IntoIter<T> instead of IntoIter<[T; N]>: https://gist.github.com/orlp/2c0a704c3a30f6203fcaa33aca941f1c.

But I don't really think that helps as a temporary solution, because in the end we still want IntoIter<T, N>.

@orlp I changed your implementation slightly to avoid extra VarArray enum: https://gist.github.com/RReverser/bb9fd5250dba7e126f1df6dc6fb17662

Technically this is equivalent to your implementation, but avoids storing length twice both as an enum tag and as a separate field.

@RReverser I intentionally did not do that because it creates uninitialized memory, which is UB for T that have invalid bit patterns. Just creating uninitialized memory for those T is UB, even if you never access the uninitialized memory.

Here is a minimal example showing something that could happen: https://play.rust-lang.org/?version=stable&mode=debug&edition=2015&gist=23a761b2cda00df365dd11f128ee3409.

@orlp Hmm, these answers ("just creating uninitialized memory") don't seem to make much sense, but I haven't dug into ManuallyDrop changes they mention. Either way, you can replace uninitialized with zeroed which will avoid this issue.

@RReverser Did you check my minimal example? zeroed is just as bad.

@orlp That's a completely different issue and is related to layout optimisations, not zeroed or uninitialized behaviour.

@RReverser So you don't think it's undefined behavior that if I wrap your iterator type into an Option it could sometimes magically become None even when it's not?

No, these optimisations are not UB albeit not explicitly defined for flexibility (these are different things). Point is, they are intended to use free bit patterns for enum tags, which is exactly what happens in your example, and it's a safe assumption compiler makes there. But if you actually fill that buffer with your own valid values, they are still guaranteed to come back the way you put them.

@RReverser But you don't fill the buffer with valid values! The buffer remains partially uninitialized when len() < MAX_ARRAY_INTO_ITER_SIZE.

So when T is, say, &i32 you now have a buffer with potentially a bunch of null references in them returned to the user. They're not accessible to the user, but that doesn't matter as shown in my example above (just wrapping the returned value in an Option can blow things up).

According to https://doc.rust-lang.org/reference/behavior-considered-undefined.html, that is very much UB.

@RReverser Here I made your code fail with 100% safe code: https://play.rust-lang.org/?version=stable&mode=debug&edition=2015&gist=d6957a01e83bfaf273f58360cc92a847

I only changed uninitialized for zeroed. Mind you that if I didn't do this, the bug would still exist but only pop up sometimes, which is significantly worse.

Yeah, you're right, I also just realised that 0-size array would be still a problem (anything 1+ should work though because it's the first element that has to be properly filled to prevent these opts).

But to be reliably on the safe side, we'd need to replace the new ManuallyDrop (which seems to break many assumptions) with something like NoDrop instead.

Yeah, you're right, I also just realised that 0-size array would be still a problem (anything 1+ should work though because it's the first element that has to be properly filled to prevent these opts).

"Should work" isn't really the standard we should be going for. It only works until someone decides to change the way stuff is laid out in a future compiler version. What you're doing is explicitly mentioned as undefined behavior, and might break in the future.

This isn't just theoretical. The old versions of https://crates.io/crates/slotmap are yanked because I was unaware that simply creating invalid bit patterns (not accessing them) is defined as UB. It "worked" before, and then a compiler update came and all of a sudden my crate was unsafe.

It has nothing to do with ManuallyDrop. That's just the symptom. The disease is creating uninitialized memory for a type with invalid bit patterns.

That's why I made my version above, which is a bit hacky, but is perfectly safe.

Creating uninitialized memory on its own is not UB, no matter the underlying type, it's accessing it afterwards (including via wrapper types) that can be if not careful.

Sorry, I'd continue this discussion, but I don't really appreciate the tone it starts getting. Hopefully that's just a limitation of the written text, and if so, happy to discuss another time in person.

@RReverser No particular tone is intended. I'm just trying to discuss the matter at hand, nothing more.

Creating uninitialized memory on its own is not UB, no matter the underlying type, it's accessing it afterwards (including via wrapper types) that can be if not careful.

The Rust reference states that having invalid values is incorrect:

Rust code, including within unsafe blocks and unsafe functions is incorrect if it exhibits any of the behaviors in the following list. It is the programmer's responsibility when writing unsafe code that it is not possible to let safe code exhibit these behaviors.

  • Dereferencing a null or dangling raw pointer.
    ...
  • Invalid values in primitive types, even in private fields and locals:

    • Dangling or null references and boxes.

    • A value other than false (0) or true (1) in a bool.

    • A discriminant in an enum not included in the type definition.

    • A value in a char which is a surrogate or above char::MAX.

    • Non-UTF-8 byte sequences in a str.

Note how null references are mentioned twice, once for dereferencing them, and another time for simply having them. But it's not just null references, using uninitialized for any of the other mentioned types would also be UB.

The appropriate wrapper type for a partially-initialized array is std::mem::MaybeUninit.

As an experienced programmer in C++ but newcomer to Rust, I stumbled on this problem as well, for the same use case as mentioned before (processing pixel components).

It also comes up if you want to add some items to a Vec, eg.

let mut vec = ...;
if condition {
    vec.extend(["foo", "bar", "etc"]); // Error
}

And the easiest workaround (using vec![]) results in an unnecessary extra allocation.

@Diggsey vec.extend([...].iter().copied()) is fairly easy too and would avoid allocation, although it would be still nice to avoid doing that manually.

Implicitly changing into_iter() to iter() for [T;N] also makes below statement inconsistent:

There are three common methods which can create iterators from a collection:
iter(), which iterates over &T.
iter_mut(), which iterates over &mut T.
into_iter(), which iterates over T.

@rockmenjack We’re not changing it that way. We want to add <[T; N] as IntoIterator>::into_iter, which doesn’t exist today but would make things more consistent per the pattern that you quote. But since it doesn’t exist, code that does let x: [T; N] = …; x.into_iter(); can compile through auto-ref using <&[T; N] as IntoIterator>::into_iter which gives an iterator of &T. So we’d be changing the meaning of valid code and likely break it.

let x: [T; N] = …; x.into_iter(); can compile through auto-ref using <&[T; N] as IntoIterator>::into_iter which gives an iterator of &T. So we’d be changing the meaning of valid code and likely break it.

@SimonSapin this is what I meant inconsistent, call to [T;N]::into_iter() should be rejected by the compiler not auto-ref.
For a broader case, we shall block SomeTrait::some_fn(self) taking &T

Auto-ref in the general case is important and useful. Removing it would break lot of foo.bar() method calls where foo is owned and bar borrows &self, which would have to be rewritten as (&foo).bar(). It’s not only not allowed by the language’s stability promise, but I think it wouldn’t be desirable at all.

Auto-ref in the general case is important and useful. Removing it would break lot of foo.bar() method calls where foo is owned and bar borrows &self, which would have to be rewritten as (&foo).bar(). It’s not only not allowed by the language’s stability promise, but I think it wouldn’t be desirable at all.

I agree with the case you mentioned, but that is bar(&self).
The function signature of into_iter() is:

fn into_iter(self) -> Self::IntoIter

user of into_iter might expect into_iter() taking the ownership of self which is not true at the moment for all types. And yeah, self can be &[T;N], just some what counterintuitive.

It’s important to look at what the type of self is. In impl<'a, T> IntoIterator for &'a [T; 4] for example, the Self type is &'a [T; 4], not [T; 4]. The into_iter method does take ownership of its receiver, but that receiver itself is a reference that borrows the array.

Having that impl is important so that you can write a for loop like let x: [T; 4] = …; for i in &x { … } when you want to to iterate over &T references to the array’s items.

FYI: an by-value iterator for arrays was implemented in #62959. It is still unstable and missing the actual IntoIterator impl for arrays. That impl will (maybe) be added in #65819, but some discussion regarding backwards compatibility is necessary first. In case you want to chime in.

Was this page helpful?
0 / 5 - 0 ratings