Hey gang.
As originally determined in https://twitter.com/kot_2010/status/927119253324619776, building a pre-size-hinted mutable Vec
as the output of a loop (even for very small n
) can occasionally be more than twice as fast as collect
ing into a Vec
. As demonstrated here https://github.com/stuhood/rust-vec-bench, using extend
into a pre-size-hinted Vec
closes most of the gap.
Focusing on improving Iterator::size_hint
implementations so that they suggest better sizes for collect
would likely help reduce this gap. But there are diminishing returns to heuristics, and in many cases users should be able to give better explicit hints.
This issue initially proposed adding an Iterator::collect_with_size_hint
method, but @scottmcm 's suggestion to add Iterator::collect_into<E: Extend>(e: E) -> E
seems more general, and suggests a useful analogue in unzip_into<E1: Extend, E2: Extend>(e1: E1, e2: E2) -> (E1, E2)
.
The problem is that Filter
et al can't return a "better" size hint because the trait contract is that the lower bound must actually be a lower bound. Maybe that should be relaxed?
Another possibility is collect
could look at the upper bound and be smarter about what capacity to use? This wouldn't always be advantageous because you have no idea how many elements will be filtered out.
You can also do this -- but note that OverrideSizeHint
is explicitly violating the trait contract:
type SizeHint = (usize, Option<usize>);
struct OverrideSizeHint<I>(I, SizeHint);
extension_trait! {
<I: Iterator> SizeHintExt<I> for I {
fn override_size_hint(self, hint: SizeHint) -> OverrideSizeHint<I> {
OverrideSizeHint(self, hint)
}
}
}
impl<I: Iterator> Iterator for OverrideSizeHint<I> {
type Item = I::Item;
fn next(&mut self) -> Option<I::Item> {
self.0.next()
}
fn size_hint(&self) -> SizeHint {
self.1
}
}
fn collect_hint(ratio: u8, chars: &[char]) -> Vec<char> {
chars.iter().enumerate().filter_map(|(i, c)| {
if filter(ratio, i) {
Some(*c)
} else {
None
}
}).override_size_hint((chars.len(), Some(chars.len()))).collect()
}
Hmm, what about .collect_into(Vec::with_capacity(5))
, or something? (Name TBD of course)
collect
is the method-chain-syntax for FromIterator
; this would be the equivalent for Extend
.
@scottmcm : I like that alternative as well.
I noticed the other day that unzip has a similar problem. It always just extends with Some(x)
, so the size_hint seen by the extend code is always exactly 1, and thus useless for pre-allocating.
Updated description/title to apply @scottmcm 's collect_into
approach instead.
Do any Iterator
impls need to override the default collect_into
or would the default always be optimal?
trait Iterator {
/* ... */
fn collect_into<E: Extend<Self::Item>>(self, e: E) -> E {
e.extend(self);
e
}
}
As I understand it, the following are always equivalent right?
let vec = whatever.into_iter().etc(...).collect_into(Vec::with_capacity(N));
let mut vec = Vec::with_capacity(N);
vec.extend(whatever.into_iter().etc(...));
Is the advantage of collect_into
that the method nesting works out nicer?
Is the advantage of collect_into that the method nesting works out nicer?
Yes... and avoiding the mutable local variable is nice as well. Finally, the symmetry with collect
is appealing.
I love the idea of collecting into an existing collection, but for another reason: ergonomics. I've proposed something similar on Rust user and internals forums.
I think, that the function (extend_into
?) should accept collections as BorrowMut
to allow passing them as references. Also it could return the passed collection to allow nice chaining.
We might also consider partition_into
and unzip_into
for symmetry. Iterator::partition
and unzip
already use Extend
, but they create new Default
collections to extend into.
Also, FWIW rayon already has collect_into
and unzip_into
with different semantics than those proposed here, but we could consider renaming in rayon before it reaches 1.0 -- rayon-rs/rayon#519.
I feel that extend_into
was received quite well. I want to make a next step and propose an official RFC for it. Here's what I want to issue.
@stuhood I've included your motivation, are you OK with that?
Most helpful comment
Hmm, what about
.collect_into(Vec::with_capacity(5))
, or something? (Name TBD of course)collect
is the method-chain-syntax forFromIterator
; this would be the equivalent forExtend
.