Rust: Add `Iterator::collect_into<E: Extend>(e: E)`?

Created on 7 Nov 2017  路  10Comments  路  Source: rust-lang/rust

Hey gang.

As originally determined in https://twitter.com/kot_2010/status/927119253324619776, building a pre-size-hinted mutable Vec as the output of a loop (even for very small n) can occasionally be more than twice as fast as collecting into a Vec. As demonstrated here https://github.com/stuhood/rust-vec-bench, using extend into a pre-size-hinted Vec closes most of the gap.

Focusing on improving Iterator::size_hint implementations so that they suggest better sizes for collect would likely help reduce this gap. But there are diminishing returns to heuristics, and in many cases users should be able to give better explicit hints.

This issue initially proposed adding an Iterator::collect_with_size_hint method, but @scottmcm 's suggestion to add Iterator::collect_into<E: Extend>(e: E) -> E seems more general, and suggests a useful analogue in unzip_into<E1: Extend, E2: Extend>(e1: E1, e2: E2) -> (E1, E2).

C-feature-request T-libs

Most helpful comment

Hmm, what about .collect_into(Vec::with_capacity(5)), or something? (Name TBD of course)

collect is the method-chain-syntax for FromIterator; this would be the equivalent for Extend.

All 10 comments

The problem is that Filter et al can't return a "better" size hint because the trait contract is that the lower bound must actually be a lower bound. Maybe that should be relaxed?

Another possibility is collect could look at the upper bound and be smarter about what capacity to use? This wouldn't always be advantageous because you have no idea how many elements will be filtered out.

You can also do this -- but note that OverrideSizeHint is explicitly violating the trait contract:

type SizeHint = (usize, Option<usize>);
struct OverrideSizeHint<I>(I, SizeHint);
extension_trait! {
    <I: Iterator> SizeHintExt<I> for I {
        fn override_size_hint(self, hint: SizeHint) -> OverrideSizeHint<I> {
            OverrideSizeHint(self, hint)
        }
    }
}
impl<I: Iterator> Iterator for OverrideSizeHint<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        self.0.next()
    }

    fn size_hint(&self) -> SizeHint {
        self.1
    }
}

fn collect_hint(ratio: u8, chars: &[char]) -> Vec<char> {
    chars.iter().enumerate().filter_map(|(i, c)| {
        if filter(ratio, i) {
            Some(*c)
        } else {
            None
        }
    }).override_size_hint((chars.len(), Some(chars.len()))).collect()
}

Hmm, what about .collect_into(Vec::with_capacity(5)), or something? (Name TBD of course)

collect is the method-chain-syntax for FromIterator; this would be the equivalent for Extend.

@scottmcm : I like that alternative as well.

I noticed the other day that unzip has a similar problem. It always just extends with Some(x), so the size_hint seen by the extend code is always exactly 1, and thus useless for pre-allocating.

Updated description/title to apply @scottmcm 's collect_into approach instead.

Do any Iterator impls need to override the default collect_into or would the default always be optimal?

trait Iterator {
    /* ... */

    fn collect_into<E: Extend<Self::Item>>(self, e: E) -> E {
        e.extend(self);
        e
    }
}

As I understand it, the following are always equivalent right?

let vec = whatever.into_iter().etc(...).collect_into(Vec::with_capacity(N));
let mut vec = Vec::with_capacity(N);
vec.extend(whatever.into_iter().etc(...));

Is the advantage of collect_into that the method nesting works out nicer?

Is the advantage of collect_into that the method nesting works out nicer?

Yes... and avoiding the mutable local variable is nice as well. Finally, the symmetry with collect is appealing.

I love the idea of collecting into an existing collection, but for another reason: ergonomics. I've proposed something similar on Rust user and internals forums.

I think, that the function (extend_into?) should accept collections as BorrowMut to allow passing them as references. Also it could return the passed collection to allow nice chaining.

We might also consider partition_into and unzip_into for symmetry. Iterator::partition and unzip already use Extend, but they create new Default collections to extend into.

Also, FWIW rayon already has collect_into and unzip_into with different semantics than those proposed here, but we could consider renaming in rayon before it reaches 1.0 -- rayon-rs/rayon#519.

I feel that extend_into was received quite well. I want to make a next step and propose an official RFC for it. Here's what I want to issue.

@stuhood I've included your motivation, are you OK with that?

Was this page helpful?
0 / 5 - 0 ratings