/// Creates an iterator which uses a closure to determine if an element should be removed.
///
/// If the closure returns true, then the element is removed and yielded.
/// If the closure returns false, it will try again, and call the closure
/// on the next element, seeing if it passes the test.
///
/// Using this method is equivalent to the following code:
///
/// ```
/// # let mut some_predicate = |x: &mut i32| { *x == 2 };
/// # let mut vec = vec![1, 2, 3, 4, 5];
/// let mut i = 0;
/// while i != vec.len() {
/// if some_predicate(&mut vec[i]) {
/// let val = vec.remove(i);
/// // your code here
/// }
/// i += 1;
/// }
/// ```
///
/// But `drain_filter` is easier to use. `drain_filter` is also more efficient,
/// because it can backshift the elements of the array in bulk.
///
/// Note that `drain_filter` also lets you mutate ever element in the filter closure,
/// regardless of whether you choose to keep or remove it.
///
///
/// # Examples
///
/// Splitting an array into evens and odds, reusing the original allocation:
///
/// ```
/// let mut numbers = vec![1, 2, 3, 4, 5, 6, 8, 9, 11, 13, 14, 15];
///
/// let evens = numbers.drain_filter(|x| *x % 2 == 0).collect::<Vec<_>>();
/// let odds = numbers;
///
/// assert_eq!(evens, vec![2, 4, 6, 8, 14]);
/// assert_eq!(odds, vec![1, 3, 5, 9, 11, 13, 15]);
/// ```
fn drain_filter<F>(&mut self, filter: F) -> DrainFilter<T, F>
where F: FnMut(&mut T) -> bool,
{ ... }
I'm sure there's an issue for this somewhere, but I can't find it. Someone nerd sniped me into implementing it. PR incoming.
Maybe this doesn't need to include the kitchen sink, but it _could_ have a range parameter, so that it's like a superset of drain. Any drawbacks to that? I guess adding bounds checking for the range is a drawback, it's another thing that can panic. But drain_filter(.., f) can not.
Is there any chance this will stabilize in some form in the not to far future?
If the compiler is clever enough to eliminate the bounds checks
in the drain_filter(.., f)
case I would opt for doing this.
( And I'm pretty sure you can implement it in a way
which makes the compiler clever eneugh, in the worst
case you could have a "in function specialization",
basically something like if Type::of::<R>() == Type::of::<RangeFull>() { dont;do;type;checks; return }
)
I know this is bikeshedding to some extent, but what was the reasoning behind naming this drain_filter
rather than drain_where
? To me, the former implies that the whole Vec
will be drained, but that we also run a filter
over the results (when I first saw it, I thought: "how is this not just .drain(..).filter()
?"). The former on the other hand indicates that we only drain elements where some condition holds.
No idea, but drain_where
sounds much better and is much more intuitive.
Is there still a chance to change it?
.remove_if
has been a prior suggestion too
I think drain_where
does explains it the best. Like drain it returns values, but it does not drain/remove all values but just such where a given condition is true.
remove_if
sounds a lot like a conditional version of remove
which just removes a single item by index if a condition is true e.g. letters.remove_if(3, |n| n < 10);
removes the letter at index 3 if it's < 10.
drain_filter
on the other hand is slightly ambiguous, does it drain
then filter
in a more optimized way (like filter_map) or does if drain so that a iterator is returned comparble to the iterator filter
would return,
and if so shouldn't it be called filtered_drain
as the filter get logically used before...
There is no precedent for using _where
or _if
anywhere in the standard library.
The "said equivalent" code in the comment is not correct... you have to minus one from i at the "your code here" site, or bad things happens.
IMO it's not filter
that's the issue. Having just searched for this (and being a newbie), drain
seems to be fairly non-standard compared to other languages.
Again, just from a newbie perspective, the things I would search for if trying to find something to do what this issue proposes would be delete
(as in delete_if
), remove
, filter
or reject
.
I actually _searched_ for filter
, saw drain_filter
and _kept searching_ without reading because drain
didn't seem to represent the simple thing that I wanted to do.
It seems like a simple function named filter
or reject
would be much more intuitive.
On a separate note, I don't feel as though this should mutate the vector it's called on. It prevents chaining. In an ideal scenario one would want to be able to do something like:
vec![
"",
"something",
a_variable,
function_call(),
"etc",
]
.reject(|i| { i.is_empty() })
.join("/")
With the current implementation, what it would be joining on would be the rejected values.
I'd like to see both an accept
and a reject
. Neither of which mutate the original value.
You can already do the chaining thing with filter
alone. The entire point of drain_filter
is to mutate the vector.
@rpjohnst so I searched here, am I missing filter
somewhere?
Yes, it's a member of Iterator
, not Vec
.
Drain is novel terminology because it represented a fourth kind of ownership in Rust that only applies to containers, while also generally being a meaningless distinction in almost any other language (in the absence of move semantics, there is no need to combine iteration and removal into a single ""atomic"" operation).
Although drain_filter moves the drain terminology into a space that other languages would care about (since avoiding backshifts is relevant in all languages).
I came across drain_filter
in docs as a google result for rust consume vec
. I know that due to immutability by default in rust, filter doesn't consume the data, just couldn't recall how to approach it so I could manage memory better.
drain_where
is nice, but as long as the user is aware of what drain
and filter
do, I think it's clear that the method drains the data based on a predicate filter.
I still feel as though drain_filter
implies that it drains (i.e., empties) and then filters. drain_where
on the other hand sounds like it drains the elements where the given condition holds (which is what the proposed function does).
Shouldn't linked_list::DrainFilter
implement Drop
as well, to remove any remaining elements that match the predicate?
Yes
Why exactly does dropping the iterator cause it to run through to the end? I think that's surprising behaviour for an iterator and it could also be, if desired, done explicitly. The inverse of taking only as many elements out as you need is impossible on the other hand because mem::forget
ing the iterator runs into leak amplification.
I've been using this function a lot and I always have to remember to return true
for the entries I want to drain, which feels counter-intuitive compared to retain()
/retain_mut()
.
On an intuitive logical level, it would make more sense to return true
for entries I want to keep, does anyone else feel this way? (Especially considering that retain()
already works this way)
Why not do that, and rename drain_filter()
to retain_iter()
or retain_drain()
(or drain_retain()
)?
Then it would also mirror retain()
more closely!
This is why I proposed to rename it to
drain_where
as it then is clear that:
It is a form of drain so we use drain in the name
By using where
in combination with drain
it is clear that the
elements _where_ the predicate is true get drained, i.e removed and returned
It is more in sync with other names in std, normally if you have a
function consisting of two predicates you can emulate it (roughly) by using
functions representing each of the predicates in a "and then" fashion, e.g.
filter_map
can be emulated (roughly) as filter an then map
. The current
name indicates it is drain and then filter
, but it isn't even close to it
as it does not do a complete drain at all.
On Sun, Feb 25, 2018, 17:04 Boscop notifications@github.com wrote:
I've been using this function a lot and I always have to remember to
return true for the entries I want to drain, which feels
counter-intuitive compared to retain_mut().
On a primal level, it would make more sense to return true for entries I
want to keep, does anyone else feel this way?
Why not do that, and rename drain_filter() to retain_filter()?—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/43244#issuecomment-368320990,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AHR0kfwaNvz8YBwCE4BxDkeHgGxLvcWxks5tYYRxgaJpZM4OY1me
.
But with drain_where()
the closure would still have to return true for elements that should be removed, which is the opposite of retain()
which makes it inconsistent..
Maybe retain_where
?
But I think you're right that it makes sense to have "drain" in the name, so I think drain_retain()
makes the most sense: It's like drain()
but retaining the elements where the closure returns true
.
While it does not change, that you have to return true it makes clear that
you have to do so.
I personally would use either:
a. drain_where
b. retain_where
c. retain
I would not use drain_retain
.
Drain and retain speak about the same kind of process but from opposite
perspectives, drain speaks about what you remove (and r return) retain
speaks about what you keep (in the way it is already used in std).
Currently retaim
is already implemented in std with the major difference
that it is discarding elements while drain
is returning them, which makes
retain
(sadly) unsuited to be use in the name (except if you want to call
it retain_and_return
or similar).
Another point which speaks for drain is ease of migration, i.e. migrating
to drain_where
is as easy as running a word based search and replace on
the code, while changing it to retain would need a additional negation of
all used predicates/filter functions.
On Sun, Feb 25, 2018, 18:01 Boscop notifications@github.com wrote:
But with drain_where() the closure would still have to return true for
elements that should be removed, which is the opposite of retain() which
makes it inconsistent..
Maybe retain_where?
But I think you're right that it makes sense to have "drain" in the name,
so I think drain_retain() makes the most sense: It's like drain() but
retaining the elements where the closure returns true.—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/43244#issuecomment-368325374,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AHR0kfG4oZHxGfpOSK8DjXW3_2O1Eo3Rks5tYZHxgaJpZM4OY1me
.
But how often would you migrate from drain()
to drain_filter()
?
In all cases until now, I migrated from retain()
to drain_filter()
because there is no retain_mut()
in std and I need to mutate the element! So then I had to invert the closure return value..
I think drain_retain()
makes sense because the drain()
method drains unconditionally all elements in the range, whereas drain_retain()
retains the elements where the closure returns true
, it combines the effects of the drain()
and retain()
methods.
Sorry I mean to migrate from drain_filter
to drain_where
.
That you have a solution using retain
and then need to use
drain_filter
is a aspects I have not yet considered.
On Sun, Feb 25, 2018, 19:12 Boscop notifications@github.com wrote:
But why would you migrate from drain() to drain_filter()?
In all cases until now, I migrated from retain() to drain_filter()
because there is no retain_mut() in std and I need to mutate the element!
I think drain_retain() makes sense because the drain() method drains
unconditionally all elements in the range, whereas drain_retain() retains
the elements where the closure returns true.—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/43244#issuecomment-368330896,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AHR0kSayIk_fbp5M0RsZW5pYs3hDICQIks5tYaJ0gaJpZM4OY1me
.
Ah yes, but I think the "price" of inverting the closures in current code that uses drain_filter()
is worth it, to get a consistent and intuitive API in std and then in stable.
It's only a small fixed cost (and eased by the fact that it would go along with a renaming of the function, so the compiler error could tell the user that the closure has to be inverted, so it wouldn't silently introduce a bug), compared to the cost of standardizing drain_filter()
and then people always having to invert the closure when migrating from retain()
to drain_filter()
.. (on top of the mental cost of remembering to do that, and the costs of making it harder to find it in the docs, coming from retain()
and searching for "something like retain()
but passing &mut
to its closure, which is why I think it makes sense that the new name of this function has "retain" in the name, so that people find it when searching in the docs).
Some anecdotal data: In my code, I always only needed the retain_mut()
aspect of drain_filter()
(often they were using retain()
before), I never had use cases where I needed to process the drained values. I think this will also the most common use case for others in the future (since retain()
doesn't pass &mut
to its closure so that drain_filter()
has to cover that use case, too, and it's a more common use case than needing to process the drained values).
The reason why I'm agains drain_retain
is because of the way names are currently used in std wrt. collections:
drain
, collect
, fold
, all
, take
, ...*_where
, *_while
map
, filter
, skip
, ...)map
vs. filter
/skip
)filter_map
apply modifier_1 and then apply modifier_2
, just that it's faster or more flexible doing this in one stepYou sometimes might have:
drain_filter
)drain_where
)You normally do not have:
take_collect
as it is easily confusingdrain_retain
does kinda make sense but falls in the last category, while you probably can guess what it does it basically says remove and return all elements "somehow specified" and then keep all elements "somehow specified" discarding other elements
.
One the other hand I don't know why there should not be a retain_mut
maybe opening a quick RFC introducing retain_mut
as a efficient way to combine modify
+ retain
I have a hunch it might be faster
stabilized then this function. Until then you could consider writing a extension trait providing
you own retain_mut
using iter_mut
+ a bool-array (or bitarray, or...) to keep track of which elements
have to be reomved. Or providing your own drain_retain
which internally uses drain_filer
/drain_where
but wraps the predicate into a not |ele| !predicate(ele)
.
@dathinab
*_where
?retain()
, drain()
. There is no confusion with Iterator methods which transform one iterator into another iterator.retain_mut()
would not be added to std because drain_filter()
will already be added and people were advised to use that. Which brings us back to the use case of migrating from retain()
to drain_filter()
being very common, so it should have a similar name and API (closure returning true
means keep the entry)..I haven't been aware that retain_mut
was already discussed.
We are talking about general naming schemes in std mainly wrt. to
collections, which does include Iterators as they are one of the main
access methods for collections in rust, especially when it's about
modifying more than one entry.
_where
is just a example for suffixes to express a slightly modified_until
, _while
, _then
, _else
, _mut
and _back
.The reason why drain_retain
is confusing is that it's not clear if it is
drain or retain based, if it is drain based, returning true would remove
the value, if is retain based it would keep it. Using _where
makes it at
last clear what is expected of the passed in function.
On Mon, Feb 26, 2018, 00:25 Boscop notifications@github.com wrote:
@dathinab https://github.com/dathinab
- We are talking about a method on collections here, not on Iterator.
map, filter, filter_map, skip, take_while etc are all methods on Iterator.
Btw, which methods do you mean that use *_where?
So we have to compare the naming scheme to methods that already exist
on collections, e.g. retain(), drain(). There is no confusion with
Iterator methods which transform the iterator into another iterator.- AFAIK the consensus was that retain_mut() would not be added to std
because drain_filter() will already be added and people were advised
to use that. Which brings us back to the use case of migrating from
retain() to drain_filter() being very common, so it should have a
similar name and API (closure returning true means keep the entry)..—
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/43244#issuecomment-368355110,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AHR0kfkRAZ5OtLFZ-SciAmjHDEXdgp-0ks5tYevegaJpZM4OY1me
.
I've been using this function a lot and I always have to remember to return true for the entries I want to drain, which feels counter-intuitive compared to retain()/retain_mut().
FWIW, I think retain
is the counter-intuitive name here. I usually find myself wanting to delete certain elements from a vector, and with retain
I always have to invert that logic.
But retain()
is already in stable, so we have to live with it.. And so my intuition got used to that..
@Boscop: and so is drain
which is the inverse of retain
but also returns the removed elements and the usage of suffixes like _until
,_while
for making functions available which are just a slightly modified version of a existing functionality.
I mean if I would describe drain it would be something like:
_remove and return all elements specified "in some way", keep all other elements_
where _"in some way"_ is _"by slicing"_ for all sliceable collection types and _"all"_ for the rest.
The description for the function discussed here is _the same_ except that
_"in some way"_ is _"where a given predicate returns true"_.
One the other hand the description I would give retain is:
_only retain (i.e. keep) elements where a given predicate returns true, discard the rest_
(yes, retain could have been used in a way where it does not discard the rest, sadly it wasn't)
I do think that it would have been really nice if retain
would have
passed &mut T
to the predicate and maybe returned the removed values.
Because I think retain
is a more intuitive name base.
But independent of this I also think that both drain_filter
/drain_retain
are suboptimal
as they do not make it clear if the predicate has to return true/false to keep/drain a entry.
(drain indicates true does remove it as it speaks about removing elements while filter
and retrain speaks about which elements to keep, at last in rust)
In the end it's not _that_ important which of the names is used, it would be just nice if it gets stabilized.
Doing a poll and/or letting someone from the language team decide might be the best way to move thinks forward?
I think something like drain_where
, drain_if
, or drain_when
, is much clearer than drain_filter
.
@tmccombs Out of those 3, I think drain_where
makes the most sense. (Because if
implies either do the whole thing (in this case draining) or not
and when
is temporal.)
Compared to drain_filter
the closure return value is the same with drain_where
(true
to remove an element) but that fact is made clearer/explicit by the name, so it eliminates the risk of accidentally interpreting the meaning of the closure return value wrongly.
I think it’s more than time to stabilize. Summary of this thread:
R: RangeArgument
 parameter be added?true
from the callback causes that item to be included in the iterator.)drain_where
.)@Gankro, what do you think?
The libs team discussed this and the consensus was to not stabilize more drain
-like methods at the moment. (The existing drain_filter
method can stay in Nightly as unstable.) https://github.com/rust-lang/rfcs/pull/2369 proposes adding another drain-like iterator that doesn’t do anything when dropped (as opposed to consuming the iterator to its end).
We’d like to see experimentation to attempt to generalize into a smaller API surface various combinations of draining:
RangeArgument
a.k.a. RangeBounds
) v.s. the entire collection (though the latter could be achieved by passing ..
, a value of type RangeFull
).Possibilities might include "overloading" a method by making it generic, or a builder pattern.
One constraint is that the drain
method is stable. It can possibly be generalized, but only in backward-compatible ways.
We’d like to see experimentation to attempt to generalize into a smaller API surface various combinations of draining:
How and where does the team foresee this type of experimentation happening?
How: come up with and propose a concrete API design, possibly with a proof-of-concept implementation (which can be done out of tree through at least Vec::as_mut_ptr
and Vec::set_len
). Where doesn’t matter too much. Could be a new RFC or a thread in https://internals.rust-lang.org/c/libs, and link it from here.
I've been playing around with this for a bit. I'll open a thread on internals in the next days.
I think a general API that works like this makes sense:
v.drain(a..b).where(pred)
So it's a builder-style API: If .where(pred)
is not appended, it will drain the whole range unconditionally.
This covers the capabilities of the current .drain(a..b)
method as well as .drain_filter(pred)
.
If the name drain
can't be used because it's already in use, it should be a similar name like drain_iter
.
The where
method shouldn't be named *_filter
to avoid confusion with filtering the resulting iterator, especially when where
and filter
are used in combination like this:
v.drain(..).where(pred1).filter(pred2)
Here, it will use pred1
to decide what will be drained (and passed on in the iterator) and pred2
is used to filter the resulting iterator.
Any elements that pred1
returns true
for but pred2
returns false
for will still get drained from v
but won't get yielded by this combined iterator.
What do you think about this kind of builder-style API approach?
For a second I forgot that where
can't be used as function name because it's already a keyword :/
And drain
is already stabilized so the name can't be used either..
Then I think the second best overall option is to keep the current drain
and rename drain_filter
to drain_where
, to avoid the confusion with .drain(..).filter()
.
(As jonhoo said above: )
what was the reasoning behind naming this drain_filter rather than drain_where? To me, the former implies that the whole Vec will be drained, but that we also run a filter over the results (when I first saw it, I thought: "how is this not just .drain(..).filter()?"). The former on the other hand indicates that we only drain elements where some condition holds.
I've opened a thread on internals.
The TLDR is that I think that non-selfexhaustion is a bigger can of worms than expected in the general case and that we should stabilize drain_filter
sooner rather than later with a RangeBounds
parameter. Unless someone has a good idea for solving the issues outlined there.
Edit: I've uploaded my experimental code: drain experiments
There are also drain and clearing benches and some tests but don't expect clean code.
Totally missed out on this thread. I've had an old impl that I've fixed up a bit and copy pasted to reflect a few of the options described in this thread. The one nice thing about the impl that I think will be non-controversial is that it implements DoubleEndedIterator
. View it here.
@Emerentius but then we should at least rename drain_filter
to drain_where
, to indicate that the closure has to return true
to remove the element!
@Boscop Both imply the same 'polarity' of true
=> yield. I personally don't care whether it's called drain_filter
or drain_where
.
@Popog Can you summarize the differences and pros & cons? Ideally over at the internals thread. I think DoubleEndedIterator
functionality could be added backwards compatibly with zero or low overhead (but I haven't tested that).
How about drain_or_retain
? It's a grammatically meaningful action, and it signals that it does one or the other.
@askeksa But that doesn't make it clear whether returning true
from the closure means "drain" or "retain".
I think with a name like drain_where
, it's very clear that returning true
drains it, and it should be clear to everyone that the elements that aren't drained are retained.
It would be nice if there was some way to limit/stop/cancel/abort the drain. For example, if I wanted to drain the first N even numbers, it would be nice to be able to just do vec.drain_filter(|x| *x % 2 == 0).take(N).collect()
(or some variant of that).
As it's currently implemented, DrainFilter
's drop
method will always run the drain to completion; it can't be aborted (at least I haven't figured out any trick that would do that).
If you want that behaviour you should just close over some state that tracks how many you've seen and start returning false. Running to completion on drop is necessary to make adaptors behave reasonably.
I just noticed that the way drain_filter
is currently implemented is not unwind safe but
actually a safety hazard wrt. unwind + resume safety. Additionally it easily causes an abord, both
of which are behaviours a method in std really shouldn't have. And while writing this I noticed
that it's current implementation is unsafe
I know that Vec
is by default not unwind safe, but the behaviour of drain_filer
when the
predicate panics is well surprising because:
An example of this behaviour is here:
play.rust-lang.org
While the 2. point should be solvable I think the first point on itself should
lead to an reconsideration of the behaviour of DrainFilter
to run to completation
on drop, reasons for changing this include:
drain_filter
might panic under some circumstances (e.g. a lockDrainFilter
to completion you might getArguments for running to completion include:
drain_filter
is similar to ratain
which is a function, so people might be surprised when theyDrainFilter
instead of running it to completion#[unused_must_use]
.for_each(drop)
which ironicallyDrainFilter
does on dropdrain_filter
is often used for it's side effect only, so it's to verboseretain
&T
, drain_filter used &mut T
find
, all
, any
which I quite a good reason to keep the current behaviour.It might be just me or I missed some point but changing the Drop
behaviour and
adding #[unused_must_use]
seems to be preferable?
If .for_each(drop)
is to long we might instead consider to add an RFC for iterators meant for
there side effect adding a method like complete()
to the iterator (or well drain()
but this
is a complete different discussion)
others??
I can't find the original reasoning, but I remember there was also some problem with adapters working with a DrainFilter
that doesn't run to completion.
See also https://github.com/rust-lang/rust/issues/43244#issuecomment-394405057
Good point, e.g. find
would cause drain to drain just until it hit's the first
match, similar all
, any
do short circuit, which can be quite confusing
wrt. drain.
Hm, maybe I should change my opinion. Through this might be a general problem
with iterators having side-effects and maybe we should consider a general solution
(independent of this tracking issue) like a .allways_complete()
adapter.
I have personally not found any safety reason why drain needs to run to completion but as I've written here a couple posts above, the side-effects on next()
interact in a suboptimal way with adapters such as take_while
, peekable
and skip_while
.
This also brings up the same issues as my RFC on non-selfexhausting drain and its companion selfexhausting iter adapter RFC.
It's true that drain_filter
can easily cause aborts but can you show an example of where it violates safety?
Yup, I already did: play.rust-lang.org
Which is this:
#![feature(drain_filter)]
use std::panic::catch_unwind;
struct PrintOnDrop {
id: u8
}
impl Drop for PrintOnDrop {
fn drop(&mut self) {
println!("dropped: {}", self.id)
}
}
fn main() {
println!("-- start --");
let _ = catch_unwind(move || {
let mut a: Vec<_> = [0, 1, 4, 5, 6].iter()
.map(|&id| PrintOnDrop { id })
.collect::<Vec<_>>();
let drain = a.drain_filter(|dc| {
if dc.id == 4 { panic!("let's say a unwrap went wrong"); }
dc.id < 4
});
drain.for_each(::std::mem::drop);
});
println!("-- end --");
//output:
// -- start --
// dropped: 0 <-\
// dropped: 1 \_ this is a double drop
// dropped: 0 _ <-/
// dropped: 5 \------ here 4 got leaked (kind fine)
// dropped: 6
// -- end --
}
But that's an implementation internal think, which went wrong.
Basically the open question is how to handle the panic
of an predicate function:
Another question is if it's a good idea to run functions which can be seen as api user input on drop
in general, but then this is the only way not to make find
, any
, etc. behave confusing.
Maybe a consideration could be something like:
next
, unset it before returning from next
Maybe there is an better solution.
Through whichever it is it should be documented in rustdoc once implemented.
@dathinab
Yup, I already did
Leaking is undesirable but fine and may be hard to avoid here, but a double-drop is definitely not. Good catch! Would you like to report a separate issue about this safety problem?
Does drain_filter
do reallocations every time it removes an item from collection? Or it does reallocate only once and works like std::remove
and std::erase
(in pair) in C++? I'd prefer such behavior because of exactly one allocation: we simply put our elements to the end of collection and then removes shrink it to proper size.
Also, why there is no try_drain_filter
? Which returns Option
type, and None
value if we should stop? I have a very big collection and it is meaningless to continue for me when I have already got what I needed.
The last time I took a at the code it did something like: created a "gap"
when moving out elements and move a element which is not drained to the
begin of the gap when it finds one. With this each element which has to be
moved (either out or to a new place in the array) is only moved once.
Also like e.g. remove
it doesn't reallocate. The freed part just becomes
part of the unused capacity.
On Fri, Aug 10, 2018, 07:11 Victor Polevoy notifications@github.com wrote:
Does drain_filter do reallocations every time it removes an item from
collection? Or it does reallocation only once and works like std::remove
and std::erase (in pair) in C++? I'd prefer such behavior because of
exactly one allocation: we simply put our elements to the end of collection
and then removes shrink it to proper size.Also, why there is no try_drain_filter ? Which returns Option type, and
None value if we should stop? I have a very big collection and it is
meaningless to continue for me when I have already got what I needed.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/43244#issuecomment-411977001,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AHR0kdOZm4bj6iR9Hj831Qh72d36BxQSks5uPRYNgaJpZM4OY1me
.
@rustonaut thanks. What is your opinion about try_drain_filter
? :)
P.S. Just looked at the code too, it looks as it works the way we wanted.
It advances element by element when iteration, so normally you could expect
it to stop iterating when being dropped, but it was deemed to be too
confusing so it actually drains to the end when dropped.
(Which drastically increases the likely hood of double panics and stuff
like that).
So it's unlikely that you will get a try version which behaves like you
expect.
For fairness drain early stopping when iterating really can be confusing in
some situations e.g. thing.drain_where(|x| x.is_malformed()).any(|x|
x.is_dangerus())
would not drain all malformed ones but just until one of
found which is also dangerous. (The current Impl. does drain all malformed
by continuing draining on drop).
On Fri, Aug 10, 2018, 10:52 Victor Polevoy notifications@github.com wrote:
@rustonaut https://github.com/rustonaut thanks. What is your opinion
about try_drain_filter? :)—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/43244#issuecomment-412020490,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AHR0kcEMHCayqvhI6D4LK4ITG2x5di-9ks5uPUnpgaJpZM4OY1me
.
I think this would be more versatile:
fn drain_filter_map<F>(&mut self, f: F) -> DrainFilterMap<T, F> where F: FnMut(T) -> Option<T>
Hi, I was searching for the drain_filter
functionality for HashMap
but it doesn't exist, and was asked to open an issue when I found this one. Should it be in a separate issue?
Is anything currently blocking this from stabilization? Is it still unwind-unsafe as reported above?
This seems like a pretty small feature, and it has been in limbo for over a year.
I think this would be more versatile:
fn drain_filter_map<F>(&mut self, f: F) -> DrainFilterMap<T, F> where F: FnMut(T) -> Option<T>
I don't think this is better than a composition of drain_filter
and map
.
Is it still unwind-unsafe as reported above?
There appears to be a tough choice between not draining all matching elements if the iteration stops early, or a potential panic during unwind if filtering and draining to the end is done at the drop of a DrainFilter
.
I think this feature is rife with problems either way and as such should not be stabilized.
Is there any particular problem with having it behave differently on unwind?
Possibilities:
The most comprehensible counter-argument I can think of is that drop
code which depends on the invariant usually provided by drain_filter
(that, at the end, the elements in the vec will be exactly those which failed the condition) may be arbitrarily distant from the code (most likely normal, safe code) which uses drain_filter
.
However, suppose there was such a case. This code will be buggy no matter how the user writes it. E.g. if they write an imperative loop that went in reverse and swap-removed elements, then if their condition can panic and their drop impl relies heavily on the filter condition being false, the code still has a bug. Having a function like drop_filter
whose documentation can call attention to this edge case seems like an improvement in comparison.
Also, thanks, I found this playground example posted earlier in the thread which demonstrates that the current implementation still double-drops elements. (so it definitely cannot be stabilized as is!)
It might be worth opening a separate issue for the soundness bug? That can then be marked as I-unsound.
As far as I know you can't Mark or as unsound as double panic _is sound_
just very inconvenient as it aborts. Also as far as I remember the
possibility of double panicking isn't a bug but the behavior implicitly but
knowingly chosen.
The options are basically:
The problems are:
Or with other words 1. leads to confusion in normal use cases, 2. can lead
to abort if the predicate can panic 3.,4. make it so that you can't really
use it in a drop method, but how do you now a function you use there
doesn't use it internally.
As a result of this option 3.,4. are no-go. The problems with option 2. are
more rare then the ones in 1. so 2. was chosen.
IMHO it would be better to have a drain+drain_filter API which do not run
to completion on drop + a general iterator combinator which does run to
completion on drop + a method which complete a iterator but just drops all
remaining items. The problem is drain is already stable, the iterator
combinator adds overhead as it needs to fuse the inner iterator and drain
might not be the most appropriate name.
On Mon, May 20, 2019, 09:28 Ralf Jung notifications@github.com wrote:
It might be worth opening a separate issue for the soundness bug? That can
then be marked as I-unsound.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/43244?email_source=notifications&email_token=AB2HJEL7FS6AA2A2KF5U2S3PWJHK7A5CNFSM4DTDLGPKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODVX5RWA#issuecomment-493869272,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB2HJEMHQFRTCH6RCQQ64DTPWJHK7ANCNFSM4DTDLGPA
.
Double-drops are unsound though.
Created #60977 for the soundness issue
Thanks, I feel stupid for reading double drop as double panic :man_facepalming: .
- => Unexpected difference between drop in and outside of a panic. Just
consider someone _using_ drain_filter in a drop function.
3.,4. make it so that you can't really
use it in a drop method, but how do you now a function you use there
doesn't use it internally.
This to me is still not a big deal.
If somebody uses drain_filter
in a Drop impl with a condition that can panic, the problem isn't that they chose to use drain_filter
; the problem is that their Drop impl contains code that can panic! Likewise it doesn't matter whether a method uses drain_filter
internally; what matters is whether calling the method may panic (i.e. has the caller upheld the contract of the method?).
Sorry, I responded too early. I think I see what you mean now. I'll mull this over a bit more.
Alright, so your argument is that code inside a drop
impl that uses drain_filter
may mysteriously break if it happens to run during unwind. (this isn't about drain_filter
panicking, but about other code panicking which causes drain_filter
to run):
impl Drop for Type {
fn drop(&mut self) {
self.vec.drain_filter(|x| x == 3);
// Do stuff that assumes the vector has no 3's
...
}
}
This drop impl would suddenly misbehave during unwind.
This is indeed a compelling argument against having DrainFilter naively detect if the current thread is panicking.
The drain_filter
terminology makes the most since to me. Given that we already have drain
to remove all items, selecting which items to remove would be a filter
. When paired together the naming feels very consistent.
The fix for the double-panic soundness issue leaves the remainder of the Vec
in-tact in the event of a panic in the predicate. The items are back-shifted to fill the gap, but are otherwise left alone (and dropped via Vec::drop
during unwind or otherwise handled by the user if the panic is caught).
Dropping a vec::DrainFilter
prematurely continues to behave just as if it were fully consumed (which is the same as vec::Drain
). If the predicate panics during vec::Drain::drop
, the remaining items are back-shifted normally, but no further items are dropped and the predicate is not called again. It essentially behaves the same regardless of whether a panic in the predicate happens during normal consumption or when the vec::DrainFilter
is dropped.
Assuming that the fix for soundness hole is correct, what else is holding back stabilization of this feature?
Can Vec::drain_filter
be stabilized independently of LinkedList::drain_filter
?
The problem with the drain_filter
terminology, is that with drain_filter
, there are really two outputs: the return value, and the original collection, and it isn't really clear which side the "filtered" items go on. I think that even filtered_drain
is a little more clear.
it isn't really clear which side the "filtered" items go on
Vec::drain
sets a precedent. You specify the range of the items that you want to _remove_. Vec::drain_filter
operates the same way. You identify the items that you want to _remove_.
and it isn't really clear which side the "filtered" items go on
I’m of the opinion that this is already true for Iterator::filter
, so I’ve resigned myself to having to look at the docs / write a test whenever I use that. I don’t mind the same for drain_filter
.
I wish we had picked Ruby’s select
and reject
terminology, but that ship has long since sailed.
Any progress on this? Is the name the only thing that's still in limbo?
Is there anything blocking this from being stabilized?
It looks like DrainFilter
's Drop
impl will leak items if any of their destructors panic the predicate panics. This is the root cause of https://github.com/rust-lang/rust/issues/52267. Are we sure we want to stabilize an API like that?
Nevermind, that was fixed by https://github.com/rust-lang/rust/pull/61224 apparently.
Gonna poke at this tracking issue a little too, would love to see this feature hit stable :D Are there any blockers?
cc @Dylan-DPC
Was a decision ever made in favor or against having drain_filter
take a RangeBounds
parameter, like drain
does? Passing ..
seems easy enough when you want to filter the entire vector, so I'd probably be in favor of adding it.
I think this would be more versatile:
fn drain_filter_map<F>(&mut self, f: F) -> DrainFilterMap<T, F> where F: FnMut(T) -> Option<T>
Almost, the more general version would take an FnMut(T) -> Option<U>
, like Iterator::{filter_map, find_map, map_while}
do. I have no idea if it's worth to generalize filter_map
in this way, but it might be worth considering.
I arrived here because I was looking more more or less exactly the method @jethrogb suggested above:
fn drain_filter_map<F>(&mut self, f: F) -> DrainFilterMap<T, F>
where F: FnMut(T) -> Option<T>
The only difference from what I had in mind (which I was calling update
in my head) was that I hadn't thought to make it return a draining iterator, but that seems like a clear improvement, since it provides a single, reasonably straightforward interface that supports updating items in place, removing existing items, and delivering the removed items to the caller.
Almost, the more general version would take an
FnMut(T) -> Option<U>
, likeIterator::{filter_map, find_map, map_while}
do. I have no idea if it's worth to generalizefilter_map
in this way, but it might be worth considering.
The function has to return Option<T>
because the values it produces are stored in a Vec<T>
.
@johnw42 I'm not sure I follow, wouldn't the Some
value immediately be yielded by the iterator?
Actually, I guess the input value of that function still needs to be &T
or &mut T
instead of T
in case you don't want to drain it. Or maybe the function could be something like FnMut(T) -> Result<U, T>
. But I don't see why the item type couldn't be some other type.
@timvermeulen I think we're interpreting the proposal differently.
The way I interpreted it, the Some
value is stored back into the Vec
, and None
means the original value is yielded by the iterator. That allows the closure to either update the value in place, or move it out of the Vec
. In writing this, I realized my version doesn't really add anything because you can implement it in terms of drain_filter
:
fn drain_filter_map<F>(
&mut self,
mut f: F,
) -> DrainFilter<T, impl FnMut(&mut T) -> bool>
where
F: FnMut(&T) -> Option<T>,
{
self.drain_filter(move |value| match f(value) {
Some(new_value) => {
*value = new_value;
false
}
None => true,
})
}
Conversely, I was thinking your interpretation isn't very useful because it's equivalent to mapping the result of drain_filter
, but I tried to write it, and it's not, for the same reason filter_map
isn't equivalent to calling filter
followed by map
.
@johnw42 Ah, yeah, I thought you wanted None
to mean that the value should stay in the Vec
.
So it seems that FnMut(T) -> Result<U, T>
would be the most general, though it's probably not very ergonomic. FnMut(&mut T) -> Option<U>
isn't really an option actually because that wouldn't allow you to take ownership of the T
in the general case. I think FnMut(T) -> Result<U, T>
and FnMut(&mut T) -> bool
are the only options.
@timvermeulen I started to say something earlier about a "most general" solution, and my "most general" solution was different from yours, but I reached the same conclusion, which is that trying to make a function too general results in a something you wouldn't actually want to use.
Although maybe there's still some value in making a very general method that advanced users can build nicer abstractions on top of. As far as I can tell, the point of drain
and drain_filter
isn't that they're particularly ergonomic APIs--they're not--but that they support use cases that occur in practice, and which can't be written any other way without a lot of redundant moves (or using unsafe operations).
With drain
, you get the following nice properties:
Vec
need not support Copy
or Clone
.Vec
itself needs to be allocated or freed.Vec
are moved at most once.With drain_filter
, you gain the ability to remove an arbitrary set of items from the Vec
rather than just a contiguous range. A less obvious advantage is that even if a contiguous range of items is removed, drain_filter
can still offer a performance boost if finding the range to pass to drain
would involve making a separate pass over the Vec
to inspect its contents. Because the argument to the closure is a &mut T
, it's even possible to update the items left in the Vec
. Hooray!
Here are some other things you might want to do with an in-place operation like drain_filter
:
And here's my analysis of each:
Vec
.Vec
are moved at most once, and it might be necessary to re-allocate the buffer. If you need to do something like that, it's not necessarily more efficient than just building a whole new Vec
, and it could be worse.Vec
is organized in such a way that you can just skip over a large portion of the items without stopping to inspect them. I didn't include it in my sample code below, but it could be supported by changing the closure to return an additional usize
specifying how many of the following items to jump over before continuing.usize
above could be replaced with a choice of Keep(usize)
or Drop(usize)
(where Keep(0)
and Drop(0)
are semantically equivalent).I think we can support the essential use cases by having the closure return an enum with 4 cases:
fn super_drain(&mut self, f: F) -> SuperDrainIter<T>
where F: FnMut(&mut T) -> DrainAction<T>;
enum DrainAction<T> {
/// Leave the item in the Vec and don't return anything through
/// the iterator.
Keep,
/// Remove the item from the Vec and return it through the
/// iterator.
Remove,
/// Remove the item from the Vec, return it through the iterator,
/// and swap a new value into the location of the removed item.
Replace(T),
/// Leave the item in place, don't return any more items through
/// the iterator, and don't call the closure again.
Stop,
}
One last option I'd like to present is to get rid of the iterator entirely, pass items to the closure by value, and allow the caller to leave an item unchanged by replacing it with itself:
fn super_drain_by_value(&mut self, f: F)
where F: FnMut(T) -> DrainAction<T>;
enum DrainAction<T> {
/// Don't replace the item removed from the Vec.
Remove,
/// Replace the item removed from the Vec which a new item.
Replace(T),
Stop,
}
I like this approach best because it's simple and it supports all the same use cases. The potential downside is that even if most of the items are left in place, they still need to be moved into the closure's stack frame and then moved back when the closure returns. One would hope those moves could be reliably optimized away when the closure just returns its argument, but I'm not sure if that's something we should count on. If other people like it enough to include it, I think update
would be a good name for it, because if I'm not mistaken, it can be used to implement any single-pass in-place update of a Vec
's contents.
(BTW, I completely ignored linked lists above because I forgot about them entirely until I looked at the title of this issue. If we're talking about a linked list, it changes the analysis of points 4-6, so I think a different API would be appropriate for linked lists.)
@johnw42 you can already do 3. if you have a mutable reference, by using mem::replace
or mem::take
.
@johnw42 @jplatte
(3) only really makes sense if we allow the item type of the returned Iterator
to be different from the item type of the collection.
(3) is a special case, because you both return the element from the Iterator
, and put a new element back into the Vec
.
Bikeshedding: I would kinda reverse the function of Replace(T)
and replace it with PushOut(T)
, with the purpose to "submit" the inner value of PushOut
onto the iterator, while keeping the original (parameter) item in the Vec
.
Stop
should probably carry the ability to return an Error
type (or work a bit like try_fold
?).
I implemented my super_drain_by_value
function last night, and I learned several things.
The headline item should probably be that, at least w.r.t. Vec
, everything we're talking about here is in the "nice to have" category (as opposed to adding a fundamentally new capability), because Vec
already essentially provides direct read and write access to all of its fields through the existing API. In the stable version, there's a small caveat that you can't observe the pointer field of an empty Vec
, but the unstable into_raw_parts
method removes that restriction. What we're really talking about is expanding the set of operations that can be performed efficiently by safe code.
In terms of code generation, I found that in the easy cases (e.g. Vec<i32>
), redundant moves in and out of the Vec
are a non-issue, and calls that amount to simple things like a no-op or truncating the Vec
are transformed into code that's too simple to be improved upon (zero and three instructions, respectively). The bad news is that for harder cases, both my proposal and and the drain_filter
method do a lot of unnecessary copying, largely defeating the purpose of the methods. I tested this by looking at the assembly code generated for a Vec<[u8; 1024]>
, and in both cases, each iteration has two calls to memcpy
that aren't optimized away. Even a no-op call ends up copying the entire buffer twice!
In terms of ergonomics, my API, which looks pretty nice at first glance, isn't so nice in practice; returning an enum value from the closure becomes pretty verbose in all but the simplest cases, and the variant I proposed where the closure returns a pair of enum values is even uglier.
I also tried extending DrainAction::Stop
to carry an R
value that's returned from super_drain_by_value
as an Option<R>
, and that's even worse, because in the (presumably typical) case where the returned value isn't needed, the compiler can't infer R
and you have to explicitly annotate the type of a value you're not even using. For this reason, I don't think it's a good idea to support returning a value from the closure to the caller of super_drain_by_value
; it's roughly analogous to why a loop {}
construct can return a value, but any other kind of loop evaluates to ()
.
Regarding generality, I realized the there are actually two cases for premature termination: one where the rest of the Vec
is dropped, and another there it's left in place. If premature termination doesn't carry a value (as I think it shouldn't), it becomes semantically equivalent to returning Keep(n)
or Drop(n)
, where n
is the number of items not yet examined. I do, however, think premature termination should be treated as a separate case, because compared to using Keep
/Drop
, it's easier to use goes through a simpler code path.
To make the API a bit more friendly, I think a better option would be to make the closure return ()
and pass it a helper object (which I'll refer to here as an "updater") that can be used to inspect each element of the Vec
and control what happens to it. These methods can have familiar names like borrow
, borrow_mut
, and take
, with extra methods like keep_next(n)
or drop_remainder()
. Using this kind of API, the closure is much simpler in simple cases and no more complex in the complex cases. By having most of the updater's methods take self
by value, it's easy to prevent the caller from doing things like calling take
more than once, or giving conflicting instructions for what to do in subsequent iterations.
But we can still do better! I realized this morning that, as is so often the case, this problem is analogous to one that has been definitively solved in functional languages, and we can solve it with an analogous solution. I'm talking about "zipper" APIs, first described in this short paper with sample code in OCaml, and described here with Haskell code and links to other relevant papers. Zippers provide a very general way to traverse a data structure and update it "in place" using whatever operations that particular data structure supports. Another way to think of it is that a zipper is a sort of turbocharged iterator with extra methods for performing operations on a specific type of data structure.
In Haskell, you get "in place" semantics by making the zipper a monad; in Rust, you can do the same thing using lifetimes by having the zipper hold a mut
reference to the Vec
. A zipper for a Vec
is very similar to the updater I described above, except that instead of passing it repeatedly to a closure, the Vec
just provides a method to create a zipper on itself, and the zipper has exclusive access to the Vec
as long as it exists. The caller then becomes responsible to writing a loop to traverse the array, calling a method at each step to either remove the current item from the Vec
, or leave it in place. Early termination can be implemented by calling a method that consumes the zipper. Because the loop is under the caller's control, it becomes possible to do things like process more than one item in each iteration of the loop, or handle a fixed number of items without using a loop at all.
Here's a very contrived example showing some of the things a zipper can do:
/// Keep the first 100 items of `v`. In the next 100 items of `v`,
/// double the even values, unconditionally keep anything following an
/// even value, discard negative values, and move odd values into a
/// new Vec. Leave the rest of `v` unchanged. Return the odd values
/// that were removed, along with a boolean flag indicating whether
/// the loop terminated early.
fn silly(v: &mut Vec<i32>) -> (bool, Vec<i32>) {
let mut odds = Vec::new();
// Create a zipper, which get exclusive access to `v`.
let mut z = v.zipper();
// Skip over the first 100 items, leaving them unchanged.
z.keep_next(100);
let stopped_early = loop {
if let Some(item /* &mut i32 */) = z.current_mut() {
if *item < 0 {
// Discard the value and advance the zipper.
z.take();
} else if *item % 2 == 0 {
// Update the item in place.
*item *= 2;
// Leave the updated item in `v`. This has the
// side-effect of advancing `z` to the next item.
z.keep();
// If there's another value, keep it regardless of
// what it is.
if z.current().is_some() {
z.keep();
}
} else {
// Move an odd value out of `v`.
odds.push(z.take());
}
if z.position() >= 200 {
// This consumes `z`, so we must break out of the
// loop!
z.keep_rest();
break true;
}
} else {
// We've reached the end of `v`.
break false;
}
}
(stopped_early, odds)
// If the zipper wasn't already consumed by calling
// `z.keep_rest()`, the zipper is dropped here, which will shift
// the contents of `v` to fill in any gaps created by removing
// values.
}
For comparison, here's more or less the same function using drain_filter
, except it only pretends to stop early. It's about the same amount of code, but IMHO it's a lot harder to read because the meaning of the value returned by the closure isn't obvious, and it's using mutable boolean flags to carry information from one iteration to the next, where the zipper version achieves the same thing with control flow. Because removed items are always yielded by the iterator, we need a separate filter step to remove negative values from the output, which means we need to check for negative values in two places instead of one. It's also kind of ugly that it has to keep track of the position in v
; the implementation of drain_filter
has that information, but the caller has no access to it.
fn drain_filter_silly(v: &mut Vec<i32>) -> (bool, Vec<i32>) {
let mut position: usize = 0;
let mut keep_next = false;
let mut stopped_early = false;
let removed = v.drain_filter(|item| {
position += 1;
if position <= 100 {
false
} else if position > 200 {
stopped_early = true;
false
} else if keep_next {
keep_next = false;
false
} else if *item >= 0 && *item % 2 == 0 {
*item *= 2;
false
} else {
true
}
}).filter(|item| item >= 0).collect();
(stopped_early, removed)
}
@johnw42 Your earlier post reminded me of the scanmut
crate, specifically the Remover
struct, and the "zipper" concept you mentioned seems very similar! That does seem a lot more ergonomic than a method that takes a closure for when you want total control.
Either way, this is probably not very relevant to whether drain_filter
should be stabilized, since we can always swap out the internals later. drain_filter
itself will always be very useful because of how convenient it is. The only change I'd still like to see before stabilization is a RangeBounds
parameter.
@timvermeulen I think it makes sense to add a RangeBounds
parameter, but keep the current closure signature (F: FnMut(&mut T) -> bool
).
You can always post-process the drained elements with filter_map
or whatever you want.
(For me, it's very important that the closure allows mutating the element, because retain
doesn't allow it (it was stabilized before this mistake was discovered).)
Yeah, that seems to be the perfect balance between convenience and usefulness.
@timvermeulen Yeah, I was straying rather far off the main topic.
One thing I noticed that is relevant to the original topic is that it's kind of hard to remember what the return value of the closure means--is it saying whether to keep the item or to remove it? I think it would be helpful for the docs to point out that v.drain_filter(p)
is equivalent to v.iter().filter(p)
with side-effects.
With filter
, using a boolean value is still less than ideal for clarity, but it's a very well-known function, and IMHO it's at least somewhat intuitive that the predicate answers the question "should I keep this?" rather than "should I discard this?" With drain_filter
, the same logic applies if you think about it from the perspective of the iterator, but if you think about it from the perspective of the input Vec
, the question becomes "should I NOT keep this?"
As for the exact wording, I propose renaming the filter
parameter to predicate
(to match Iterator::filter
) and adding this sentence somewhere in the description:
To remember how the return value of
predicate
is used, it may be helpful to keep in mind thatdrain_filter
is identical toIterator::filter
with the additional side-effect of removing the selected items fromself
.
@johnw42 Yes, good point. I think a name like drain_where
would be much clearer.
If you are going to get into naming bikeshedding; please make sure you’ve read all the comments; even hidden ones. Many variants have been proposed already, e.g. https://github.com/rust-lang/rust/issues/43244#issuecomment-331559537
But… it has to be named draintain()
! No other name is as beautiful!
I'm quite interested in this issue, and I read the whole thread, so I might as well try to summarize what everyone said, in the hopes of helping this get stabilized. I've added some of my own comments along the way, but I've tried to keep them as neutral as possible.
Here is a non-opinionated summary of the names I saw proposed:
drain_filter
: The name used in the current implementation. Consistent with other names suchs as filter_map
. Has the advantage of being analogous to drain().filter()
, but with more side-effects.drain_where
: Has the benefit of indicating whether true
results in draining _out_ or filtering _in_, which can be hard to remember with other names. There is no precedent in std
for the _where
suffix, but there's plenty of precedents for similar suffixes.drain().where()
, since where
is already a keyword.drain_retain
: Consistent with retain
, but retain
and drain
have opposite interpretations of the boolean values returned by the closure, which may be confusing.filtered_drain
drain_if
drain_when
remove_if
It might be worth adding a range argument for consistency with drain
.
Two closure formats have been suggested, FnMut(&mut T) -> bool
and FnMut(T) -> Result<T, U>
. The latter is more flexible, but also clumsier.
Reversing the boolean condition (true
means "keep in the Vec
") to be consistent with retain
was discussed, but then it wouldn't be consistent with drain
(true
means "drain from the Vec
").
When the filter closure panics, the DrainFilter
iterator is dropped. The iterator then should finish draining the Vec
, but to do so it must call the filter closure again, risking a double panic. There are some solutions, but all of them are compromises:
Don't finish draining on drop. This is pretty counterintuitive when used with adaptors such as find
or all
. Besides, it renders the v.drain_filter(...);
idiom useless since iterators are lazy.
Always finish draining on drop. This risks double panics (which result in aborts), but makes behaviour consistent.
Only finish draining on drop if not currently unwinding. This fixes double panics entirely, but makes the behaviour of drain_filter
unpredictable: dropping DrainFilter
in a destructor might _sometimes_ not actually do its job.
Only finish draining on drop if the filter closure did not panic. This is the current compromise made by drain_filter
. A nice property of this approach is that panics in the filter closure "short-circuit", which arguably is quite intuitive.
Note that the current implementation is sound and never leaks as long as the DrainFilter
struct is dropped (although it may cause an abort). Previous implementations were not safe/leak-free though.
DrainIter
could either finish draining the source vector when it's dropped, or it could only drain when next
is called (lazy iteration).
Arguments in favor of drain-on-drop:
Consistent with the behaviour of drain
.
Interacts well with other adaptors such as all
, any
, find
, etc...
Enables the vec.drain_filter(...);
idiom.
The lazy functionality could be explicitly enabled through drain_lazy
-style methods or a lazy()
adapter on DrainIter
(and even on Drain
, since it's backwards-compatible to add methods).
Arguments in favor of lazy iteration:
Consistent with almost all other iterators.
The "drain-on-drop" functionality could be explicitly enabled through adapters on DrainIter
, or even through a general Iterator::exhausting
adapter (see RFC #2370).
I might have missed some stuff, but at least I hope it helps newcomers when skimming the thread.
@negamartin
Wouldn't the drain-on-drop
option require that the iterator returns a reference to the item instead of the owned value? I think that would make it impossible to use drain_filter
as a mechanism to remove and take ownership of items matching a specific condition (which was my original use-case).
I don't think so, since the behaviour of the current implementation is precisely drain-on-drop while yielding owned values. Either way, I don't see how drain-on-drop would require borrowing elements, so I think we have two different ideas on what drain-on-drop means.
Just to be clear, when I say drain-on-drop I only mean the behaviour when the iterator is not fully consumed: Should all items matching the closure be drained even if the iterator isn't fully consumed? Or only up to the element that was consumed, leaving the rest untouched?
In particular, it's the difference between:
let mut v = vec![1, 5, 3, 6, 4, 7];
v.drain_where(|e| *e > 4).find(|e| *e == 6);
// Drain-on-drop
assert_eq!(v, &[1, 3, 4]);
// Lazy
assert_eq!(v, &[1, 3, 4, 7]);
Just throwing out an idea, but another possible API could be something like:
fn drain_filter_into<F, D>(&mut self, filter: F, drain: D)
where F: FnMut(&mut T) -> bool,
D: Extend<T>
{ ... }
It's less flexible than the other options, but does avoid the problem of what to do when DrainFilter
is dropped.
It feels to me like this whole thing is looking to me less and less like retain_mut()
(retain()
with a mutable reference passed to the closure), which is what it was first and foremost intended to provide. Could we provide retain_mut()
for now in addition to working on the design of filtered drain thing? Or am I missing something?
@BartMassey
which is what it was first and foremost intended to provide.
I don't think that's the case. I specifically use drain_filter
to take ownership of items based on the filter criteria. Drain
and DrainFilter
yeild the item where as retain does not.
@negamartin
Just to be clear, when I say drain-on-drop I only mean the behaviour when the iterator is not fully consumed
Ah, Ok. That's my mistake. I misunderstood your definition. I had interpreted it as "nothing is removed from the vec until dropped", which doesn't really make any sense.
Arguments in favor of lazy iteration
I think it needs be consistent with drain
. The Iterator::exhausting
RFC was not accepted and it would be very odd if drain
and drain_filter
had seemingly opposite drain behaviors.
@negamartin
drain_filter: The name used in the current implementation. Consistent with other names suchs as filter_map. Has the advantage of being analogous to
drain().filter()
, but with more side-effects.
It is not analogous (that's why we need retain_mut
/drain_filter
):
drain().filter()
would drain even those elements for which the filter closure returns false
!
I just noticed a small line in a comment by the lib team in #RFC 2870:
Possibilities might include "overloading" a method by making it generic, or a builder pattern.
Is it backwards-compatible to make a method generic if it still accepts the previous concrete type? If so, I believe that would be the best way forward.
(The builder pattern is a bit unintuitive with iterators, since methods on iterators are usually adaptors, not behaviour-changers. Besides, there are no precedents, for example chunks
and chunks_exact
are two separate methods, not a chunks().exact()
combo.)
No, not with the current design as far as I know as type inference which
worked before could now fail due to type ambiguity. Genetics with default
types for functions would help but are super tricky to do right.
On Fri, Jun 12, 2020, 21:21 negamartin notifications@github.com wrote:
I just noticed a small line in a comment by the lib team in #RFC 2870
https://github.com/rust-lang/rfcs/pull/2369:Possibilities might include "overloading" a method by making it generic,
or a builder pattern.Is it backwards-compatible to make a method generic if it still accepts
the previous concrete type? If so, I believe that would be the best way
forward.(The builder pattern is a bit unintuitive with iterators, since methods on
iterators are usually adaptors, not behaviour-changers. Besides, there are
no precedents, for example chunks and chunks_exact are two separate
methods, not a chunks().exact() combo.)—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/43244#issuecomment-643444213,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AB2HJELPWXNXJMX2ZDA6F63RWJ53FANCNFSM4DTDLGPA
.
Is it backwards-compatible to make a method generic if it still accepts the previous concrete type? If so, I believe that would be the best way forward.
No, since it breaks type inference in some cases. E.g. foo.method(bar.into())
will work with a conrete type argument, but not with a generic argument.
I think drain_filter as it is implemented now is very useful. Could it be stabilized as is? If better abstractions are discovered in the future, nothing stops them from being introduced as well.
What process do I need to initiate to try to get retain_mut()
added, independent of whatever happens with drain_filter()
? It seems to me that the requirements have diverged, and that retain_mut()
would still be useful to have regardless of what happens with drain_filter()
.
@BartMassey for new unstable library APIs, I think making a PR with an implementation should be fine. There are instructions at https://rustc-dev-guide.rust-lang.org/implementing_new_features.html for implementing the feature and at https://rustc-dev-guide.rust-lang.org/getting-started.html#building-and-testing-stdcorealloctestproc_macroetc for testing your changes.
I've been struggling with API differences between HashMap
and BTreeMap
today and I just wanted to share a word of caution that I think it's important that various collections strive to maintain a coherent API whenever in makes sense, something which at this point isn't always the case.
For instance String, Vec, HashMap, HashSet, BinaryHeap and VecDeque have a retain
method, but LinkedList and BTreeMap do not. I find it especially odd since retain
seems like a more natural method for a LinkedList or a Map than for vectors where random deletion is a very expensive operation.
And when you dig a bit deeper it's even more perplexing: HashMap::retain
's closure is passed the value in a mutable reference, but the other collections get an immutable reference (and String gets a simple char
).
Now I see that new APIs like drain_filter
are being added that 1/ seem to overlap with retain
and 2/ are not stabilized for all collections at the same time:
HashMap::drain_filter
is in the upstream repo but not yet shipped with Rust's std AFAIK (it doesn't appear in the docs)BTreeMap::drain_filter
, Vec::drain_filter
, LinkedList::drain_filter
are in Rust's std, but feature gatedVecDeque::drain_filter
doesn't appear to exist at all, it doesn't appear in the docsString::drain_filter
doesn't exist eitherI don't have a strong opinion on the best way to implement these features, or if we need drain_filter
, retain
or both, but I very strongly believe that these APIs should remain consistent across collections.
And perhaps more importantly, similar methods of different collections should have the same semantics. Something that the current implementations of retain
violate IMO.
Most helpful comment
Is there anything blocking this from being stabilized?