Rust: Tracking issue for `step_by` stabilization

Created on 12 Aug 2015  Â·  107Comments  Â·  Source: rust-lang/rust

Update (@SimonSapin): this is now the tracking issue for an iterator adaptor based on Iterator::nth:

pub trait Iterator {
    fn step_by(self, step: usize) -> StepBy<Self>
}

The Step trait used for making ranges iterators is now tracked at https://github.com/rust-lang/rust/issues/42168.

Original issue:


The step_by method makes it possible to step through ranges with arbitrary-sized steps. There are several issues that need to be addressed before we can stabilize it:

  • The design/stabiliztion of the Step trait, which is currently a bit of a mess
  • The behavior on negative steps (see this thread)
  • Likely the design/stabilization of Zero/One, tracked here
B-unstable C-tracking-issue E-help-wanted E-mentor P-medium T-libs disposition-merge finished-final-comment-period

Most helpful comment

Please just stabilize this. I don't care if it's a mess. It's basic functionality.

All 107 comments

Nominating for P-High.

triage: P-high

We should probably start moving on this in the next cycle or drop the P-high.

Would definitely like some help getting this over the finish line. I'm happy to mentor!

The library subteam decided to not move this to FCP this cycle. The API is pretty likely to need revision in one form or another and that should be taken care of before a FCP happens. I believe discussion is still going on in the internals post though!

Also moving to P-medium

It seems odd to me that the by type is Self. It seems similar to the Add trait where I can choose the RHS and the sum type separately. It seems like the type of the steps should similarly be potentially different from the elements themselves (steps are effectively differences).

Where do we stand on this at the moment? This is a feature I'd very much like to use in stable, and am willing to provide help to get this finalized (with some mentoring).

Based on the documentation, I would expect (0u8..).step_by(1).count() to be 256, but currently it diverges. Is this a bug in the documentation or in the implementation?

Why do you expect that? Overflowing the range of an integer type is an error (it panics in debug mode), it’s not the expected end of a RangeFrom iterator (which doesn’t have an end).

@SimonSapin The example for RangeFrom::step_by is

for i in (0u8..).step_by(2) {
    println!("{}", i);
}

I think that this example is misleading: its explanation says "This prints all even u8 values.", but instead it overflows on panic in debug and loops forever in release.

Indeed, that example is wrong. How does this look? https://github.com/rust-lang/rust/pull/31172

Ditto @dschatzberg's comment. Imagine something like stepping through a range of two Instants by a Duration.

A sequence of floats like 0.1, 0.2, ..., 0.5 is so common in numerical codes that Matlab has a built-in operator for it:

But Step is only implemented for integer types, perhaps in the fear of a numerical error around the endpoint.

I understand your rigorous approach but it would be handy to have step_by() for floats (perhaps as an optional trait?)

Also, I think we should consider inclusion of Itertools::step(usize) into the stdlib as well (thanks u/levansfg on Reddit for letting me know)

which, to my eyes, is more generic (applicable to all iterators)

IMHO, step_by should be a general abstraction for all iterators as mentioned by @nodakai and the current implementation that uses addition must be at most a specialization for efficiency sake for integral types (now that Rust has specialization implemented).

yigal100's approach seems sensible, in which case step_by should take a usize regardless of the iterator type and just call next a configurable number of times (>= 0).

Another option would be to drop step_by and instead allow a Range to be multiplied by something; e.g. (1..3) * -2 would return -2, -4. In this case it would be nice to be able to write (0..10) * 0.1, however, where would the (int->float) conversions come in? Also, it might be nice to be able to use other operators: (0..101) / 100.0 + 1.0 for 101 values from 1 to 2.

Neither of these options requires a One trait really; it depends how Rust converts from the a..b syntax to an iterator (if this is restricted to integer types, as seems to be the case, then 1 as TYPE is sufficient).

The questions which I think need answering are:

1 What are the main use-cases?

In my mind these are:

a. producing a sequential list of array indices (i.e. non-negative integers), usually stepping by +1 but possibly also -1, 2, etc., and
b. producing an evenly spaced set of sample points, usually floating-point and with any spacing. This doesn't _need_ to be accommodated by the standard library.
c. (possibly) skipping some items in a general iterator (Itertools::step(usize) that nodaki mentioned)

2 What restrictions are there on syntax / implementation?

As far as I am aware, the a..b syntax must be kept for integers and must be usable as an iterator.

Is there a fundamental reason that -20..20 or 0.5..2.5 or "a".."z" cannot be implemented? Is there already a commitment to enable (0..8).step_by(2) in this form?

3 Flexibility, optimizability and intuitiveness

Which is clearer, (0..100).step_by(10) or (0..10) * 10? ` Or some other way?

Is there any point allowing negative steps? Maybe not for indices since .reverse() is clearer but possibly for a range of sample points (e.g. (0..11) / -100 to yield 0, -100, -200, ..., -1000).

One thing I like about the RANGE * SCALAR syntax is that it might enable type-conversions to floats or user-defined types without requiring support for those types in the a..b syntax.

Note that map allows some of the above already:

fn main() {
    let v: Vec<_> = (0..5).map(|x| x*2).collect();
    println!("list: {:?}", v);
    for v in (0..10).map(|x| x as f32 / 10.0) {
        println!("v: {}", v);
    }
}

Other Iterator methods like rev() can also be used. The inclusion of step from itertools should be enough to complete the picture here (i.e. drop the existing step_by method from Range).

Is it possible to optimise these down to the same thing?

    let v1: Vec<_> = (0..10).step_by(2).collect();
    let v2: Vec<_> = (0..5).map(|x| x*2).collect();

The use of the One trait by Range is another issue entirely: it's what allows a Range to be converted to an Iterator. I suppose the idea was that step_by would provide a different way of providing this "step".

It's ironic that Range depends on a trait called One to do this since 1f32 is just as much "one" as 1u32 is. It's also surprising that for x in (0f32..10f32).step_by(1f32) { ... } doesn't work given the API docs.

Is there any good reason not to drop step_by entirely?

Is it possible to make step_by work for floats (by using x < end comparison) without impairing the integer implementation?

Can the Iterator impl for Range use a new trait, e.g. core::ops::StepByOne, so the Zero and One traits are no longer required to meet this odd definition of "one"?

I'm kind of glad to see someone has brought up the One trait bit. I always did find it a bit unusual as the stated purpose of One (from its own documentation) is to be a multiplicative identity.

Recently I played around with Haskell, which on first impression appeared to have a really nice and clean solution by using a generic "successor" (add 1) method to provide ranges... however, upon further thought, such a pure and simple design does not enable the use of larger step sizes efficiently. Sure enough, it turns out that what Haskell actually requires from a range-compatible type is not the successor method, but rather _conversion methods to/from an integer_. In light of that, the One requirement doesn't seem too bad!


Is there a fundamental reason that -20..20 or 0.5..2.5 or "a".."z" cannot be implemented? Is there already a commitment to enable (0..8).step_by(2) in this form?

On this, I'd like to point out that non-integral candidates for a range type (such as characters or floating points) go hand-in-hand with inclusive-range syntax; The main benefits of semi-inclusive ranges come from looking at integers as "insertion indices" pointing between elements. Once your range elements are actual values, this notion begins to fail and you end up with let lowercase = vec!["a".."{"];

This is just speculation, but I rather suspect that support for expressions like ["a".."z"] were the primary motivating reason that some languages like Perl and Haskell use inclusive ranges despite having zero-based indexing -- design decisions which would appear to me to be fiercely at odds with each other!


Speaking for myself, the current form of step_by already suffices for any use case I've ever had, and I'm here today because I was surprised and disappointed to find that this feature still hasn't landed in stable.

Edit: Urk, okay. Iterator::step with a specialization for Range does sound alluring. Though I also fear it may have potential performance pitfalls unless we can "forward" these optimizations via e.g. a RandomAccessIterator trait

Next step is to clean up the Step trait into something reasonable for stabilization.

I continue to be happy to mentor!

Discussed at libs triage the decision was to _not_ move into final comment period, but as @brson mentioned the next step here is cleaning up the Step trait which @aturon is willing to help with!

I'm willing to help here. What tasks in particular need to be done with regard to cleaning up the Step trait?

@shssoichiro Thanks for stepping up (ahem)! So the basic problem is that, especially with the recent deprecation of Zero and One, the Step trait has gotten pretty out of hand. What we need is a trait that can support stepping forward and backward with arbitrary strides that is as simple/elegant as possible.

To get started, you can have a look at the code in https://github.com/rust-lang/rust/blob/master/src/libcore/iter/range.rs, most of which is devoted to the various step-by iterators. Challenge: figure out how to achieve the same goals, but with a simpler Step trait.

Separately, there are also some questions about what we should do with negative strides; see this thread for more. We'll need to come to consensus around a reasonable design there to go forward.

Hopefully that's enough to get started, but let me know if you've got questions! You can reach me on this thread or at [email protected]

@shssoichiro Had a chance to look at the Step trait yet?

@brson I have, although not as much as I'd like. I think the ideal path would be to remove all of the methods from the Step trait except for step and steps_between. However, I'm having trouble finding a practical way of implementing this.

Thanks for the update!

My humble opinion on this.

I think the difficulty arises from the fact that 2 different problems are being tackled at the same time:
1- mathematical ranges
2- generic iterator steps

What about extending the current Range structs (or implementing new ones) with a step parameter to solve the first problem, and going on with the Step trait for iterators?

This would allow to implement for i in (0.5 .. 15.7) * 0.2 (or any other syntax) using any (ordered) mathematical number type, and for something in some_vec_of_strings.step_by(2) with a normal usize for the iterators.

I made a quick example based on the @Alex-PK's suggestion here.

pub struct ExclusiveRange<T, S> {
    start: T,
    stop: T,
    step: S,
}

pub struct InclusiveRange<T, S> {
    start: T,
    stop: T,
    step: S,
}

pub struct UnboundedRange<T, S> {
    start: T,
    step: S,
}

It works well for integers, but right away I wrote a failing test for floats that I'm not sure how to get around.

#[test]
fn test_excl_float_range() {
   let mut iter: ExclusiveRange<f32, f32> = ExclusiveRange {
       start: 0.0,
       stop: 1.0,
       step: 0.3,
   };
   assert_eq!(iter.next(), Some(0.0));
   assert_eq!(iter.next(), Some(0.3));
   assert_eq!(iter.next(), Some(0.6));
   assert_eq!(iter.next(), Some(0.9));
   assert_eq!(iter.next(), None);
}

This fails with

thread 'test::test_excl_float_range' panicked at 'assertion failed: `(left == right)` (left: `Some(0.90000004)`, right: `Some(0.9)`)', src/lib.rs:122

How would we get around floating point addition errors? Also, this obviously doesn't address the issue of negative step sizes or going backwards like (10..1).

@jbaum98 That's not a problem with Range, it's a problem with floats. Because many operations can lose precision, you should almost never assert that two floats are exactly equal.

Ex: http://stackoverflow.com/questions/8929005/unittest-sometimes-fails-because-floating-point-imprecision

Unfortunately, it doesn't look like Rust's std has an assert_almost_eq!()

So it would be acceptable that the iterator produced 0.90000004 instead of 0.9?

It can't produce exactly 0.9 at all, ever - this value is not perfectly representable in floating point, nor is your 0.3 step.

Okay that makes sense. So it seems like the example works for the basic behavior of expanding ranges beyond integers, allowing inclusive and exclusive ranges, and allowing step_by as had been discussed so far.

Please just stabilize this. I don't care if it's a mess. It's basic functionality.

Does this adapter allows a range of f64 to be iterated in float steps?

I think that adopting something like itertools' step into the Iterator trait is best, because then you could do something like

my_vec.iter_mut().step_by(2).map(|x| x * x);

Basically keep it generic for everything that can be iterated.

On the topic of floating values, I think that the biggest problem is that floats don't have a well defined next value/successor. In that case, then step_by should only work for things that implement a trait that says they have a successor. I don't know how that would work, but it's an idea.

That way step_by won't be implemented for float-related stuff unless we define a successor function/trait that works for floats. And I think that it's not a huge problem if we keep floats out of it, because in the case that someone needs to use something that steps by a float, they'll probably be using some kind of numeric crate already, so the crate itself could implement that in the way that it works best for it.

I think step_by should just take a usize, like skip and take, and just call next on the wrapped element n times n each next called on the StepBy iterator (see my other comment from december)

Trying to solve both the mathematical and the stepping problem with the same solution is what keep this discussion running for nearly 2 years without a solution in sight and a basic functionality missing from the language.

@Alex-PK I believe that instead of the step_by method on Range that should be a method of iter (every_nth?), but I agree that it would be very useful and that it would cover most of the cases in which right now one would use step_by.
Note that an efficient implementation could use nth with the step instead of calling next.

Should we also keep the name step_by or change it to every_nth for consistency with nth?

The suggestion by @Alex-PK when combined with the optimization suggested by @ranma42 seems like it would nicely thread the needle of usable yet easy to specify and stabilize here. I'd personally like the idea of adding such an adapter and deprecating/deleting the current step_by.

Would anyone be interested in sending a PR with an implementation of this? The name every_nth seems reasonable to me and could live alongside step_by for now and hopefully get stabilized relatively quickly.

FWIW, itertools has the general method as simply step.

Oh neat! The implementation there looks perfect. I'm not 100% sure we'll be able to use the same name, though, as it'd clash with itertools and likely cause a lot of real breakage in the ecosystem.

Oh, I'm up for it. Just downloaded the rust repo, and I'll try working on it today.

I think I'm gonna make a EveryNth<I> struct for the iterator adaptor and a every_nth for the method.

Awesome thanks @ivandardi! Lemme know if you need any help.

Yeah, some design issues came up.

We have a few different options for this:

1. Where to start

1.1

let mut it = (0..).every_nth(1).take(3);
assert_eq!(it.next(), Some(0));
assert_eq!(it.next(), Some(2));
assert_eq!(it.next(), Some(4));
assert_eq!(it.next(), None);

1.2

let mut it = (0..).every_nth(1).take(3);
assert_eq!(it.next(), Some(1));
assert_eq!(it.next(), Some(3));
assert_eq!(it.next(), Some(5));
assert_eq!(it.next(), None);

2. Behaviour of every_nth(0)

2.1

let mut it = (0..).every_nth(0).take(3);
PANIC

2.2

let mut it = (0..).every_nth(0).take(3);
assert_eq!(it.next(), Some(0));
assert_eq!(it.next(), Some(1));
assert_eq!(it.next(), Some(2));
assert_eq!(it.next(), None);

2.3

let mut it = (0..).every_nth(0).take(3);
assert_eq!(it.next(), Some(0));
assert_eq!(it.next(), Some(0));
assert_eq!(it.next(), Some(0));
assert_eq!(it.next(), None);

Okay, @ivandardi and me are discussing this over at Discord currently and it seems like it's not even clear whether .every_nth(2) should skip 2 elements in between or if it should move the iterator by 2 elements ahead. The former assumption is made in 1.1 and 1.2 here, while the latter is made in 2.1, 2.2 and 2.3.

We're discussing whether the sequence 0, 2, 4 should be produced by (0..).every_nth(1).take(3) or (0..).every_nth(2).take(3). The first one has the semantics of "skip n", while the second is "go every nth element".

In case the first one is better, then it would be a good idea to rename every_nth to something that better expresses what's happening, while if the second one is better, then we need to decide what happens with every_nth(0).

I think if it's called every_nth it should be the same as if you called nth in a loop.

That would mean that for item in list.iter().every_nth(4) would iterate over every 5th item, which is extremely surprising behavior. Since this is supposed to be solving the step_by issue, which is something you would use in for loops that step by a certain value every time, you wouldn't expect "every 4th" element to yield every 5th element.

I think that's an argument for calling this something else, because the relationship to nth is messy.

(primarily because nth is 0-indexed, which doesn't match most people's intuition. e.g. nth(2) returns the third item.)

Let me see...

  1. step: We could call it just like the itertools crate method.

  2. step_by: We could use the same name of the method currently in the Step trait and just remove the Step trait altogether.

  3. skip_every: Longer name, but more descriptive.

  4. skip_by: Variation of the last one.

  5. ???

Below is what each of those names sounds to _me_ (reference iterator: [1, 2, 3, ...]:

  1. .every_nth(n), I would expect [n, 2n, ...]
  2. .skip_by(n), I would expect [1, n+2, 2n+3, ...] (skips n items in-between).
  3. .step_by(n), I would expect [1, n+1, 2n+1, ...] (takes n steps).
  4. .skip_every(n) sounds ambiguous to me. Is it only going to skip every nth item?
  5. .take_every(n) sounds ambiguous too. I _might_ expect it to be the same as every_nth.

_EDIT:_ Clarify

Ah ok yeah the confusion with the name every_nth makes sense to me. At this point I think I like the idea of keeping the name step_by perhaps? It's an unstable API so it's not entirely unreasonable that we'd break some usage of it.

Couldn't we completely remove the Step trait too? I don't think it's something that will be used anytime soon, and the sooner we remove/replace it, the better. Then we could use the step_by name and it should still work with any (0..10).step_by() codes out there.

I was thinking, if Iterator had another method named next_nth which simply skips to the "next nth" item, by calling next, n times by default; then types can simply have their own more efficient implementation when possible. This might make it easier to implement step_by? (Was this already considered?)

@ivandardi yeah if we leverage Iterator::nth I think ideally we'd remove the Step trait

@johncf are you thinking of Iterator::nth?

Since the Step trait already has the step method, I thought I'd look into its implementation.

For (0..10).step(0), according to the current implementation, it just enters an infinite loop yielding 0s. We'd like to avoid that.

For (0..10).step(1) it yields 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

For (0..10).step(2) it yields 0, 2, 4, 6, 8.

So the current implementation of Step::step_by behaves like Python's, except when the step is 0. Python throws an exception in that case. If we'd like to follow the current behavior and also keep it aligned with other languages, I think it should behave like what described above and panic when the step is 0. The current PR does exactly that.

So all that's left is to decide on the name. I've talked with some other people, and it seems they prefer step, every and step_by. The pros of using step_by is that if there's some code that uses it already, then it won't break. But the other two are perfectly valid too. every is meant to be used as .every(2), which is like every_nth(2) but shorter and easier to read and say. step is the name that the itertools crate chose.

Maybe we could do a poll on this and let the community decide what they like best? Rust weekly comes out tomorrow, so if we can get a poll going soon, then we might be able to put a link to it on tomorrow's weekly.

I'd probably avoid step because I think it'd break all itertools users (which would be bad!). Between every and step_by I'd lean towards step_by personally, but I don't have much of a strong opinion.

To replace bad looking and not-DRY code like this ("deg" is repeated three times and it's still present in the scope after the loop end):

let mut deg = -90.0;
while deg < 90.0 {
    deg += 1.5;
}

Is it possible and a good idea to use step_by() on an interval of floating point values too?

for deg in (-90.0 .. 90.0).step_by(1.5) {}

@leonardo-m The problem with that is the reason this issue is open. :)

See the implementation by @jbaum98 for a generator for your case.

I agree that some syntactic sugar would be good to have in the language for it, but it should not be step_by(), as it would lead to confusion.

I am not sure what that sugar could be, to be honest.
Maybe (n..m , s) but this could look like a tuple with a range and a number, or (n..m)*s (because "by" in english can mean both multiplication and step), or (n..m)/s (to mean the range but in pieces of s size).

Edit: a simple impl could be enough, but it must be done in the std::ops:

impl<T, S> Div<S> for Range<T>
    where S: Copy
{
    type Output = ExclusiveRange<T, S>;

    fn div(self, rhs: S) -> Self::Output {
        ExclusiveRange {
            start: self.start,
            stop: self.end,
            step: rhs,
        }
    }
}

for i in (0.5..12.7) / 0.23 {
    // do something with i
}

Syntax sugar is sometimes nice, but in this case a Prelude or standard library thing that allows to write this is enough for me:

for deg in FPInterval(-90.0, 90.0, 1.5) {} // f64
for deg in FPInterval(-90f32, 90.0, 1.5) {} // f32

Is it possible and a good idea to use step_by() on an interval of floating point values too?

There's also the possibility of accumulating floating point error through repeated addition, depending on the specific values chosen. Only you can tell if those errors are acceptable for your case. Were I writing the code, I'd probably do some precomputation and then iterate over integers, converting to floating point repeatedly:

for deg in (0..120).map(|d| -90.0 + (d as f64 * 1.5)) {
    println!("{}", deg);    
}

Sadly, it's not quite as readable, and I don't know if the multiplication will have any less possibility of having error than the repeated addition (although I hope it would).

@shepmaster couldn't that be how it's implemented under the hood for floats?

how it's implemented under the hood for floats?

I'm not certain, but I don't believe so. That first example nicely divides into integers. A theoretical -10.0 to 10.0 with increments of 3.14 would not. This ends up "losing" the last value in my naive algorithm:

fn fake_step(start: f64, end: f64, step: f64) {
    let range = end - start;
    let n = (range / step) as usize;

    for deg in (0..n).map(|n| start + (n as f64 * step)) {
        println!("{}", deg);
    }
}

fn main() {
    fake_step(-90.0, 90.0, 1.5);
    fake_step(-10.0, 10.0, 3.14);
}

Although perhaps a .ceil "solves" the problem; but then there's a lot of magic happening behind the scenes. Something like this could be put into a crate, of course, if you are willing to accept the downsides of the implementation.

Yeah, linspace is the way to go for that problem.

for i in linspace(-90f32, 90f32, 120) {
    // do stuff
}

But there's crates that offer this already. I don't know if it would be a good idea to bring it into Rust core, especially when there's so many different ways to implement it.

The idea here is to make step_by general for any Iterator, which means it can only accept usizes as the step, especially because a 1.5 step isn't defined for a vector iterator :P

I don't know if it would be a good idea to bring it into Rust core, especially when there's so many different ways to implement it.

Stepping on floating point interval isn't a very common operation in Rust. But I have had to do it in Rust. I think it's basic enough, and having multiple ways to do it is one of the reasons to have a standard common way to do it.

I agree that a function like fpinterval() is a topic different from the step_by.

Great to see this making progress! I like the "always take usize and figure out the math stuff later" approach đź‘Ť

Couldn't we completely remove the Step trait too?

The step trait is also used for non-StepBy iteration of ranges today, so it probably shouldn't be completely removed right away. I think these are the only trait methods that should be removed (by is the step amount, and is_negative made (10..1).step_by(-1) "work"):

    fn step(&self, by: &Self) -> Option<Self>;
    fn steps_between(start: &Self, end: &Self, by: &Self) -> Option<usize>;
    fn is_negative(&self) -> bool;

I think that will be a big help to eventually stabilizing letting people iterate their own types in a Range, though. (For example, dates have reasonable pred/succ operations, but .step_by(&Date::parse("2017-05-01").unwrap()) is nonsense, so yay for deleting it!)

Now there are two StepBy structs in the docs after #41439 has been merged.

step_by

Yeah, now we need to make another PR to remove that deprecated StepBy struct. I can work on that no problem. I just need to know if it's ok to completely remove the Step trait or if we should keep it around.

@photino It's already being corrected in #42110

Tracking for Step has moved to https://github.com/rust-lang/rust/issues/42168, so can be removed from the list in the OP.

In Servo, we rely on Range::step_by to have a single type for normal ranges and reversed ones. How do I do that now? I'm pretty sure we don't want to box iterators in the middle of nowhere in the layout crate.

Cc @pcwalton

@nox I think that if you do my_vec.iter().rev().step_by(2) it should work as intended and only have one single type StepBy.

@nox While Range::step_by certainly worked there, the fact that step_by.abs() == 1 makes me feel like it was never a perfect match for the semantics you need. And that it forced a manual range reversal with extra - 1s makes it seem awkward at best.

I think it'd be clearer with something more focused on "single type for normal ranges and reversed ones":

let fragment_indices = (range.begin().get()..range.end().get()).rev_if(reverse);

Proof-of-concept: https://play.rust-lang.org/?gist=0c7dfce47adf5fc5b2683e8561acd869

So, I still think step_by(0) = repeat makes a lot of sense. I even thought of a language example that allows making arbitrary ranges infinite: Haskell ([start, start, ..]), making it “consistent” and all that.

I'm against step_by(0) meaning repeat:

  • Supporting repeat would prohibit StepBy being ExactSizedIterator ever
  • An off-by-one error causing you to get something infinite instead of (non-strictly) shorter is scary
  • The repeat behaviour is kinda weird, since it ignores most of the iterator and is neither iter::repeat nor Iterator::cycle. I'd rather people choose repeat(it.next().unwrap()) or it.take(1).cycle() to say what they mean.

Basically, the "produces an infinite iterator" behaviour feels too qualitatively different to me from the "skip a subset of the items" behaviour for it to make sense in the same method & type.

Note that there's even a Clippy lint against using the existing .step_by(0): https://github.com/Manishearth/rust-clippy/wiki#range_step_by_zero

Hi there,

I do not understand the rational behind the decision of deprecating Range::step_by. Even though the new generic Iterator::step_by does provide an equivalent apparent functionality, it is actually a huge loss in performances.

Consider the following program:

for i in (1..1000000000).step_by(100000000) {
    println!("{}", i)
}

Even though there is almost nothing to do (i.e., printing 10 integers), the new implementation has to loop over all the integers between 1 and 1000000000, which takes a lot of time. The previous implementation actually did what I expected (i.e., incrementing by 100000000). Note that I am actually measuring this performance issue in release mode.

Now, you could tell me that I could have written something like:

for i in (0..).map(|x| x*100000000+1).take_while(|x| x < 1000000000) {
    println!("{}", i)
}

But we are still relying on the optimizer's ability to transform multiplications into successive additions. Are we sure that LLVM will always perform such an optimization?

Moreover this is much more verbose, and clearly less understandable than the corresponding code in C:

for(int i = 1; i < 1000000000; i += 100000000) { ... }

This is "just° a lack of specialisation. There's nothing preventing us from
adding just as optimal code path for ranges as before.

On Jun 7, 2017 13:57, "Jacques-Henri Jourdan" notifications@github.com
wrote:

Hi there,

I do not understand the rational behind the decision of deprecating
Range::step_by. Even though the new generic Iterator::step_by does
provide an equivalent apparent functionality, it is actually a huge loss in
performances.

Consider the following program:

for i in (1..1000000000).step_by(100000000) {
println!("{}", i)
}

Even though there is almost nothing to do (i.e., printing 10 integers),
the new implementation has to loop over all the integers between 1 and
1000000000, which takes a lot of time. The previous implementation actually
did what I expected (i.e., incrementing by 100000000). Note that I am
actually measuring this performance issue in release mode.

Now, you could tell me that I could have written something like:

for i in (0..).map(|x| x*100000000+1).take_while(|x| x < 1000000000) {
println!("{}", i)
}

But we are still relying on the optimizer's ability to transform
multiplications into successive additions. Are we sure that LLVM will
always perform such an optimization?

Moreover this is much more verbose, and clearly less understandable than
the corresponding code in C:

for(int i = 1; i < 1000000000; i += 100000000) { ... }

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/27741#issuecomment-306761043,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AApc0obsmZDr3uCFyvLwY5aoTmNYSmmFks5sBoIJgaJpZM4FqaVv
.

Oh, sorry, I did not know that specialization was possible in Rust (after all, that's an unstable feature!)

In any case, is specialization for step_by actually planned? It requires a few changes to the current interface, since the return type will no longer be always StepBy.

I agree that such specialization is not an optional part of this new step_by, it's an essential part.

I think it doesn't need real specialization, just add a custom nth in impl Iterator for Range.

I am trying to use the new step_by() of std::trait::iterator:
https://doc.servo.org/core/iter/trait.Iterator.html#method.step_by

instead of step_by of Range:
https://doc.servo.org/core/ops/struct.Range.html#method.step_by

This is my code:

error: use of unstable library feature 'step_by': recent addition (see issue #27741)
   --> /home/jklepatch/os/servo/components/layout/inline.rs:970:66
    |
970 |                 (range.end().get() - 1..range.begin().get() - 1).step_by(-1);
    |                                                                  ^^^^^^^
    |
    = help: add #![feature(step_by)] to the crate attributes to enable

I have also replaced #![feature(step_by)] by #![feature(iterator_step_by)] at the top of the crate. Finally, I have tried with and without a statement use std::iter:iterator; (might not be necessary, I am new to rust and dont understand very well whats going on).

Is this the normal behavior?

I’ve submitted #43077 to override nth. I think #42310 and #43012 should not have landed without this, but oh well.

I’d like to add Iterator::nth_back and use it in iter::Rev<_>::nth in order to fix the same performance issue in (a..b).rev().step_by(n). What do you all think?

Follow-up PR to redesign the Step trait and some other changes to related iterator impls for ranges: https://github.com/rust-lang/rust/pull/43127

Are there any open questions for the current Iterator::step_by? These original ones seem addressed. The question of negative steps => defined out by using the Iterator interface.

Currently step_by is probably slow ( https://github.com/rust-lang/rust/issues/43064 ), so I suggest to stabilize it only after (and if) it becomes fast enough.

Regarding the design, see also:
https://users.rust-lang.org/t/some-more-about-step-by-troubles/11975

@leonardo-m Good points, I think those are open questions. After having a go at it, I can see that trying to special case .step_by() for numeric ranges with a usize step size runs into all the unfun stuff we want to avoid in Rust. Mixing integer sizes just leads to mismatches in several ways.

  • Is .step_by(usize) appropriate for numeric ranges?

    • Implementation problems include handling too large step sizes for small integer ranges, and having no implementation for an u64 range with u64 step size

  • Can it be implemented with specialization (I assume specialization bugs will be fixed)

I think step_by is good, it's just not made for ranges. It has the same behavior as nth(). It's perfectly sensible that it behaves the way it does today.

We could simply make *By range variants, like RangeBy, RangeFromBy, etc. Maybe a RangeByArgument trait to match and make it a supertrait of RangeByArgument.

Optionally, the regular Range variants with an *By counterpart could have a by method like:
fn by(self, Idx) -> RangeBy (on Range) so you can do (1..100).by(2).

An ergonomical syntax can be decided at a later point, but at least functionality will be there.

I agree that Iterator::step_by is good, regardless of whether it is the solution for ranges. Let’s stabilize it.

@rfcbot fcp merge

The tracking issue for range (stepped) iterator is https://github.com/rust-lang/rust/issues/42168

Team member @SimonSapin has proposed to merge this. The next step is review by the rest of the tagged teams:

  • [x] @Kimundi
  • [x] @SimonSapin
  • [x] @alexcrichton
  • [ ] @aturon
  • [x] @dtolnay
  • [x] @sfackler
  • [ ] @withoutboats

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

Is there an RFC or post or something that documents the behavior being proposed for stabilization?

I find it extremely hard to extract from this issue what is exactly being proposed for stabilization, what is the rationale, alternatives that have been explored, etc.

I am not suggesting that this needs an RFC, but at least a summary of the work done and why what we have now is better and suitable for stabilization that what we had before. As a maintainer of collections implement step_by I would find such information useful to update my collections accordingly.

@gnzlbg Yeah, this issue has a lot of history and can be hard to read now. Iterator::step_by(self, n: usize) -> StepBy<Self> returns a new iterator that is based on the inner iterator’s nth method:

https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.step_by

Creates an iterator starting at the same point, but stepping by
the given amount at each iteration.

Note that it will always return the first element of the iterator,
regardless of the step given.

Panics

The method will panic if the given step is 0.

Examples

Basic usage:

#![feature(iterator_step_by)]
let a = [0, 1, 2, 3, 4, 5];
let mut iter = a.into_iter().step_by(2);

assert_eq!(iter.next(), Some(&0));
assert_eq!(iter.next(), Some(&2));
assert_eq!(iter.next(), Some(&4));
assert_eq!(iter.next(), None);

@SimonSapin thanks for the summary, that is really helpful.

I agree that step_by is good for iterators in general, but we risk a situation where we have to advise not using .step_by on ranges, which is going to be uphill ergonomically. I had hopes .step_by(usize) would be straightforward for ranges, but it's a bit complicated.

With the right tooling and right method name, a “increment type matching” stepping adaptor for ranges might still be easy to find for users.

Problems with step_by for ranges include such things as:

  • Can't use u64 steps for an u64 range
  • How do we implement a for example i16 range with usize step, in a way that is efficient, ideally “zero cost”.

    • Maybe we can do most of the work already in the constructor of the stepping adaptor for an Range<i16>. We figure out what an "equivalent" step that fits inside an i16 is and adjust the step accordingly.

Could two different named methods ("step" and step_by"?) solve this issue, having one for ranges that takes an usize, and one for intervals that takes the same type as the interval?

Can't we have the step parameter be an associated type? Then on the generic Iterator we can have it be usize and for ranges we override it with Idx? Like

trait Iterator {
    type Item;
    type Idx = usize;

    /// ...
}

impl Iterator for Range<Idx=u16> {
    type Idx = Self::Idx;

    /// ...
}

Or something similar is this doesn't work out or if it's not desirable to add a new associated type to the Iterator trait.

@ivandardi If we did that Range<u16> wouldn't implement Iterator<Item = u16> anymore.

@bluss How about renaming step_by to every or every_nth since it relies on nth. I agree the word "step" could bring confusion. The method is valuable on its own though even if it's not the solution for number ranges.

I’m ok with every_nth / EveryNth.

Semantics and naming choices like every_nth were discussed in PR #41439. I think it would be confusing to name this similar to nth, since they're off by one -- step_by(x) is the same as next() then repeating nth(x-1).

Then how about every.

:bell: This is now entering its final comment period, as per the review above. :bell:

The final comment period is now complete.

I looked at overriding StepBy::try_fold to see if it would help. It does seem to help, but the biggest thing for me was that it clarified what made it hard for LLVM: the inner loop for (a..b).step_by(7).map(|x| x ^ (x - 1)).sum() is:

    mov r8d, eax
    lea r9d, [rcx - 1]
    lea eax, [rcx - 2]
    xor eax, r9d
    add eax, r8d
    mov r9d, ecx
    add r9d, 6
    setb    r8b
    cmp r9d, edx
    jae .LBB1_6
    add ecx, 7
    test    r8b, r8b
    je  .LBB1_4

which doesn't simplify further because of the slightly-awkward semantics of nth. It needs to return the 6th item and step by 7. (So the Range iterator is doing both Step::add_usize(6) and Step::add_one().)

I think this'd be perfectly efficient in terms of a "return next and also step forward" method, since the Range iterator for that would just return start and replace it with start.add_usize(7). I dunno if there'd be an appetite for such a thing, though...

Could we amend a note to step_by allowing us to use the pre-stepping semantics that @scottmcm spoke of? Given that iteration can have side-effects, I fear we may be overconstraining ourselves with regards to range iterators. While the current side-effect free integer ranges can deal with this, RangeFrom would already be changing semantics with pre-stepping because of overflow. It could affect additional situations when the Step trait will be stabilized.

While specializations of StepBy<Range*<_>> catch the most common case performance issues, they miss all the more complicated ones where the Range is hidden within an adapter.

Was this page helpful?
0 / 5 - 0 ratings