Rust: Tracking issue for chunks_exact/_mut; slice chunks with exact size

Created on 2 Jan 2018 · 61 comments · Source: rust-lang/rust

This is inspired by ndarray and generally seems to allow LLVM to remove more bounds checks in code using the iterator (because the slices will always be exactly the requested size), and it doesn't require the caller to add additional checks.
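
For context, a minimal sketch of the kind of code this enables (written with the names and non-panicking behaviour that were eventually settled on, chunks_exact plus a remainder accessor):

let rgba = [10u8, 20, 30, 255, 40, 50, 60, 255, 7];

let mut sum = 0u32;
for px in rgba.chunks_exact(4) {
    // `px` is guaranteed to be exactly 4 elements long, so the constant
    // indexing below needs no runtime bounds checks.
    sum += px[0] as u32 + px[1] as u32 + px[2] as u32;
}
assert_eq!(sum, 210);
// The trailing 7 never forms a full chunk and is exposed separately.
assert_eq!(rgba.chunks_exact(4).remainder(), &[7]);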

A PR adding these for further discussion will come in a bit.

  • Implemented in #47126

Open questions:

  • [x] Should the new iterators panic if the slice is not divisible by the chunk_size, or omit any leftover elements?
    The latter is implemented right now and is very similar to how zip works; @shepmaster even argues that without this, this iterator is kind of useless and the optimization should be implemented as part of the normal chunks iterator (which seems non-trivial).
    Omitting leftover elements is also how this iterator is implemented in ndarray (but far more generally).
    ~For the leftover elements a function could be provided, see https://github.com/rust-lang/rust/pull/51339 which implements that.~
    A function for getting access to the remainder exists on the iterator, similar to how slice::Iter and slice::IterMut give access to the tail (note: the remainder consists of the leftover elements that don't completely fill a chunk; it is not the tail!).
    The majority of people (who spoke up here) seem to prefer the non-panicking behaviour.
  • [x] Should it be called exact_chunks or chunks_exact? The former is how it's called in ndarray, the latter is potentially more discoverable in e.g. IDEs.
    It was renamed to chunks_exact.
B-unstable C-tracking-issue T-libs disposition-merge finished-final-comment-period

All 61 comments

Example can be found here

The relevant part with differences in the assembly is

before:

.LBB4_24:
  cmp r11, 4
  mov eax, 4
  cmovb rax, r11
  test rbx, rbx
  je .LBB4_18
  cmp rbx, 4
  mov edx, 4
  cmovb rdx, rbx
  test r13, r13
  je .LBB4_18
  mov qword ptr [rbp - 96], rax
  mov qword ptr [rbp - 48], rsi
  mov qword ptr [rbp - 56], r9
  cmp r11, 3
  jbe .LBB4_27
  mov qword ptr [rbp - 96], rdx
  mov qword ptr [rbp - 48], rsi
  mov qword ptr [rbp - 56], r9
  cmp rbx, 3
  jbe .LBB4_29
  cmp rax, 1
  je .LBB4_39
  lea r10, [r13 + rax]
  sub r11, rax
  lea r12, [r15 + rdx]
  sub rbx, rdx
  cmp rax, 3
  jb .LBB4_41
  je .LBB4_42
  movzx r14d, byte ptr [r13]
  movzx r8d, byte ptr [r13 + 1]
  movzx eax, byte ptr [r13 + 2]
  imul r13d, eax, 19595
  imul edi, r8d, 38470
  imul eax, r14d, 7471
  add eax, edi
  add eax, r13d
  shr eax, 16
  mov byte ptr [r15], al
  cmp rdx, 1
  je .LBB4_44
  mov byte ptr [r15 + 1], al
  cmp rdx, 3
  jb .LBB4_45
  mov byte ptr [r15 + 2], al
  je .LBB4_46
  mov byte ptr [r15 + 3], 0
  test r11, r11
  mov r15, r12
  mov r13, r10
  jne .LBB4_24

after:

.LBB5_18:
  test rsi, rsi
  je .LBB5_20
  add rdx, -4
  movzx r10d, byte ptr [rsi]
  movzx eax, byte ptr [rsi + 1]
  movzx ebx, byte ptr [rsi + 2]
  lea rsi, [rsi + 4]
  imul r13d, ebx, 19595
  imul eax, eax, 38470
  imul ebx, r10d, 7471
  add ebx, eax
  add ebx, r13d
  shr ebx, 16
  mov byte ptr [rcx], bl
  mov byte ptr [rcx + 1], bl
  mov byte ptr [rcx + 2], bl
  mov byte ptr [rcx + 3], 0
  lea rcx, [rcx + 4]
  cmp rdx, 4
  jae .LBB5_18

I don't want to derail your discussion too much. Const generics and value-level chunks both have their uses. I'm reminded of this existing implementation of the "const" kind of chunking, in this case in an iterator that actually allows access to the whole blocks and then the uneven tail at the end: BlockedIter. Note that a Block<Item=T> is an array of T.

Interesting, thanks for mentioning that. For my use case that would probably work more or less the same way, but it's slightly different indeed.

BlockedIter was developed while looking at exactly the hand off between the blocks and the elementwise tail; the idea was to avoid some of the loss that otherwise shows up in code that converts between slices and slice iterators. In this case it's the same pointer being bumped through the whole iteration.

The chunks iterators are a good candidate for zip specialization (TrustedRandomAccess trait)

True. I'll add that in a bit, as a separate PR for the existing chunked iterators and as a separate commit for the new ones.

I forgot to add some benchmark results earlier. This is with the code from https://github.com/rust-lang/rust/issues/47115#issuecomment-354715511 and running on a 1920*1080*4 byte slice. Basically 2.46x as fast.

running 2 tests
test tests::bench_with_chunks       ... bench:   7,702,902 ns/iter (+/- 177,747)
test tests::bench_with_exact_chunks ... bench:   3,132,468 ns/iter (+/- 202,032)
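
For reference, the inner loop being measured has roughly this shape (a hypothetical reconstruction inferred from the assembly above, a BT.601-style luma conversion over 4-byte pixels, using the since-stabilized names chunks_exact / chunks_exact_mut; the actual benchmark code is in the linked comment):

fn to_gray(src: &[u8], dst: &mut [u8]) {
    // With exact chunks both `s` and `d` are known to be exactly 4 bytes long,
    // so LLVM can drop the bounds checks on the constant indices below.
    for (s, d) in src.chunks_exact(4).zip(dst.chunks_exact_mut(4)) {
        let y = (s[0] as u32 * 7471 + s[1] as u32 * 38470 + s[2] as u32 * 19595) >> 16;
        d[0] = y as u8;
        d[1] = y as u8;
        d[2] = y as u8;
        d[3] = 0;
    }
}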

The latter is implemented right now and very similar to how zip works and @shepmaster even argues that without this, this iterator is kind of useless and the optimization should be implemented as part of the normal chunks iterator

Thinking of this from another direction, if either Rust or LLVM were to magically figure out how to perform this optimization in the case of chunks but chunks_exact didn't perform the length truncation, we'd then have two functions in the standard library that did the exact same thing. This would be annoying as a consumer.

This shouldn't have been closed by the merge, can someone reopen it? I don't have the permissions for that it seems

The libs team discussed this and the consensus was to stabilize this with the methods panicking before returning an iterator if the slice's length is not a multiple of the requested chunk size. This is consistent with e.g. [T]::copy_from_slice panicking on unequal sizes. Callers can take a sub-slice before calling these methods if they wish to ignore extra items.

@rfcbot fcp merge

Team member @SimonSapin has proposed to merge this. The next step is review by the rest of the tagged teams:

  • [x] @Kimundi
  • [x] @SimonSapin
  • [x] @alexcrichton
  • [ ] @aturon
  • [x] @dtolnay
  • [x] @sfackler
  • [ ] @withoutboats

Concerns:

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

I can provide a PR for panicking if the size does not match, as long as everybody agrees that this is the desired behaviour. @shepmaster did not agree before, and to me it also seems suboptimal but it does not matter for any of my use-cases and code can easily work either way.

Yup, I still disagree with that decision. I literally used this function in the last week for the purposes of SIMD work (chunking of 16 bytes). Having to perform the modulo operation multiple times feels counterproductive.

I'll point back at my previous comment that this decision means that this function is basically only created for the purposes of codegen and will become useless if/when chunks produces the same code.

Alright, let's hold off for now:

@rfcbot concern panicking

@shepmaster Is silently ignoring extra elements the desired behavior? Could you explain why that is, and why there's no (or it's worth the) risk of bugs due to failure to process those extra elements?

Does it sound like a good idea to return, instead of just an iterator, a tuple of an iterator and a slice for those extra elements?

Is silently ignoring extra elements the desired behavior? Could you explain why that is, and why there's no (or it's worth the) risk of bugs due to failure to process those extra elements?

It's desired and non-buggy, but only because I am already handling those extra elements around the code I would have been using exact_chunks in.

a tuple of an iterator and a slice for those extra elements?

For my particular example, that would have been close to ideal. I have a byte string of arbitrary length and I need to chunk it into 16-byte slices and then specially handle the leftovers at the end. However, I also need to track the total number of bytes of the chunked amount to report back the full offset at the end.

In this SIMD case, even using slices at all generated unneeded assembly (tracking the length of each 16-element slice added ~2 operations and took a register in a hot loop, for example), so I don't know if exact_chunks would ultimately be usable in my case anyway. Thus I don't know how valid my opinion is.

Although the tuple idea seems nice on the surface, it feels awkward to use because you have to unpack the iterator:

let (chunks, _) = a.exact_chunks(16);
for c in chunks {}
// or
for c in a.exact_chunks(16).0 {}

And if you want the current panic behavior, it's not better:

let (chunks, extra) = a.exact_chunks(16);
assert!(extra.is_empty(), "Have extra data");
for c in chunks {}

So maybe there's no ergonomic solution other than panicking...?

The only other thing that comes to mind is an iterator with an inherent method:

// Don't care about extra
for c in a.exact_chunks(16) {}

// Shouldn't have extra
let mut chunks = a.exact_chunks(16); 
assert!(chunks.extra().is_empty(), "Had extra things");
for c in &mut chunks {}

// Handle extra
let mut chunks = a.exact_chunks(16); 
for c in &mut chunks {}
for b in chunks.extra() {}
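
For comparison, the use case described above (16-byte chunks, leftovers handled separately, and the total chunked offset reported back) could look roughly like this with a remainder accessor on the iterator. This is only a sketch with hypothetical process_block / process_byte stand-ins, written with the method names that were eventually stabilized:

fn process_block(_block: &[u8]) { /* e.g. one 16-byte SIMD step */ }
fn process_byte(_b: u8) { /* scalar fallback for leftovers */ }

fn scan(data: &[u8]) -> usize {
    let mut chunks = data.chunks_exact(16);
    for block in &mut chunks {
        process_block(block); // block.len() is always exactly 16
    }
    // Total number of bytes consumed as full chunks.
    let chunked = data.len() - chunks.remainder().len();
    for &b in chunks.remainder() {
        process_byte(b);
    }
    chunked
}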

Having to perform the modulo operation multiple times feels counterproductive.

Why multiple times, though?

Extra items at the end of the slice:

// Unsigned integer division rounds toward zero / truncates
let (chunkable, extra) = slice.split_at(slice.len() / chunk_size * chunk_size);

Extra items at the start of the slice:

let (extra, chunkable) = slice.split_at(slice.len() % chunk_size);

Either way, continue with something like:

for item in extra {
    process_one(item)
}
for chunk in chunkable.exact_chunks(chunk_size) {
    process_many(chunk)
}

In this SIMD case, even using slices at all generated unneeded assembly

Are you saying we should wait for const generics to be stable and change these iterators to yield fixed-size arrays rather than slices? And not stabilize anything in the meantime?

Are you saying we should wait for const generics to be stable and change these iterators to yield fixed-size arrays rather than slices? And not stabilize anything in the meantime?

Const generics don't cover all my use-cases. The chunk size is not always known at compile-time. IMHO we need another function for this later with const generics.

The version that returns the remaining items as a slice together with the iterator seems fine to me.

Why multiple times, though?

Because exact_chunks will perform the same modulo operation.

const generics to be stable and change these iterators to yield fixed-size arrays

I can't have said that because I didn't even realize that might be a possibility. 😛 That certainly sounds ideal for my case though. I might even give it a shot with the non-generic form...

not stabilize anything in the meantime

I don't want to be The Bad Guy that prevents things from stabilization. If there are clear wins for the function as it is today, it seems reasonable. Of all the previous alternatives, I think I'd lean towards the inherent method on the iterator to retrieve the trailing extra bits.

Of all the previous alternatives, I think I'd lean towards the inherent method on the iterator to retrieve the trailing extra bits.

From what I understood the concern with the current behaviour is discoverability. That it's not necessarily obvious that the remaining items are simply ignored (which is btw consistent with how zip works for example, so I don't really see the problem here). An inherent method on the iterator is also not very discoverable, at least for myself. I usually don't look at the inherent methods of a specific iterator, assuming there are none.

the current behaviour is discoverability

I suppose panicking is very discoverable and hard to miss, but you still need to look at the docs (or hopefully the error message!) to understand why the panic occurred.

I usually don't look at the inherent methods of a specific iterator, assuming there are none

There's goodies to be found in those iterators, especially when looking for performance wins. Chars::as_str comes to mind.

This is probably something that will get better over time because things will start returning impl Trait when they have nothing extra. Having a concrete type will be a sign that something interesting is present.

I propose that the chunked iterators are rewritten with raw pointers, and that they provide an iterator over the rest / the tail via a method that returns a slice iterator. That should be the best we can possibly do for tight code. Overly long explanation below.

No reason to implement that now or before stabilization, but I want to say I don't want to lock us in on returning a tail slice or a pair of an iterator and a slice.


Returning the remainder as a slice is not entirely zero cost, in all situations. That's the kind of situation I was exploring with BlockedIter @ crate odds.

We can iterate in chunks, then create a slice of the remaining tail and iterate the tail.

If we write that straightforwardly, the compiler cannot eliminate all the iter-to-slice-to-iter conversions and produces overly complicated code. It is not zero cost, because we can write better code with raw pointers.

The BlockedIter does that better, because you can iterate blocks, and then you get a slice iterator to the remaining tail (see method BlockedIter::tail). The difference is that with the blocked iterator + tail, we have the same pointer being bumped through both the block loop and on through the tail loop. That's all much better (as of the development of BlockedIter, which was long before LLVM 6 in Rust).

BlockedIter is formulated to be something like the const generics version, but the same applies both to constant and dynamic size blocks.

@bluss is it your belief that if we did that rewrite then chunks would generate the nice optimized assembly in the earlier comment?

I would believe it as that's basically what I did for my SIMD case. If so, then I'd rather see chunks rewritten.

If that's the consensus, I can spend some time on that. However, chunks has the disadvantage that it does not guarantee that each chunk is of equal size, so this has to be checked manually, either beforehand or on every chunk.

Just to make it clearer for users, the exact_chunks_mut method should follow the exact_chunks one in terms of its position in the documentation/declaration.

Like split and split_mut, rsplit and rsplit_mut and others...

Just to make it clearer for users, the exact_chunks_mut method should follow the exact_chunks one in terms of its position in the documentation/declaration.

Thanks, done here https://github.com/rust-lang/rust/pull/51151


Generally, how should we move forward with this API? I think that even independent of the optimization part (if it can ever be made to work with the normal chunks iterator) this API is useful to have, as it can be more convenient, as long as there's an iterator over chunks of a guaranteed length plus the remainder.

If I understand @bluss's comment correctly, the idea would be to implement exact_chunks as an iterator that has an associated method for returning the tail at any point during iteration; at the very end this would point at only the remainder, and the remainder itself is never yielded as it would be by the normal chunks. Correct?

This method could also be added to the existing chunks iterator, it does not "cost" anything (the iterator already carries forward the tail internally anyway) and might be convenient.

If that's something everybody can get behind I would implement it, benchmark again against my original testcase, and also check specifically if raw pointers are needed or splitting slices is enough (either way does not make much of a difference implementation-wise).

Returning the remainder as a slice is not entirely zero cost, in all situations.

@bluss Can you give an example? Currently all 4 chunks iterators are already storing the remainder as a slice inside their struct, so simply adding a way to return that should not make any difference. As such I assume you meant that with the current implementation this is already problematic in some cases?

EDIT: For the exact_chunks variants, having a way to return the tail requires keeping the whole initial slice around in one way or another. Currently the last elements are dropped inside the iterator on construction to simplify other code (most importantly the next_back implementation). So either the division/modulo calculation has to be moved in there, or two slices have to be carried around all the time, or it would simply have to be implemented based on raw pointers (keeping the original and the truncated end of the slice around as pointers).

I personally find it very useful to do it without a panic (if the last element doesn't fit in the chunk size, discard it). My use case is an SVG parsing library:

for &[x, y] in parameters.exact_chunks(2) {
    builder.svg_event(SvgEvent::MoveTo(TypedPoint2D::new(x, y)));
}

For me, there's no point in having "half a point", i.e. only the x coordinate. There's little I can do with it, and it's technically not an error. I can also imagine that it may be faster than the panicking version (due to omitted branches), but I'm not a pro at that.
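
(A self-contained variant of the snippet above, with the SVG builder types replaced by plain tuples and using the eventually-stabilized name chunks_exact; the chunk is indexed instead of destructured, since a &[x, y] pattern is refutable on a slice.)

let parameters = [10.0_f64, 20.0, 30.0, 40.0, 55.0];

let mut points = Vec::new();
for chunk in parameters.chunks_exact(2) {
    points.push((chunk[0], chunk[1])); // chunk.len() is always 2
}
// The trailing 55.0 ("half a point") is simply never yielded.
assert_eq!(points, vec![(10.0, 20.0), (30.0, 40.0)]);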

I think that a method on the iterator adapter (i.e. ExactChunks) returning the tail/remainder of the slice after chunking is the approach that could reconcile everyone. The implementation can be benchmarked: raw pointers or two slices.

I prefer the _two slices_ option, but it's true that the raw-pointer one can be implemented with only 3 pointers: one for the start of the next chunk, one for the start of the remaining part (which also marks the end of the chunked part), and one for the end of the remaining part.

FYI a slice (&[_]) has a size of 16 bytes on a 64-bit machine, so the 3-pointer version is one pointer (8 bytes) smaller than the 2-slice implementation.

struct ExactChunks<T> {
    chunk_size: usize,
    chunk_start: *const T,
    rem_start: *const T,
    rem_end: *const T, // can be replaced by the length of the remaining part
}

An example usage of the remaining method:

let s = &[1, 2, 3, 4, 5];

let mut chunks = s.exact_chunks(2);

// the `remaining` method gives the part of the slice
// that will not be iterated over by the `ExactChunks` iterator.
let rem = chunks.remaining();
assert_eq!(rem, &[5]);

assert_eq!(chunks.next(), Some(&[1, 2]));
assert_eq!(chunks.next(), Some(&[3, 4]));
assert_eq!(chunks.next(), None);

Above, a function for getting the current tail at any point was discussed; you have one for the remainder. Not sure which one would be more useful, but the first is more generic at least.

Oh, and the reason why I mentioned raw pointers here is that if you want to implement a function for getting the actual tail, you need to store both the tail used for iteration (potentially excluding the last few elements) and the real tail (including all remaining elements) and update both in sync. By using pointers you only need to store the current position and the two ends in one way or another, and only have to update the current position.

One thing to notice is that the current tail at any point can be obtained by chaining the remaining part as I understand it (the part that will never be reached by the ExactChunks iterator) with a FlatMap of the iterator, like so (but without the chaining of the remaining part).

Another solution could be to have both methods; it would be free to implement, with no new pointers needed.

Not sure which one would be more useful, but the first is more generic at least

What do you mean by "more generic" ?

You are right, the 3-pointer way can be a gain in performance and code clarity.

What do you mean by "more generic" ?

You proposed adding a function to get the final remainder (e.g. exact_chunks(4) on 6 elements always gives you the last two as remainder), while what was mentioned before was a function to get the tail at any point in time (e.g. exact_chunks(4) on 6 elements would first give you everything as tail, then after the first next() give you the last 2 elements).

The latter seems to be usable in more situations and gives you the former at the end of the iteration anyway.

I agree; if someone needs the remaining part (the one that will never be iterated over), all they have to do is reach the end of the ExactChunks iterator (e.g. by using Iterator::last) and call the remaining method.

I prefer the behavior you are talking about! It's more generic, like you said!

What is missing for that to be implemented?

What is missing for that to be implemented?

Time, I've started locally but got sidetracked with other stuff that needed more urgent attention. I'll try finishing it (and moving to raw pointers for the above mentioned reasons) this weekend.

There's some unintuitive behaviour with having a function to return the tail and next_back. E.g. if you iterate with exact_chunks(2) on [0, 1, 2, 3, 4], you would first get everything as tail. next_back gives you [2, 3], but what would you expect tail to return then? [0, 1] (start to end of iterator), [0, 1, 2, 3, 4] (start to real end of the slice, i.e. next_back has no effect on the tail and only next has), or something else?

edit: FWIW, next_back having no effect on the tail would seem most consistent to me

edit 2: I'll come up with a reasonable implementation of all this and the details can then be discussed in the corresponding PR. There are some more smaller problems here (e.g. for the mutable iterator we don't want it to be possible to have multiple mutable references to the same elements via the tail).

Why does ExactChunksMut not have a remainder method that gives the same thing into_remainder gives, but immutably? It could be really useful; the user would just have to drop the &[T] to be able to continue iterating. Adding it to IterMut could be interesting as well.

What do you think about that? Am I wrong about the borrowing rules here?

Why does ExactChunksMut not have a remainder method that gives the same thing into_remainder gives, but immutably?

The next() return value has the lifetime of the underlying slice, so you would need to make it impossible to call next() while the remainder is borrowed (whether mutably or immutably), e.g. by borrowing with the lifetime of the iterator instead of the underlying slice. That, however, is inconsistent with how the other slice iterator functions work.

Maybe rename exact_chunks to chunks_exact, because there is already fn reserve_exact. This would make the API more discoverable from an IDE or editor, since you can just type chunks to discover chunks_exact; with the original name you can only discover it through the docs.

Can panicking be later changed to non-panicking? Does that count as a breaking change?

For my use cases panicking is fine. I do things like .chunks(2) and then unconditionally use c[0],c[1], so my code would panic anyway.

I strongly dislike .zip()'s silent truncation. It has kept some bugs hidden in my code. Potential undetected off-by-one truncations sound pretty bad too.

For efficient iteration and zipping it's already best practice to explicitly take slices in a way that LLVM can "see" them. So for working with buffers of inexact length it'd make sense to explicitly slice them (otherwise it wouldn't be obvious from the code how exact chunks work on inexact buffers). I'm expecting that chunks' own modulo would be optimized out if it was redundant.

Currently there is no panicking and it behaves like zip and others and silently "ignores" anything that does not fit, but you can get the remainder via API on the iterator type.

I see your argument for not silently ignoring such mismatches but it's already common in other iterator functions and also in other languages, and from what I know it didn't cause many problems in practice after all.

For efficient iteration and zipping it's already best practice to explicitly take slices in a way that LLVM can "see" them. So for working with buffers of inexact length it'd make sense to explicitly slice them (otherwise it wouldn't be obvious from the code how exact chunks work on inexact buffers). I'm expecting that chunks' own modulo would be optimized out if it was redundant.

That's unfortunately not the case currently; LLVM can't optimize that away for chunks but can for exact_chunks (but even apart from optimization, I think exact_chunks has a use case due to its different behaviour, which at least for me is more often the required behaviour).

So, how should we move forward with this?

  • @cloudhan suggested renaming it from exact_chunks to chunks_exact. I personally don't mind either way; I chose exact_chunks because that's how it is called in ndarray.

  • And there's still the question of whether to silently truncate the iterator (like zip) or not. Personally I think truncating (and giving access to the remainder, as we have now) is the behaviour I would expect, in line with how other iterators work.

Not sure if this is the right place, but I'd like to see exact_chunks stabilized.

My use case is writing the algorithm discussed by Andrei Alexandrescu in
https://www.youtube.com/watch?v=o4-CwDo2zpg (start from 40:00), which converts a sequence of bytes into a number.

In his talk he discusses the speed of the original atoi() function, and the one he came up with. He also mentions that unrolling his version made it faster.

I rewrote his version in Rust, but instead of using pointers, I went with slices. Unrolling the main loop was easy to write with .exact_chunks() combined with a pattern match on the slice that exact_chunks returns on each iteration.

With unrolling there is usually a fixup loop left over, for which I could use the remainder function of exact_chunks.

Here a gist of how it looks like: https://gist.github.com/DutchGhost/9f4488f19c3555426c15c28bd854e037

I think I've changed my mind now and would be ok if these iterators don't panic. If we choose not to panic I think it makes sense to have an inherent method on the iterator to get the rest of the slice out, which would do the necessary modulo computation if needed.

For naming I might advocate going to chunks_exact if only to avoid clashing with ndarray, trying to head off those regressions we'd get from stabilizing!

Recently I found myself wanting to iterate over a slice in chunks, but starting from the end of the slice, towards the beginning.

For example, I wanted to iterate [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] in chunks of 4, starting at the end. I'd like to get [8, 9, 10, 11], [4, 5, 6, 7], and as the remainder, [1, 2, 3].

Currently

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11].chunks_exact(4).rev()

gives [5, 6, 7, 8], [1, 2, 3, 4], and its remainder is [9, 10, 11].

So in the current way, I'd have to figure out what the remainder would be, split on an index to get [1, 2, 3], and pass in [4, 5, 6, 7, 8, 9, 10, 11] to .chunks_exact(4).

How would

rchunks_exact(n)

work out, where the iterator starts at the end of the slice, going in chunks of n towards the beginning, and the remainder would give a slice from the beginning (if there is a remainder)?
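
In other words, a sketch of the asked-for behaviour (assuming an rchunks_exact with a remainder accessor analogous to the one on chunks_exact):

let v = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11];

let mut iter = v.rchunks_exact(4);
assert_eq!(iter.next(), Some(&[8, 9, 10, 11][..]));
assert_eq!(iter.next(), Some(&[4, 5, 6, 7][..]));
assert_eq!(iter.next(), None);
// The leftover elements at the front of the slice:
assert_eq!(iter.remainder(), &[1, 2, 3]);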

@DutchGhost That (rchunks_exact) wouldn't be very difficult to implement, but IMHO it should come together with rchunks, rchunks_mut, and rchunks_exact_mut for consistency.

I'll work on it but let's handle that as a separate issue to not block stabilization of chunks_exact by concerns for that new API :)

PR can be found here https://github.com/rust-lang/rust/pull/54580

With that done, how do we go from here to stabilization?

The next step would be FCP, but this issue is already in FCP. It has a blocking concern that would need to be resolved by @SimonSapin to make progress (assuming there's consensus about what to do with panicking)

Ok, we discussed this a bit at libs triage, and the conclusion is that we'd like to recheck that the concerns with the original panicking API are resolved with today's implementation. To recap, today's implementation (on nightly) doesn't panic if the slice's length is not an exact multiple of the chunk size; the leftover elements are silently ignored instead. There are inherent methods on each iterator, though, to pull out the remainder.
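
Concretely, the behaviour on nightly is (a minimal example using the current names, chunks_exact and its remainder accessor):

let bytes = [1u8, 2, 3, 4, 5];

let mut iter = bytes.chunks_exact(2);
assert_eq!(iter.next(), Some(&[1, 2][..]));
assert_eq!(iter.next(), Some(&[3, 4][..]));
assert_eq!(iter.next(), None);
// No panic: the element that does not fill a full chunk is exposed separately.
assert_eq!(iter.remainder(), &[5]);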

@shepmaster does this resolve your original concern?

Or others as well, any opposition to having the current semantics be stabilized?

@shepmaster does this resolve your original concern?

I am happy with the current external behavior of the function.

Ok great! @SimonSapin can you @rfcbot resolve panicking ?

Are there any new concerns? @SimonSapin

@rfcbot resolve panicking

Ping checkbox @aturon, @sfackler, or @withoutboats

:bell: This is now entering its final comment period, as per the review above. :bell:

Ok! It's been quite a while here so I think it's ok to short-circuit the FCP slightly. @sdroege, want to send the stabilization PR?

Should we consider https://github.com/rust-lang/rust/pull/54580 to be "trivially enough" similar to this to stabilize at the same time?

@sdroege want to send the stabilization PR?

Yeah, preparing a PR now. I'll add rchunks in a separate commit into the same PR if it's agreed that it should be part of that. But I guess first of all a review of #54580 should be done.

There's now https://github.com/rust-lang/rust/pull/55178 for the stabilization of this here (but not rchunks).
