Rust: `repeat` method for (byte) slices?

Created on 6 Mar 2018  路  23Comments  路  Source: rust-lang/rust

There exists a str::repeat method, but no equivalent for (byte) slices. This would seem like a useful feature. Should we consider adding it to the stdlib?

C-tracking-issue T-libs disposition-merge

Most helpful comment

@sfackler @withoutboats I came across this again and read through the comments to find nothing blocking stabilization other than your votes which have been absent for 9 months. It would seem prudent to comment before it is further forgotten.

All 23 comments

Possibly generalized (or not, but let's entertain the notion...)

pub trait Repeat {
    type Repeated;
    fn repeat(&self, times: usize) -> Self::Repeated;
}

Seems reasonable to add this to [T] where T: Clone.

@Centril Heh, I should expect something abstract from you! Could definitely do that, and implement it even for things like iterators, though it's a bit more work...

@sfackler Yep, that certainly wouldn't be too liberal

Doesn't the general one exist already? iter::repeat(my_into_iter).take(n).flatten().collect()

That said, https://github.com/rust-lang/rust/pull/48657 could easily be moved to [T] where T: Copy, and that way the str/String version would just wrap the slice/Vec version. That seems like a more interesting option to me, since it's taking advantage of the "slice-ness", rather than just being a wrapper around iterator methods.

@scottmcm What's the efficiency of that one like? I bet it's nothing close to memcpy, which could effectively be done by specialising on Repeat for [T] where T: Copy.

@alexreg That's what I meant by my second paragraph. slice::repeat<T:Copy>(x: &[T], n: usize) -> Vec<T> sounds great, but I don't see a need for it for iterators or T: Clone.

@scottmcm Yeah, that's certainly a fair idea. Do you know if iter::repeat(my_into_iter).take(n).flatten().collect() is as efficient as pre-allocating a Vec of the requisite size and repeatedly calling extend on it? If so, I agree with you fully. Otherwise, there may be a need for a Repeat trait.

We can implement it for T: Clone and specialize to #48657's implementation for T: Clone. None of that needs to affect the method's public API though.

@sfackler Yep, that sounds like a good solution to be honest. Even if it's no more efficient that the iterator approach for T: Clone, it would at least provide a consistent interface!

@alexreg Unfortunately I know for sure that literally .flatten().collect() is unlikely to be great (see https://github.com/rust-lang/rust/pull/48544 for a discussion of why, though that PR wouldn't help this case). But I would expect that using https://github.com/rust-lang/rust/pull/48597 .collect_into(Vec::with_capacity(my_slice.len() * n)) instead would let the iterator method do just as well as something more manual.

@scottmcm Fair enough, though I think in that case why might as well provide that as a default fn on the Repeat trait, and allow for specialisations when T: Copy (and potentially other cases too).

@rfcbot poll @rust-lang/libs I'd like to stabilize [T]::repeat.

The signature in #48999 is

fn repeat(&self, n: usize) -> Vec<T> where T: Copy;

and the implementation uses on copy_nonoverlapping which relies on the property of Copy. There's suggestion to generalize the bound to T: Clone, however I don't think this is a stabilization blocker:

  1. T: Copy is what makes slice::repeat special. If T: Clone + !Copy, the performance is probably just similar to iter::repeat(_).take(_).flatten().collect() and can be used instead.

  2. Since Copy is a subtrait of Clone, we can upgrade the bound to T: Clone without breaking backward compatibility.

OTOH it is really simple to extend the bound to T: Clone today as we could use specialization internally (as explained in https://github.com/rust-lang/rust/issues/48784#issuecomment-371387279):

// note: private trait 
trait RepeatSlice: Sized {
    fn repeat_slice(slice: &[Self], n: usize) -> Vec<Self>;
}

impl<T: Clone> RepeatSlice for T {
    default fn repeat_slice(slice: &[Self], n: usize) -> Vec<Self> {
        let mut res = Vec::with_capacity(n * slice.len());
        for _ in 0..n {
            res.extend_from_slice(slice);
        }
        res
    }
}

impl<T: Copy> RepeatSlice for T {
    fn repeat_slice(slice: &[Self], n: usize) -> Vec<Self> {
        // the original copy_nonoverlapping implementation
    }
}

impl<T> [T] {
    pub fn repeat(&self, n: usize) -> Vec<T> 
    where 
        T: Clone,
    {
        T::repeat_slice(self, n)
    }
}

Team member @kennytm has asked teams: T-libs, for consensus on:

I'd like to stabilize [T]::repeat.

  • [x] @Kimundi
  • [x] @SimonSapin
  • [x] @alexcrichton
  • [x] @dtolnay
  • [ ] @sfackler
  • [ ] @withoutboats

From the implementation:

        // If `n` is larger than zero, it can be split as
        // `n = 2^expn + rem (2^expn > rem, expn >= 0, rem >= 0)`.
        // `2^expn` is the number represented by the leftmost '1' bit of `n`,
        // and `rem` is the remaining part of `n`.
        // `2^expn` repetition is done by doubling `buf` `expn`-times.

I guess that it鈥檚 this way because fewer calls to copy_nonoverlapping with larger slices are assumed to be faster. This may be the case up to "medium" sizes compared to e.g. copying a single byte at a time. However at some point, going back "far away" in memory might hurt cache locality.

Do we have evidence one way or another to justify such complex code?

Is there anything blocking stabilization here? It looks like @SimonSapin's comment above here is "just" an implementation nit.

@SimonSapin The code came from str::repeat, which has some performance numbers in the PR that added this implementation method: https://github.com/rust-lang/rust/pull/48657#issuecomment-370206663 https://github.com/rust-lang/rust/pull/48657#issue-172422546

Right, my comment should not be taken as blocking stabilization. It鈥檚 purely an implementation detail and can change later.

@dtolnay @withoutboats @sfackler This pFCP awaits your checkboxes above.

@sfackler @withoutboats I came across this again and read through the comments to find nothing blocking stabilization other than your votes which have been absent for 9 months. It would seem prudent to comment before it is further forgotten.

https://github.com/rust-lang/rust/issues/48784#issuecomment-433644981 is a poll, not a pFCP. These days rfcbot starts FCP when it has not more than two check boxes missing, so I think this one should be considered accepted. Feel free to send a stabilization PR.

I guess that it鈥檚 this way because fewer calls to copy_nonoverlapping with larger slices are assumed to be faster. This may be the case up to "medium" sizes compared to e.g. copying a single byte at a time. However at some point, going back "far away" in memory might hurt cache locality.

I confirmed that run-time of m.repeat(n) with larger m or n gets worse.

m: Vec<u8>

|len(m)|n| ns / op | Ratio to n = 2 | Increase rate |
|-|-|-|-|-|
| 16 | 2 | 19 | 1.0 | |
| 16 | 4 | 24 | 1.3 | 1.3 |
| 16 | 8 | 26 | 1.4 | 1.1 |
| 16 | 16 | 29 | 1.5 | 1.1 |
| 16 | 32 | 32 | 1.7 | 1.1 |
| 16 | 64 | 37 | 1.9 | 1.2 |
| 16 | 128 | 70 | 3.7 | 1.9 |
| 16 | 256 | 91 | 4.8 | 1.3 |
| 16 | 512 | 134 | 7.1 | 1.5 |
| 16 | 1024 | 270 | 14.2 | 2.0 |
| 16 | 2048 | 548 | 28.8 | 2.0 |
| 16 | 4096 | 1482 | 78.0 | 2.7 |
| 16 | 8192 | 3137 | 165.1 | 2.1 |
| 16 | 16384 | 6592 | 346.9 | 2.1 |
| 16 | 32768 | 14912 | 784.8 | 2.3 |
| 16 | 65536 | 33446 | 1760.3 | 2.2 |
| 16 | 131072 | 69184 | 3641.3 | 2.1 |
| 16 | 262144 | 195495 | 10289.2 | 2.8 |
| 16 | 524288 | 420965 | 22156.1 | 2.2 |
| 16 | 1048576 | 1099518 | 57869.4 | 2.6 |
| 16 | 2097152 | 5504275 | 289698.7 | 5.0 |
| 16 | 4194304 | 10739572 | 565240.6 | 2.0 |
| 16 | 8388608 | 21194699 | 1115510.5 | 2.0 |
| 16 | 16777216 | 42588466 | 2241498.2 | 2.0 |
| 16 | 33554432 | 85833026 | 4517527.7 | 2.0 |
| 16 | 67108864 | 173012397 | 9105915.6 | 2.0 |
| 16 | 134217728 | 344958449 | 18155707.8 | 2.0 |
| 16 | 268435456 | 702324815 | 36964463.9 | 2.0 |
| 16 | 536870912 | 1426503968 | 75079156.2 | 2.0 |
| 128 | 2 | 21 | 1.0 | |
| 128 | 4 | 24 | 1.1 | 1.1 |
| 128 | 8 | 31 | 1.5 | 1.3 |
| 128 | 16 | 63 | 3.0 | 2.0 |
| 128 | 32 | 72 | 3.4 | 1.1 |
| 128 | 64 | 128 | 6.1 | 1.8 |
| 128 | 128 | 194 | 9.2 | 1.5 |
| 128 | 256 | 536 | 25.5 | 2.8 |
| 128 | 512 | 1487 | 70.8 | 2.8 |
| 128 | 1024 | 3144 | 149.7 | 2.1 |
| 128 | 2048 | 6602 | 314.4 | 2.1 |
| 128 | 4096 | 15307 | 728.9 | 2.3 |
| 128 | 8192 | 34233 | 1630.1 | 2.2 |
| 128 | 16384 | 69279 | 3299.0 | 2.0 |
| 128 | 32768 | 200694 | 9556.9 | 2.9 |
| 128 | 65536 | 420488 | 20023.2 | 2.1 |
| 128 | 131072 | 1096523 | 52215.4 | 2.6 |
| 128 | 262144 | 5527507 | 263214.6 | 5.0 |
| 128 | 524288 | 10733631 | 511125.3 | 1.9 |
| 128 | 1048576 | 21219285 | 1010442.1 | 2.0 |
| 128 | 2097152 | 42514794 | 2024514.0 | 2.0 |
| 128 | 4194304 | 85295812 | 4061705.3 | 2.0 |
| 128 | 8388608 | 170770953 | 8131950.1 | 2.0 |
| 128 | 16777216 | 342405831 | 16305039.6 | 2.0 |
| 128 | 33554432 | 698031506 | 33239595.5 | 2.0 |
| 128 | 67108864 | 1420857974 | 67659903.5 | 2.0 |
| 256 | 2 | 22 | 1.0 | |
| 256 | 4 | 31 | 1.4 | 1.4 |
| 256 | 8 | 68 | 3.1 | 2.2 |
| 256 | 16 | 69 | 3.1 | 1.0 |
| 256 | 32 | 126 | 5.7 | 1.8 |
| 256 | 64 | 191 | 8.7 | 1.5 |
| 256 | 128 | 536 | 24.4 | 2.8 |
| 256 | 256 | 1251 | 56.9 | 2.3 |
| 256 | 512 | 2655 | 120.7 | 2.1 |
| 256 | 1024 | 5754 | 261.5 | 2.2 |
| 256 | 2048 | 13505 | 613.9 | 2.3 |
| 256 | 4096 | 31159 | 1416.3 | 2.3 |
| 256 | 8192 | 64759 | 2943.6 | 2.1 |
| 256 | 16384 | 179837 | 8174.4 | 2.8 |
| 256 | 32768 | 404020 | 18364.5 | 2.2 |
| 256 | 65536 | 1083392 | 49245.1 | 2.7 |
| 256 | 131072 | 5523028 | 251046.7 | 5.1 |
| 256 | 262144 | 10702046 | 486456.6 | 1.9 |
| 256 | 524288 | 21279790 | 967263.2 | 2.0 |
| 256 | 1048576 | 42513553 | 1932434.2 | 2.0 |
| 256 | 2097152 | 85352125 | 3879642.0 | 2.0 |
| 256 | 4194304 | 171699467 | 7804521.2 | 2.0 |
| 256 | 8388608 | 342988128 | 15590369.5 | 2.0 |
| 256 | 16777216 | 696892564 | 31676934.7 | 2.0 |
| 256 | 33554432 | 1422522241 | 64660101.9 | 2.0 |
| 1024 | 2 | 31 | 1.0 | |
| 1024 | 4 | 64 | 2.1 | 2.1 |
| 1024 | 8 | 79 | 2.5 | 1.2 |
| 1024 | 16 | 128 | 4.1 | 1.6 |
| 1024 | 32 | 203 | 6.5 | 1.6 |
| 1024 | 64 | 419 | 13.5 | 2.1 |
| 1024 | 128 | 1229 | 39.6 | 2.9 |
| 1024 | 256 | 2688 | 86.7 | 2.2 |
| 1024 | 512 | 5720 | 184.5 | 2.1 |
| 1024 | 1024 | 12629 | 407.4 | 2.2 |
| 1024 | 2048 | 28684 | 925.3 | 2.3 |
| 1024 | 4096 | 59888 | 1931.9 | 2.1 |
| 1024 | 8192 | 149524 | 4823.4 | 2.5 |
| 1024 | 16384 | 323578 | 10438.0 | 2.2 |
| 1024 | 32768 | 886780 | 28605.8 | 2.7 |
| 1024 | 65536 | 2105535 | 67920.5 | 2.4 |
| 1024 | 131072 | 8995326 | 290171.8 | 4.3 |
| 1024 | 262144 | 17833419 | 575271.6 | 2.0 |
| 1024 | 524288 | 35786350 | 1154398.4 | 2.0 |
| 1024 | 1048576 | 70596109 | 2277293.8 | 2.0 |
| 1024 | 2097152 | 140652832 | 4537188.1 | 2.0 |
| 1024 | 4194304 | 282462549 | 9111695.1 | 2.0 |
| 1024 | 8388608 | 599121203 | 19326490.4 | 2.1 |

@sinkuu Would you be interested in working on a PR to improve the implementation?

@SimonSapin Yes, I'll try to add the fallback to the old impl for large parameters.

It may be worth looking at the number of bytes being copied per loop iteration, rather than simply the number of repetitions.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

modsec picture modsec  路  3Comments

Robbepop picture Robbepop  路  3Comments

drewcrawford picture drewcrawford  路  3Comments

behnam picture behnam  路  3Comments

tikue picture tikue  路  3Comments