Go: proposal: sort: add the Dedup function

Created on 21 Sep 2020 · 5Comments · Source: golang/go

There is a small suggestion for extending the sort package. There are tasks where you need to remove duplicates from a slice. Yes, on the one hand we can use the map. But the inconvenience is that you have to do this for each separate type in the program. We have a function for sorting a slice, searching by a slice. Why don't we have a slice deduplication feature?

One could suggest doing by analogy with sort.Slice:

func Slice(slice interface{}, less func(i, j int) bool) {

or with sort.Search:

func Search(n int, f func(int) bool) int {

Conceptually, the solution might look like this:

func Dedup(slice interface{}, eq func(int, int) bool) { // Dedup in sort package.
    ptr := reflect.TypeOf(slice)
    if ptr.Kind() != reflect.Ptr && ptr.Elem().Kind() != reflect.Slice {
        panic("obj is not a slice")
    }

    v := reflect.ValueOf(slice)
    if v.Kind() == reflect.Ptr {
        v = v.Elem()
    }

    l := v.Len()

    // Fast path for slices of size 0 and 1. Nothing to deduplicate.
    if l <= 1 {
        return
    }

    next := 0
    for i := 0; i < l; i++ {
        if i < l-1 && eq(i, i+1) {
            continue
        }

        v.Index(next).Set(v.Index(i))
        next++
    }

    v.Set(v.Slice(0, next))
}

func main() {
    // With simple type
    input := []int{1, 1, 1, 3, 5, 5, 7, 42}
    sort.Dedup(&input, func(i int, i2 int) bool {
        return input[i] == input[i2]
    })

    // With complex type (we check only A field)
    input2 := []ComplexType{{A: 1, B: 1}, {A: 1, B: 2}, {A: 2, B: 2}}
    sort.Dedup(&input2, func(i int, i2 int) bool {
        return input2[i].A == input2[i].A
    })
}

Solution inspired https://doc.rust-lang.org/std/vec/struct.Vec.html#method.dedup

The change may not affect the runtime of the language, be compatible with version 1. Can we consider this option?

If so, I will prepare a real CL.

cc @davecheney

Proposal

Source

rokkerruslan

👍5

Most helpful comment

I'm not convinced this should be tied up with sorting at all.
Yes, if you sort first then you can remove more duplicates.
But the removing duplicates operation can make sense even on unsorted data (without a sort step).

I expected that this would be slices.Uniq once generics happen.

rsc on 23 Sep 2020

👍2

All 5 comments

This should probably be named 'Uniq' for the posix uniq command, and we had best wait until we get generics in Go. Then this will be trivial to implement.

beoran on 21 Sep 2020

👍1

Thanks for the proposal.

It's not obvious to me that sort.Interface is always the right approach here--you don't use it in your example code. So it's not clear to me that this is a great fit for the existing sort package.

Also, https://golang.org/doc/faq#x_in_std. This can be written outside of the standard library, so is it a common enough operation that it ought to be in the standard library? Can you point to some examples of code that would use this if it were available?

I also tend to agree with @beoran that this ought to wait for generics.

Thanks again.

ianlancetaylor on 21 Sep 2020

I expected that this would be slices.Uniq once generics happen.

rsc on 23 Sep 2020

👍2

For my use cases, I've found it's far more common to start with unsorted data. Hence it may be faster to deduplicate via a map since sorting is _O(n.log(n))_ - provided the extra temporary memory use is acceptable.

Providing this functionality via the sort package would also encourage using a less efficient method, and/or risks people being surprised when unsorted data is not correctly de-duplicated.

Waiting for generics seems preferable.

mpx on 24 Sep 2020

I think that finding duplicates is a part that should be presented in the language library, but implementation using generics is good. So I close it. Thanks for the comments!

rokkerruslan on 24 Sep 2020

Was this page helpful?

0 / 5 - 0 ratings