EDIT: the proposal has changed later in this thread. See https://github.com/golang/go/issues/16721#issuecomment-241085266 and https://github.com/golang/go/issues/16721#issuecomment-241221703
I propose we make sorting slices a first class concept in Go 1.8.
I think everybody appreciates the cute sort.Interface
type at this point, but the reality is that:
Less
function inline with my code, in context.I think we should add to package sort at least:
package sort
// Slice sorts the provided slice using the function less.
// The sort is not stable.
// If slice is not a slice, Slice panics.
func Slice(slice interface{}, less func(i, j int) bool)
And maybe at the same time, or in the future:
// SliceSorter returns an sorter for the provided slice, suitable
// for with Reverse or Stable.
// If slice is not a slice, SliceSorter panics.
func SliceSorter(slice interface{}, less func(i, j int) bool) Interface
The assumption is that the user's less
function would close over the data to be sorted.
For speed, this would not be purely reflect-based (except for one call to to get the slice length). The implementation would generate a custom swapper as needed for the type of the slice elements such that it's compatible with the GC & write barriers. I did a prototype of this which shows that it's as fast as the typical way.
I think there's plenty of evidence in the Github BigQuery dataset that we're making sort way too hard for users.
/cc @griesemer @robpike @ianlancetaylor @adg @broady
I'd be ok with something like this. It's pragmatic, small, and probably would reduce a lot of sort-related boilerplate.
I think there's plenty of evidence in the Github BigQuery dataset that we're making sort way too hard to users.
I don't have access to the mentioned above dataset, but let me guess: The need-to-be-sorted type declarations and its associated methods declarations constitutes less than 1‰ of code in that corpus. Perhaps far less. If that is true then I don't think this proposal is justified. Also, using reflection in sorting, however small it is, feels to me like encouraging bad practices.
@cznic, you do have access: https://github.com/blog/2201-making-open-source-data-more-available
We'll have to disagree on the metrics at which this is subjectively worthwhile.
No opinion on this.
Only thing I would say is that, sorting a slice is not exactly to be seen as simply sorting an array.
Reason being that if a slice has multiple subslices, sorting one slice will somehow change the others (in a deep value kind of way).
Or should we decide that sorting a slice returns a completely new slice ? (and thus allocate ?)
Just a data point. (I guess you discussed a similar issue when adding the append function but, just to make sure).
As the primary force behind golang at our organization, this and no generic max/min function are the two things I find embarrassingly hard to explain. I'm not sure what it's like at Shoreline Blvd but without fail when an engineer joins our team their mouth drops through the floor at their first sort. Subjectively, I would love this in 1.8.
I'm fine with adding something to the sort package, although of course func (i, j int) bool
is not the function signature that most people expect.
@ianlancetaylor, the benefit of i, j int
is that it lets you get at your []T
's T
or *T
, if the values are too large to efficiently copy. And it fits with the existing Interface.Less
signature. Plus it's not like we can do better, can we?
Oh, sure, I understand why you suggested it. I'm just commenting that it will still surprise people.
As much as this is complained about and as many slice types as I have sorted this has never bothered me, so I don't see the point. Boilerplate is never fun, but this pittance is far below the threshold of annoyance.
A go generate command to spit out Len and Swap, or even just an editor macro, seems more inline with the Tao of Go here.
I'm with @jimmyfrasche here. Why can't this be a go generate
able tool?
If it's basically as fast as sort.Inferface, and is just an addition to the stdlib, not a language change.... why would anyone ever say no to this? Yes, it's annoying that it has to use interface{}
that we try not to encourage people to use... but what other choice do we have? No one has come up with a better suggestion so far.
Put me down for a +1.
How would SliceSorter
work? Wouldn't it have to define a new type, and return it? That's not possible in Go afaik.
I'd prefer usage of a []interface{}
parameter (which would make the reflection unnecessary), but since that doesn't intuitively work on all slice types I know we're stuck with the proposed signature. +1 for saving the headaches anyway though.
To the go generate
comments: It's still possible to use that method, this is only adding to the stdlib. But for newbies and rapid development use, go generate
can be just as much of a pain...
How would SliceSorter work?
It would be a private type implementing sort.Interface
. It would return Interface
, per the example above.
@pbnjay I don't think []interface{} would be very convenient; it's not possible to go from e.g. []*MyThing -> []interface{}
without allocating a new slice and copying every element (which would be way more annoying than writing out the Less/Len/Swap).
It's a bit unfortunate that it has to be func(i, j int) bool
though, because frequently you'd have MyThing.Less(other *MyThing) bool, which means that you'd end up doing:
sort.Slice(mySlice, func(i, j int) bool { return mySlice[i].Less(mySlice[j]) })
But oh well :). I'd still take that over declaring a new type+3 methods just to be able to sort stuff.
@riannucci yup - that's what i meant by "doesn't intuitively work on all slice types" - maybe in Go 2 we can get a generic slice-interface type!
@riannucci, in my experience, the Less
method doesn't already exist somewhere. This is for ad-hoc sorts where you want to write the Less
function inline, in context with your existing code:
sort.Slice(people, func(i, j int) bool { return people[i].Name < people[j].Name })
How about func Slice(slice interface{}, l int, less func(i, j int) bool)
, where l
is the length up to which the slice is sorted. This means the common usage would just be calling Slice
with len([]T)
, but saves the reflect. I'd wager it's possibly not worth it?
Edit: Nevermind, my bad, there will still be reflection.
@freeformz, you already have to do the reflect a bit anyway, so it doesn't save you anything to pass the length explicitly.
Following on from @jimmyfrasche's comment (but my follow up may indeed be the subject of another issue/a separate go-nuts thread)
A go generate command to spit out Len and Swap, or even just an editor macro, seems more inline with the Tao of Go here.
On the go generate
suggestion.
This raises the question of whether to distribute core, oft-used programs that are go generate
-ers.
As things stand, every go generate
-er needs to be retrieved via go get
(or some other means). If instead some core go generate
-ers were distributed as part of a release then:
go get
is required to do core things (like slice sorting)go generate
There are of course some downsides here:
PATH
clashes with existing executable programs, we might need rather verbose names like goGenSort
)@mackross +1 for the missing max
/min
but Go also lacks an abs
olute value function for integers, which is also hard to explain.
Shouldn’t there also be a stable counterpart? Maybe even with a variadic signature like
func SliceStable(slice interface{}, less ...func(i, j int) bool)
so that one could do
SliceStable(people, byName, byAge)
to sort people
first by name and else (when they have the same name/are equivalent) by age?
This only simplifies a trivial unimportant amount of code so I don't see it as more than, at best, a minor annoyance, regardless of frequency.
I'm sure Brad's implementation is a negligible hit on performance compared to the current way, but it would have to be used in function bodies to close over the slice like:
func something() {
//...
sort.Slice(things, func(i, j int) bool) {
//...
})
//...
}
so you have to pay the cost of allocating the closure and building the generic sorter on each invocation of something
, when you don't have to pay anything defining the methods yourself, other than the usual indirections interfaces involve. That's going to be negligible to the cost of sorting anything for all but the smallest of n, and, if it shows up in profiling, it's simple enough to rewrite it the old way, but this sort of code is generally discouraged by the stdlib.
If this is going for pure convenience, I'd prefer the version @ianlancetaylor hinted at where the comparator is an interface containing a func(a, b T) bool
(though the runtime hit from that would certainly be less negligible, even modulo potential copying overhead, and it's even further afield of encouraged practice).
Maybe I'm just uncomfortable that this is at a weird level between convenient and practical that results in this awkward situation where you have to close over a slice that's being permuted under the covers.
All that said, if this goes in, I'll use it—I'm lazy and hypocritical!
I don't have any problem with this existing—it's the inclusion in the stdlib that gives me pause. I have absolutely zero reservations about this being a go gettable library: people could even start using it today. The excellent go4.org or even golang.org/x seem like fine places to put this, and if it gets used a lot without issues moving it to the stdlib would be fine with me.
Taking a key func that's allowed to return one of a list of types ((u)int, []byte, string) makes it look a bit more like sort functions in other languages that take a key func, and would let you later drop in faster sorts specific to the key types without adding APIs. Lots of downsides make that unpalatable (keyFunc'd be an interface{}
in the signature, trickier internally, overhead of Less calling keyFunc twice, and can't write a custom Less to sort by two fields or whatever). Maybe someone sees a better variation of that idea that I don't :) so throwing it out there anyhow.
@jimmyfrasche,
so you have to pay the cost of allocating the closure
Even today with sort.Sort you have to pay the cost of putting a slice into an interface value, since a slice (1 pointer & two ints) is not 0 or 1 pointers (which is all that goes into an interface allocation-free). So it's not significantly different.
I'd prefer to keep this issue focused on the proposal and not the implementation, though.
When I read the title, I thought you were going to propose something like a built-in:
var s []t
sort(s, func(a, b t) bool {
return a < b
})
But this is certainly more pragmatic. Saying that 1% of programs touch sort isn't a very good argument against including this. This is scoped to the sort package, so programs that aren't sorting won't care about it. Even more reason to include it.
I'd prefer to tolerate something more essential and less ugly bloating the standard packages.
Can we teach compiler to handle both cases:
var s []int
sort(s, func(a, b int) bool { return a < b; })
var s []GiantType
sort(s, func(a, b *GiantType) bool { return a.prio < b.prio; })
Also not sure if it's worth worrying about copy mechanics. Perhaps compiler could elide copies somehow? Inline the closure? If it's a pure function, why not. And another point is that if you worry about copying, sort does a lot of swaps anyway, so maybe change data structure.
But anyways, I don't like the idea of having indices if we can do generics in the compiler itself. Most built-in functions act as generic functions.
Sorry for semicolons, I do a lot of javascript lately ("semicolon free" language).
Another thought: recognize simple known types (strings, ints, floats) and use default comparison operator (less than):
var s []int
sort(s)
That's kind of obvious though.
@bradfitz I mentioned that, but you're right that I should have edited that portion out. I was trying to convey that the performance hit, while negligible, would have to be paid more often by design, but I ended up hemming and hawing to the point of incoherence. I apologize. It's not a terribly important concern anyway.
Biggest issue:
Why should this be in the stdlib? You could throw it up on go4 right now and move it to the stdlib later. People who want to use it could start using it today.
@jimmyfrasche Something like it has been at https://github.com/bradfitz/slice for a while.
I really like this idea. I'm also one of those that have almost never written anything different for 2 of the 3 sort interface methods for slices. And also have found it a bit embarrassing having to explain to new Go devs how much boilerplate has to be written to sort a slice. Usually they then ask why they can't just call a sort function with a single key function, like in other languages.
I agree that because it's a localized addition to the sort package that only makes doing a thing more straightforward, that there is 100% benefit here. People can still use the existing approach or use the new shorter approach when it makes sense. Let's fix warts.
@twotwotwo, I don't recommend using github.com/bradfitz/slice. I never updated it to be compatible (safely) with Go 1.5's GC. But yes, essentially something like that. I should update it and move it to go4.org. Still, I believe this is common enough to warrant being in the standard library and would improve the readability of code.
@nsf, let's stay on topic. Nobody (except perhaps you) is proposing language changes. This bug is about a library change proposal only. If you want to discuss a hypothetical language change, the best place is probably golang-nuts@, since it's out of scope for Go 1.
@bradfitz How is it offtopic? I was implying that this is a bad idea and you guys should add a built-in function instead. It's a pretty good place where custom code generation is justified. And saying that a language change is out of scope for Go 1 is not serious, there were a number of language changes in 1.x releases (1.2 added three-index slices for example). Adding a built-in function won't break anyone's code.
@nsf, we can discuss on golang-nuts if you'd like.
Shouldn't the values in the slice implement a comparison method?
The signature looks a bit esoteric in the initial example.
If you need another data point I second what @nsf said.
If you want to add real first-class support to Go for sorting, create a builtin how append
works so that we don't have to close over our slice that we already passed, to the same function, and access it again by index.
This proposal is better than the current sort interface but not compelling enough to make me rewrite all of our sorters to use this new func.
I'm in favor of this proposal -- it's a focused, pragmatic improvement to a common use case.
The way that the "less" function closes over the slice and takes indexes as arguments is similar to how an existing function in sort already works: Search.
In my experience, sort doesn't show up all that often, and it's easy to use as is. So I am not fussed about it.
As far as the proposal goes, it bothers me to add another function signature that takes interface{} to the standard library. It allows people to argue that we like them, and force people to use them and create them in their own work.
I do agree that the proposed function will be easy to use, although I'd like to see the implementation and its performance before deciding whether its benefits outweigh its costs.
@bradfitz,thank you for putting this together and thoughtfully answering our questions. If you don't mind, I have a few more to ask:
For speed, this would not be purely reflect-based (except for one call to to get the slice length). The implementation would generate a custom swapper as needed for the type of the slice elements such that it's compatible with the GC & write barriers. I did a prototype of this which shows that it's as fast as the typical way.
- You propose mutating a user-provided
interface{}
function parameter that represents some slice. Unless I am mistaking, this would become the first and only place the standard library mutates a bareinterface{}
value. Is that correct, and does that matter? Reflecting on adata interface{}
for reading/describing is well-defined in the standard library, however.
I am on the fence about the proposal for concerns of hand-wavy standard library purity but not doctrinally inflexible — FWIW. What extra burdens of proof (e.g., documented behavioral contract) would mutating data interface{}
require?
reflect.Kind
?You propose mutating a user-provided interface{} function parameter that represents some slice. Unless I am mistaking, this would become the first and only place the standard library mutates a bare interface{} value.
False: https://golang.org/pkg/encoding/json/#Unmarshal
What is the intended behavior if a user passes in a type backed by a non-slice reflect.Kind?
It's documented in the initial comment in this post.
Thank you. json.Unmarshal
was so blatantly obvious it was under my nose. :-)
A beauty of the existing implementation is the sort method doesn't have any understanding or care as to what 'thing' it is sorting as long as it implements the sort Interface, so I hope you keep it :). I also think that the proposal here, will be easier to implement for many, so I hope we get it, too.
I love that someone on the go team is giving the sorting boiler plate some attention, but I do not look forward to more code depending on the empty interface. One of the main selling points of go for me is type safety, please don't go full python.
@rajder, there could at least be a perfectly accurate go vet
check for this. It's something.
@bradfitz Oh, that would be nice. A build error is still better imho :)
I'm apprehensive towards. Because the typical use case for the SliceSorter
requires a closure, I think how to use the function will not be readily obvious to those unfamiliar with the pattern. I would be comfortable with the change if it case with a strong and simply example in the GoDoc. Were it to be without an associated example, I would not favor the change.
Go was my first language, and the language which taught me closures. I'm imagining other students coming in to the community from a Java or C mindset and being confused. I realize catering to these people is a non-goal, but most Gophers agree one of Go's strength is it's simplicity of design patterns.
I agree with all of @bradfitz's reasons for it's inclusion, however.
@RobbieMcKinstry I think closures are an important enough feature of the language that Gophers can be expected to know it. See net/http's HandleFunc for example. I do agree that this function should have an example, though. Ideally, many more things in the Go standard library would have examples.
Not sure this is the place for this discussion, but would more examples for standard library code be accepted in the general case? Or is the documentation considered "complete"?
@RobbieMcKinstry #16360 is the place.
@bradfitz So long as we're using reflection, why not go all the way and make the comparison function an interface as well? So you'd have a Go type of:
func Slice(slice, less interface{})
and an "effective" type of:
func Slice(slice []T, less func(T, T) bool)
You're doing reflection already, so it doesn't seem like much of an addition to verify everything at runtime. I suppose the one concern might be whether you could make it as performant? I'd have to think more about that one...
As for the issue of copying large values, I agree with previous comments that simply having less
take pointers (i.e., func(*T, *T) bool
) would solve the problem and would have two benefits: a) you'd avoid the clunky closure and, b) you could reuse a comparison function over and over again (including comparison functions exported for consumers of your package).
I have a sample implementation that I made a while ago that does this, albeit without the performance optimization of doing up-front per-type function creation (it just does the expensive reflection every time and then forgets the results): https://gist.github.com/joshlf/61e711195709427058ae0295b21ffba8
Go doesn't support dynamic type or generics yet. T doesn't provide any type information for current go. So reflect.TypeOf(sort.Slice)
can't return consistent type. If go doesn't provide generics, T must be consistent type. BTW, I implemented similar package few month ago.
https://github.com/mattn/sorter
package main
import (
"fmt"
"sort"
"github.com/mattn/sorter"
)
func main() {
s := []string{
"2",
"3",
"1",
}
sort.Sort(sorter.NewWrapperWith(
s,
func(i, j int) bool {
return s[i] < s[j]
},
))
fmt.Println(s)
}
Having the func takes T as argument will incur a lot of copying
for large Ts. I think that's one of the reason it's using indices.
https://github.com/anacrolix/missinggo/blob/master/slices/sort.go#L10
An example of use of this style in the real world: https://github.com/anacrolix/torrent/blob/master/client.go#L124
@mattn - What do you mean by "consistent"? I'm not sure I understand the problem you're explaining.
@minux - I agree, but I think having a *T
argument is better - it involves as little copying as an index, and is much cleaner and means that the comparison function doesn't have to be aware of the data ahead of time.
@anacrolix - Yep, although you'd need to verify that the types matched up first (e.g., that the first argument is a slice type, that the second is a function, that it takes two arguments, etc).
To flesh out the go generate
idea a bit more I put together sortGen
(I'm sure others have done similar things before me, nothing immediately turned up in my searches):
https://github.com/myitcv/sorter/tree/master/cmd/sortGen
Simply define an order function:
func orderByName(persons []person, i, j int) sorter.Order {
return persons[i].name < persons[j].name
}
then run go generate
which will generate a function with the signature:
func sortByName(persons []person)
The example is more complete and gives details on the rules etc.
sortGen
would fall into the category of core, oft-used generators that I was trying to define earlier that could be distributed with Go, thereby obviating the need for an additional go get
I also dislike that we need top-level types to use sorting. I like the suggestion to eliminate this. Like @robpike I don't think that it happens often enough that I'm terribly concerned about the pure typing effort.
I dislike this particular proposal, because a) it suggests that using interface{} to circumvent the type system is a good way to implement generic functions, which will lead to people doing it much more than before and b) it uses "magic" that is approximately only available to the stdlib to do that, so it excludes people wanting to imitate this methodology. I'd consider sort
to be a "normal" stdlib package (as opposed to, e.g. syscall
, runtime
or os/exec
), one that shouldn't use tricks to work with the runtime and should be possible to write as a stand-alone package. I don't thinks that's feasible with this proposal, because as you pointed out to get similar performance you need to cooperate with the GC, which is only really feasible for stdlib packages that are released in lockstep with go versions.
In short: People will complain about the go authors wanting generics themselves and cheating the type system, but not allowing them to do the same.
As an alternative, it could be considered to provide this in sort
:
type LocalSorter struct {
Len int
Swap func(i, j int)
Less func(i, j int) bool
}
func (s LocalSorter) Len() int { return s.Len }
func (s LocalSorter) Swap(i, j int) { l.Swap(i, j) }
func (s LocalSorter) Less(i, j int) bool { return l.Less(i, j) }
(the type name is… arbitrary. Couldn't come up with a good one in 15s). It's trivial, doesn't eliminate the same amount of duplication and it can be easily provided by an external library too, but it allows using sort
for custom slices without needing package-scoped types at least, thus partially addressing the motivation for this proposal (and the part I, personally at least, care about).
If go goes the way of this proposal, I'd like to see the kind of magic it employs to speed up the interface{}
workaround so that third parties can use it too (as I don't know how that magic looks, I don't know if there'd be a sensible API for that).
@Merovius I think I like this idea as at least a stop-gap. Trivial implementation is at https://play.golang.org/p/4_Z2G1Wd0d . It could be extended so that a sorter wrapper is generated once and reused with different Less implementations.
@Merovius I agree with @jonbodner, and I'd add that we already do this pretty consistently in the stdlib in encoding/decoding situations (json/gob/etc) - we do some expensive up-front work the first time a given type or certain set of arguments is used, and that generates data structures or functions that allow us to perform the same operation much more quickly for every subsequent call. We would presumably do something similar here.
Also, you mentioned "magic" - do you just mean using the reflect package, or do you think this would require explicit runtime support or something else truly "magical"?
@bradfitz Oh right, there will still be some reflection going on anyway since interface{}
.
@joshlf see the original comment by @bradfitz:
For speed, this would not be purely reflect-based (except for one call to to get the slice length). The implementation would generate a custom swapper as needed for the type of the slice elements such that it's compatible with the GC & write barriers.
That last part is not easily accessible to outside people; it requires go-version specific code to stay "compatible with the GC & write barriers", I assume.
@Merovius I'll admit that I don't understand that statement well. I had assumed he simply meant that it would be a normal, well-behaved function that would play nicely with the compiler's emitted GC and write barriers? I take it I'm misunderstanding something?
I'd prefer to not get into an implementation discussion in this bug, but I sympathize with @Merovius's excellent comment. I do agree that sort
should be a normal package that anybody can implement.
In my latest prototype last night, I added a method to reflect:
package reflect
// Swapper returns a function which swaps the slice elements in v.
// scratch must be a non-nil pointer to a scratch element.
// Use reflect.New(v.Type().Elem()).
// Swapper panics if v is not a slice, or if scratch is the wrong type.
func (v Value) Swapper(scratch Value) func(i, j int) {
....
}
Using that, no special code or knowledge is required in the sort
package. (It has to return a swapper func instead of being Swap(i, j int)
to amortize all the expensive reflect safety checks. And using the reflect
API as-is is like 100x slower.)
It's currently just a bit slower, but my implementation of Swapper
isn't as fast as it can be yet.
@bradfitz Any chance you could paste the code? I'm trying to come up with something myself, and I'd be curious to see what you had to do under the hood to get that to work.
Why do you need the scratch
argument? Seems like Value.Swapper
can allocate it itself and return a closure that refers to it.
@bradfitz would there be many uses for Value.Swapper aside from implementing this proposal? I can't think of any that aren't contrived. It feels a little weird to add something so specialized to reflect to achieve a specific goal elsewhere.
Is it possible to add more general mechanisms to the runtime and/or unsafe and/or reflect packages that would allow creating efficient, safe swappers that stay "compatible with the GC & write barriers"?
@ianlancetaylor, true, that's residual from an earlier version that didn't return a closure.
OK, so I made a prototype implementation that accepts a comparison function whose arguments are pointers to the slice elements, not indices. Here's the implementation:
The signature is:
// Sort sorts the slice using less as the comparison
// function. slice must be a slice type, and less must
// be a function which takes two arguments - both poiners
// to slice's element type - and returns a bool.
// Effectively, the type signature is:
// Sort(slice []T, less func(*T, *T) bool)
func Sort(slice, less interface{})
The performance is actually decent - using this Sort
function takes ~170% as long as sort.Sort
in the stdlib (see sort_test.go
linked above). Here are the exact measurements:
BenchmarkSort-4 100000 13367 ns/op
BenchmarkNativeSort-4 200000 7612 ns/op
BenchmarkSortPerOp-4 5000000 374 ns/op
BenchmarkNativeSortPerOp-4 10000000 250 ns/op
Under the hood, the implementation does two key things: First, it generates and caches sort implementations by type (similar to, for example, encoding/json
). Second, it uses unsafe
to avoid expensive reflection logic. I'm not too proficient with the details of how to do this reliably, so there are probably issues, but I'm sure someone who's better at this could modify the implementation so that it is guaranteed to work properly (e.g., guarantee that the GC won't be confused by what we're doing).
Thoughts?
I really wanted to keep this thread focused on the merits and API signature and not implementation details.
But I'll say that func Sort(interface{}, interface{})
is a pretty terrible function signature (the grossness of interface{}
grows super-linearly with the number of interface{}
arguments) and ~170% as long seems too slow. I was already unhappy with ~115%.
Ah sorry, I didn't realize that.
Re: API - what's the thinking behind saying that two interface{}
arguments are more than twice as bad as one (or, for that matter, worse at all)? We already have to do dynamic checking, so I don't see more dynamic checking as much worse than less - we've already thrown static compile-time guarantees out the window.
As for the performance, I was kind of assuming that people more skilled than I am and with better understanding of Go internals would be able to make it much faster :)
@joshlf
What do you mean by "consistent"? I'm not sure I understand the problem you're explaining.
func Slice(slice []T, less func(T, T) bool)
Go compiler doesn't recognize whether the first T is same as second or thrid T.
@mattn I mean with respect to the implementation, which does some really unsafe stuff under the hood (check out the source) such as treating the slice as a byte slice, treating the given function as a func(a, b *unsafe.Pointer) bool
(which it isn't), etc.
Summary of problem
Summary of proposals
1: original proposal (@bradfitz)
2: have something generate the boilerplate, go generate or an editor macro (me)
3: Take two interface arguments []T and func(a, b T) bool (T could be a pointer) (@ianlancetaylor indirectly/me)
4: Take two interface arguments []T and func(a, b S) bool (If T is a pointer S=T, otherwise S=*T)
5: add a new builtin (@nsf)
6: LocalSorter struct (@Merovius)
7: Add a reflect method that creates an efficient swapper for reflect.Value containing a slice in order to implement (1) (@bradfitz)
5 is too magic.
1(+7), 2, and 6 solve the problem and check all the constraint boxes, all usable separately. 6+7 could implement 1 trivially.
3 and 4 fail on performance, but could easily be a go gettable package. Both would be enhanced by (7).
(let me know if I got anything wrong and I'll correct it)
edits:
@joshlf, see https://golang.org/cl/27321 for the prototype with reflect.Value.Swapper. One of the three typedmemmove calls could be eliminated with not much work, so it should get a bit faster yet. I'll look into that tomorrow.
Re: @jimmyfrasche's question of whether there are uses for reflect.Swapper
other than this proposal, I would use it (here) to slap similar interfaces on in-place radix sorts (instead of less
you pass key func (i int)
returning string
or whatever key type the particular sort wants). You could also use Swapper
to provide a random shuffle/sample or a sort.SliceStable
. None of those are world-changing uses, maybe, but they're uses. ¯_(ツ)_/¯
CL https://golang.org/cl/27321 mentions this issue.
Do we have any sort of proposal for generic-but-homogenous slices?
That would solve the problem of having to copy a slice of (non-interface) type T to make a generic slice-of-type-interface, because the generic array would simply be referenced by a fat pointer, the same way that generic scalars are. (Or maybe we have an invisible proxy object to carry those.)
Making up a type notation for the purpose, where []...
is the type of a homogenous generic slice, I would imagine then sort
could be defined with a function signature something like:
lang=go
func sort(sslice []..., less func (interface{},interface{}) bool) (rslice []...)
@praetergeek, as discussed above, language changes are out of scope for this bug and Go 1. This bug is about a library addition only.
@bradfitz Should it be reflect.NewSwapper(v Value) func(i, j int)
? It's taking a reflect.Value, but it's creating a swapper. Even if it's not changed to a function, the name should change, at least. /bikeshedding (Should I make that comment on the CL/learn how to comment on gerrit?)
@twotwotwo I don't know if _another_ sort package counts as a separate use case :þ.
A generic permuter was the only other thing I could think of as well. If (New)Swapper goes in, it would be nice to add that to math/rand: save allocating an []int for shuffle.
There are a lot of algorithms that involve swaps, certainly, but I can't really think of any that _just_ use swaps, so there's going to be a lot of func doThing(slice interface{}, f func(various indicies) retType)
if they do take advantage of (New)Swapper, so this API will proliferate.
@jimmyfrasche - thanks for summarising the various pros/cons above.
Might I suggest we subdivide point 2, because the pros/cons for go generate are quite different to an editor macro (for example)
As far as a go generate approach is concerned (in light of sortGen):
2.1: go generate
go generate
prior to go test
etc is an additional part of development workflowOn the second con, I would argue it's a good thing to be encouraging people towards go generate (hence it's not a con to my mind), but I've separated that opinion from the point itself because I would guess most people don't (currently) use go generate.
To add to @nsf initial idea and after some discussion on r/golang ... if we could add an orderable refinement in go/types on primitive types, that would allow to define multiple relations of order on composite types.
The example that was given there that I found interesting is :
sort.Slice(people, func(p Person) string { return p.Name })
sort.Slice(people, func(p Person) (int,string) { return p.Age, p.Name })
It could probably be simplified further but that would allow to sort composite types by field.
Here, for instance, the second example would sort twice, by age first and then by name.
I haven't thought about what difficulties one may encounter in implementing this but I find the idea quite appealing.
You'll note that: people
is of type []Person
Just another idea that could be easier to use and explain than an indice based solution, if implementable without risks.
@atdiar, this bug isn't about changing the whole feel of the sort package. I just want to make it easier to write Less functions, not change how sorting is done entirely.
@bradfitz I understand but I'm still proposing an alternative that would end up being similar to what you are proposing. I understand the difficulty about language changes, but if the current sort is really a problem, perhaps it would make sense to solve it completely.
Added "no interface{}, no use of reflect or unsafe packages" to 2 and 6.
Optimizations is an interesting point, but I don't want to add that unless it actually happens, because you can say theoretically the compiler could optimize this or that for a lot of things that will never be true in any language. If someone can confirm that it in fact does, I'll add it to 2 and/or 6.
Separating go generate from an editor macro is just editorializing for go generate, imo: everything that applies to one applies to the other. I mean yes it is great and good, but I am trying to keep that as neutral as it is possible for me in the interest of discussion.
@atdiar, do you want to open a new bug so we can track your alternative proposal separately?
Then we can have a meta bug titled "things about sort" with references to each.
@jimmyfrasche A suggested addition to your summary:
@bradfitz it's not really a bug. It's just an alternate suggestion to the current proposal. The rest is up to the people.
I don't know if another sort package counts as a separate use case :þ.
@jimmyfrasche Eh, separate or not it's still a reason to expose Swapper
. People have reasons to write stuff besides the stdlib qSort. (In my case, the reason's an up to 5x speedup.) It's nice if those packages can do what sort
does, rather than sort
having magic that might break with next Go release if I try copying it. There is non-sort stuff you can do that permutes arrays: maintain a heap priority queue more nicely than container/heap
. Merge arrays. Get median or percentiles a bit faster than if you sort the whole thing.
I don't think you have to prove NewSwapper
would get a ton of use, though; it seems like the right way to do SortSlice
anyway. It keeps runtime magic in reflect
, which already has a lot of it, rather than scattering it around the stdlib. It keeps to a minimum how much of the stdlib is available-to-Go-team-but-not-to-you magic. It isn't an especially nasty API, isn't unsafe, doesn't expose a version-specific runtime knob. I know there's a high bar for adding code to std, but if this'll be written anyhow, in this particular case it doesn't seem bad to expose.
@joshlf was on the fence about adding that since it's kind of just how it would have to work but decided to go for it
Also, added it to @Merovius's proposal as well since it applies equally—and noted that that proposal couldn't use Len/Less/Swap as the field names and have the methods to implement Interface.
FWIW, I whipped my "proposal" up as a tiny package for my own use and my own personal needs are 100% satisfied with this. This basically invalidates the last two motivations given by @bradfitz for my personal taste, as IMHO every project/organization/repo/whatever can just do this themselves if they need or someone can just provide one for general use (perhaps under github.com/pkg). All the usual arguments for living in the stdlib or the x/ repos (wide availability, stability, maintenance…) just don't apply for something this trivial, in my opinion.
This leaves the repetitiveness of Len and Swap. To that, I personally say ¯_(ツ)_/¯ and it doesn't justify the downsides of interface{}
in this instance. And I also believe that the alternatives proposed here (that don't require a language change) just increase the downsides, as they require even more reflection and ignoring the type system and will thus make it even harder to explain to people why go hasn't generics but they also shouldn't roll their own.
With that I'll just trust the Powers That Be to do the Right Thing™ and will unsubscribe from this issue to not be tempted to add even more noise :)
@Merovius I like that method! I packaged it up neatly: https://github.com/davelondon/sorter
type person struct{ name string }
s := []person{{name: "foo"}, {name: "bar"}, {name: "baz"}, {name: "qux"}}
sort.Sort(sorter.New(
len(s),
func(i, j int) { s[i], s[j] = s[j], s[i] },
func(i, j int) bool { return s[i].name < s[j].name },
))
fmt.Println(s)
// Output: [{bar} {baz} {foo} {qux}]
@Merovius, @gri and I both wrote that same thing. I thought it was still too tedious.
I even referenced it in my CL above (see the first and currently only comment at https://go-review.googlesource.com/#/c/27321/2/src/sort/sort.go@28)
... but if that and maybe reflect.Value.Swapper is all that comes out of this, I suppose that's better than nothing.
@twotwotwo those are all excellent points, and thank you for the additional examples.
My major concern is that all these mountains are being moved to avoid
type sorter []T
func (s sorter) Len() int { return len(s) }
func (s sorter) Swap(i, j int) { s[j], s[i] = s[i], s[j] }
which would allow everything you listed without having to use the reflect package at all.
Sure, it's inaesthetic and a bit of chore but it's not the worst thing. The proposal is nicer, but, while nicer is nicer, it seems like an awfully complicated solution for such a small problem.
If we're going to do this, then I'm all for exposing the magic so others can use it and the sort package itself remains unmagical.
My concerns with NewSwapper are/were:
Looking at the CL, the alternative would be exposing runtime memmove operations (presumably to unsafe) which is probably more evil than it's worth.
Implementation-wise, at least, NewSwapper has to be in reflect and certainly in the stdlib (barring exposing memmove or adding a new package just for this under reflect and factoring some private helpers in reflect to a shared internal package).
So, still a little uneasy about it on principle, but it seems to be the best solution so far, if we're going to do this.
@jimmyfrasche @twotwotwo To clarify something, I think that part of the pain of having to implement custom types is having to clutter up the package-global namespace with them. This probably isn't the place for this discussion, but in my personal experience, I've found myself really wanting to be able to define scope-local types _with methods_ to solve this. It makes the difference between getting to use a throwaway, local name (e.g., sortable
) and having to clutter the package-global namespace with things like sortableWidgets
or whatever. Also, being able to define types inside of functions _and_ have methods on those types would be awesome. It's very possible this has been discussed elsewhere and I'm just not aware of it (@bradfitz might know).
@joshlf the sort.Funcs type and/or @davelondon's implementation of the idea at https://github.com/davelondon/sorter avoid having to create a type. Not as simple to use as the proposed sort.Slice, but simpler and not limited to slices. If that's the only box you want checked, it checks that box.
Language changes should get their own issues. I remember it being discussed a long time ago, at least on the mailing list, but couldn't find anything with a cursory search. At any rate that would be Go2 territory.
@jimmyfrasche My point was simply that if what I mentioned is a significant part of the concern, just adding that to the language would fix it and make the other problems that people have complained about not as bad (since they'd be the only issues left). In my experience, the headache that Brad referred to in the first comment (which I have definitely felt) stems largely from the namespacing issue. Also, I disagree that this should be Go 2 territory. Supporting methods on local types feels more similar in size to something like being able to use v.method
as a closure - it would only expand the set of valid Go programs (thus maintaining the Go 1 compatibility guarantee), wouldn't be a particularly big change to the compiler, and wouldn't fundamentally change how any normal tasks are done in Go.
Anyway, my point is simply that I do think this is relevant here because it's a possible alternative solution (although probably not the best solution proposed so far, I admit).
just adding that to the language
"just"
@bradfitz I know that statement sounds grand in the abstract, but in this case it's a relatively small addition, no? As I said, it feels similar (at least in terms of the experience of using the language) to adding the ability to use v.method
as a closure. I can't speak for how much it would take to implement, of course.
I'm not discussing language changes in this bug. Let's chat on golang-nuts.
@bradfitz OK, sorry to clutter up the discussion. I'll move it there.
@jimmyfrasche
Optimizations is an interesting point, but I don't want to add that unless it actually happens
Consider https://github.com/myitcv/sort_inlining_example
main.go
is largely a copy and paste of un-exported parts of sort/sort.go
. Notice I've replaced use of sort.Interface
with []string
directly.
main_test.go
is a copy of BenchmarkSortString1K
from sort/sort_test.go
run.sh
shows the setup.
README.md
shows the output from run.sh
(on my machine)
Notice the inlining that can happen when the use sort.Interface
is dropped.
This leads to a ~33% speedup in this benchmark.
Separating go generate from an editor macro is just editorializing for go generate, imo: everything that applies to one applies to the other. I mean yes it is great and good, but I am trying to keep that as neutral as it is possible for me in the interest of discussion.
go generate exists and is universally available to everyone via the go tool. The usage of the command is uniform and well documented.
Whilst some editors may have support for macros of this sort, these macros are not universally available (i.e. in all editors) and almost by definition usage of these macros is not uniform (i.e. it's different in each editor)
Hence why these should be considered very separately in my opinion.
To start off, this builds on @Merovius' solution and point of view and tries to justify it, so if this clutters up the discussion for you, stop reading. Also, this is another rant against using reflect
so if that clutters up the discussion, stop reading. This is also a rant against interface{}
and generics in general, so if that clutters up the discussion, stop reading.
There is a precedent for a possible solution to this in my opinion, its sort.Search
read the godoc for details, but the general idea is that we use length and closure instead of doing generic stuff
a sort shorthand would probably look like
func Slice(n int, less func(i, j int) bool, swap func(i, j int))
LocalSorter
just becomes an internal, private type.
an example usage
a := []int{5, 3, 4, 1, 2}
sort.Slice(5, func(i, j int) bool { return a[i] < a[j] }, func(i, j int) { a[i], a[j] = a[j], a[i] })
the fundamental reason I think this works is because this builds upon already existing semantics with sort.Search
, it works for me because sort.Search
works for me, while the proposed signature doesn't because it is basically C++/Java/C# STL sort
std::sort(std::begin(a), std::end(a), [](T a, T b) {
...
})
which only works there because of templates (half assed generics). In the method proposed so far, we circumvent generics using reflect
, which, and this may only be because I have only been doing golang since january this year, seems like a serious antipattern to me, because for me interfaces in go have no business with type, only functionality.
By using reflect to make the round trip of getting a subtype, I think we are propogating the idea that this is how generics work in go, you send a super type and then get back the sub type using reflect
.
Another advantage I can think of is that at least we are giving a choice to the user to generate a Swap
using reflect
or write one line of code.
However, I think generating methods using reflect
opens up a wide array of problems of its own, because today its Sort
, tomorrow it will be Search
, and instead of generating Swap
, we would be generating Less
functions, then Lower Bound, Upper Bound, Min Element, Max Element and so on....
TL;DR: I don't think we can define functions in go with semantics of languages that have generics, because we don't. I think that sort.Search
is one of the best implemented functions in the go standard library, and we should do things in the semantics introduced by that function.
I also think that if and only if we want to introduce a shorthand, we need another function definition, but if we want to only reduce boilerplate code, we should definitely look at go generate
and @myitcv's sortgen, because irrespective of pros and cons, generating boilerplate code that you don't want to maintain is the fundamental reason go generate
exists (in my opinion).
Also, more shorthands can be implemented, we should also implement a shorthand for sort.IsSorted
and I propose that we add Greater
, LessOrEqual
and GreaterOrEqual
to sort.IntSlice
, sort.StringSlice
and sort.FloatSlice
. With them, lower bound in go simply becomes
sort.Search(n, sort.IntSlice(a).GreaterOrEqual)
and upper bound is
sort.Search(n, sort.IntSlice(a).Greater)
@suyash the only thing you said about your sort.Slice function that's wrong is naming the function Slice: since you're writing the swap function yourself it's not limited to slices. It's _more_ generic.
I like the proposal (although Merovius@ proposal is very good, too---I'm definitely going to use it myself), however I've one objection has kind of already been touched on: how do defend this proposal if future similar proposals are rejected?
For better or worse, a fair amount of gophers want min/max/abs/etc functions and I could see future proposals using this as their defense for adding a math.Min(interface{}, interface{}) function.
@myitcv well, yes, if the code generator writes a custom sorter instead of filling in the boilerplate and using the sort package the compiler can optimize it better. I didn't think anyone was going that far. I meant if the compiler was actually doing type specialization to desugar interfaces or something that it couldn't do with the closure or reflect versions. But, again, what you're saying applies equally to an editor macro (albeit few would want to maintain such a large macro even an editor with a nice macro language).
My point is that it doesn't really matter if the code is being generated by go generate or an editor macro or a robot from the future that appears in a shimmering flash, tells you not to worry, touches your monitor, and the code is just there on your drive and then the robot explodes into a fractal of rainbows and joy.
They're all equivalent to "something happens and then the code is just written for you and you don't have to deal with it". The specific mechanism isn't the relevant part.
@suyash I really like this, although we could make it more usable by having the function return a sort.Interface
which could be used with any of the various functions in the sort package:
func MakeInterface(len int, less func(i, j int) bool, swap func(i, j int)) Interface
This would obviously be _super_ simple to implement:
type interfaceWrapper struct {
len int
less func(i, j int) bool
swap func(i, j int)
}
func (iw interfaceWrapper) Len() int { return i.len }
func (iw interfaceWrapper) Less(i, j int) bool { return iw.less(i, j) }
func (iw interfaceWrapper) Swap(i, j int) { iw.swap(i, j) }
@joshlf FWIW: https://github.com/sermodigital/sorter :information_desk_person:
edit: helps if it's public now doesn't it?
@joshlf, yeah, that's what @gri and I were thinking recently. It seemed too tedious before the idea of a "reflect.Swapper
". I think I'd put the swap
before less
, so the less
literal could span multiple lines, and you could write:
sort.Stable(sort.With(len(s), reflect.Swapper(s), func(i, j int) bool {
if s[i].Foo != s[j].Foo {
return s[i].Foo < s[j].Foo
}
return s[i].Bar < s[j].Bar
}))
(MakeInterface
might be a bit long. I like With
or Funcs
, but len(s)
is not a func)
@bradfitz I really like where this proposal is going, maybe allow nil for the swapper and call reflect.Swapper in sort.With if nil?
@rajder, it's tempting, but then sort
would need to depend on reflect
for only that one feature, and it is not supposed to. See comments at https://go-review.googlesource.com/#/c/27321/2/src/go/build/deps_test.go
Also, I'd prefer the type-unsafety (use of reflect) and panic at runtime if the argument isn't a slice to be explicitly opt-in if we go this route. Then the sort
package remains pure.
@bradfitz Ah, I did not think about the added dependency, I completely agree.
Edit: Making reflect and the added risk of panic opt-in is also a very nice idea. I would personally always implement the swapper but it's nice to have options.
A minor thing I think will look funny to someone seeing it for the first time is:
sort.Sort(sort.With(.....
Would not the version of the proposal by @suyash look better, but with a different name? (like @jimmyfrasche pointed out)
sort.Sort2
sort.Stable2
or
sort.Sortfn
sort.Stablefn
or somehting completely different.
@jimmyfrasche
if the code generator writes a custom sorter instead of filling in the boilerplate and using the sort package the compiler can optimize it better
Indeed. But that's the whole point of go generate, you're not constrained. It doesn't make it any less of a "pro" in fact it reinforces it as a powerful approach.
They're all equivalent to "something happens and then the code is just written for you and you don't have to deal with it".
Agreed; because at the end of the day code is generated. In this respect they are equivalent.
The specific mechanism isn't the relevant part.
In the context of the problem we are trying to solve the mechanism is extremely relevant.
To my mind we are trying to solve:
_"how best to remove/alleviate the tedium associated with sorting slices for the Go community"_
I have emphasised what I consider to be the key part of this statement.
Unlike an approach that, for example, adds some magic to the reflect package where a simple code change is required to use it, using either go generate or an editor macro requires the user to do something extra (this is incidentally a "con" shared by both).
Comparing each:
go get
/install the generatorgo generate
as part of my workflowAs I said before, whilst some editors may have support for macros of this sort, these macros are not universally available (i.e. in all editors) and almost by definition usage of these macros is not uniform (i.e. it's different in each editor). Which I think rules out editor macros as a viable means of solving the problem _for the Go community_, because documenting how to use the approach becomes impossible.
Does that make sense?
@rajder sort.With returning a sort.Interface is the most useful, general thing.
Since 90+% of the time it's just being passed directly to Sort a super simple helper like func SortWith(...) { Sort(With(...)) }
(where ... is just me being lazy) is unobjectionable and you could do it yourself in the less common cases.
@jimmyfrasche Yes, I agree that sort.With returning a sort.Interface is more versatile. I was just thinking about the somewhat funny looking sort.Sort(sort.With...
, really a minor thing imho, but still looks a bit awkward, but on the other hand quite easy to remember.
@myitcv I understand your point completely. I agree with you that go generate is a great way to solve the problem and that it should be the preferred mechanism of the community. I'm not changing the summary. The pros and cons I listed are for solving this particular problem with code generation in general and cover more than just go generate and editor macros, which are listed as examples of mechanism. Separating the points in the summary adds nothing to the discussion of _this_ topic. It just adds additional pros and cons specific to various methods of code generation, which can be discussed elsewhere.
@bradfitz I find your modified proposal very agreeable.
@bradfitz looks good, although I'd still like a sort.SortWith
to go along with sort.With
, as it falls more in line with sort.Search
and the original proposal does define two methods, one that generates a sort.Interface
and other that just sorts, but I also get @rajder's point, its a really minor thing and these days you can generate snippets for this kind of stuff in editors. Also, my opinionated implementation
@suyash, @rajder, I'm fine with a SortWith
wrapper, but I don't feel strongly either way. My objection earlier was to naming anything ending in 2
or Ex
or Fn
(an abbreviation Go doesn't use).
Okay, check out https://golang.org/cl/27321 for an implementation of the modified proposal (https://github.com/golang/go/issues/16721#issuecomment-241085266).
It's actually faster than a normal sort, for reasons explained in the CL description.
The CL is not implementing the sort.Slice()
function proposed in this issue. Are you now proposing that one should write sort.Sort(sort.With(len(s), reflect.Swapper(s), func(i, j int) bool { [...] })
instead, or you will be adding a sort.Slice()
wrapper later?
@bradfitz I have not used Gerrit before but it looks to me like you forgot to commit quickSort_func and stable_func in sort.go?
Improved performance sounds awesome considering this issue had other primary goals :)
@rajder quickSort_func and stable_func are in the file src/sort/zfuncversion.go
@rasky, the comment immediately before yours does say "for an implementation of the modified proposal".
I think only the modified proposal would be palatable for everybody in this thread and the Go team, once I realized sort isn't supposed to depend on reflect, per the package layering policy described in go/build/deps_test.go.
I actually no longer like the name sort.With
for the Interface
constructor because it would be too easy to write & for reviewers to miss that this statement by itself:
sort.With(len(s), reflect.Swapper(s), func(i, j int) bool { .... })
... is a no-op, even though it looks like it does something.
So I'm coming around in favor of different names and perhaps a wrapper:
sort.MakeInterface
as proposed above to make the Interface
sort.With
as an alias for sort.Sort(sort.MakeInterface(...))
sort.Stable
. they can write `sort.Stable(sort.MakeInterface(...))``This comment is off topic.
Would it not be neat to have the compiler do the optimization you did manually for quickSort_func and stable_func when the interface is not actually required to be an interface at runtime?
@rajder, let's stay on topic.
Dumb question (can't comment on CL): why does your CL
have a separate implementation instead of using funcs
everywhere and
creating a new funcs
iff the sort.Interface isn't already type funcs
?
On Sat, Aug 20, 2016 at 1:35 PM Brad Fitzpatrick [email protected]
wrote:
@rajder https://github.com/rajder, let's stay on topic.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/golang/go/issues/16721#issuecomment-241222210, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AFnwZ5xLTp0ZKZbw5l_0T_XCLSzOcvQvks5qh2UFgaJpZM4JlCQ5
.
I would have posted these on the CL, but I gave up on trying to figure out how gerrit works. It's like an angry spaceship.
*funcs
, too?Following up on my last comment, here's what I meant: https://go-review.googlesource.com/#/c/27456/
@jimmyfrasche, fixed the panic message. I'm not very concerned about Reverse at the moment. We can do that later easily.
@EricLagergren, commented on your CL.
Patchset 11 of https://golang.org/cl/27321 now has updated names per https://github.com/golang/go/issues/16721#issuecomment-241221703
@griesemer, @robpike, @ianlancetaylor, please review.
Could we avoid the panic in:
// If slice is not a slice, Slice panics.
func Slice(slice interface{}, less func(i, j int) bool)
by introducing a private slice.Interface interface that would be implemented by all slice types thanks to some dummy private function. The signature would become:
// s is statically typechecked to be a slice.
func Slice(s slice.Interface, less func(i, j int) bool)
If you want typesafety, then it looks like:
sort.With(len(s), func(i, j int) { s[i], s[j] = s[j], s[i] }, func(i, j int) bool {
if s[i].Foo != s[j].Foo {
return s[i].Foo < s[j].Foo
}
return s[i].Bar < s[j].Bar
})
That's a strict improvement over what we have now, but still quite a mouthful, so that wouldn't stop all the complaints.
If a slice.Interface
would support Len()
and Swap(i, j int)
, then sort.Slice
could be implemented without reflection or any other magic. "Just" an internal hack to slap slice.Interface
over all slices.
@guichaz, language changes are out of scope and off topic for this thread, and a language change would be required to have something like your proposed slice.Interface
. But your example of sort.With
is indeed how you'd write a sort without any use of reflect, and without writing a new type. I agree that it's an improvement, since it keeps the sort logic in context with the code.
This might be a little off-topic, but one readability improvement might be to add one more type and method to the sort package to make it easier to compose a sort hierarchy:
package sort
type By func(i, j int) bool
// assumes !(a[i] < a[j]) && !(a[j] < a[i]) is equivalent to (a[i] == a[j])
func (a By) ThenBy(b By) By {
return func(i, j int) bool {
return a(i,j) || !a(j,i) && b(i,j)
}
}
It could be used like this:
sort.With(len(s), reflect.Swapper(s),
sort.By(func(i, j int) bool { return s[i].Foo < s[j].Foo })
.ThenBy(func(i, j int) bool { return s[i].Bar < s[j].Bar })
)
@infogulch I really like this idea. I think, though, that it would be a natural extension if/when the existing proposal is adopted (since it assumes that we use func(i, j int) bool
). I'd suggest proposing it then. But I do really like it. It's really clean.
I'm not sure I see the "simplicity" benefit of "ThenBy"; why not simply
write the cascading comparisons into the one callback function?
BTW, in a scalar field, !(a[i] < a[j]) && !(a[j] < a[i]) is equivalent
to (a[i] == a[j]) or !(a[i] != a[j]).
On 26 August 2016 at 02:04, Joe [email protected] wrote:
This might be a little off-topic, but one readability improvement might be
to add one more type and method to the sort package to make it easier to
compose a sort hierarchy:package sort
type By func(i, j int) bool
// assumes !(a[i] < a[j]) && !(a[j] < a[i]) is equivalent to (a[i] != a[j])func (a By) ThenBy(b By) By {
return func(i, j int) bool {
return a(i,j) || !a(j,i) && b(i,j)
}
}It could be used like this:
sort.With(len(s), reflect.Swapper(s),
sort.By(func(i, j int) bool { return s[i].Foo < s[j].Foo })
.ThenBy(func(i, j int) bool { return s[i].Bar < s[j].Bar })
)
@praetergeek I'm not sure what your comment about boolean algebra in scalar fields is trying to convey. I understand that it's not exactly equivalent; that's why I made a comment about it. It should be true almost always, but if not for some case then it's easy to fall back to the other method.
Why? It's shorter (1 line per field vs 3n-2 lines), easy to understand (it says right there: sort by x then by y), _very_ easy to visually parse (they line up until the field name), harder to get wrong (only 2 references to each field vs 4; all the logic is encapsulated in ThenBy), and individual field comparers can be reused if needed.
But as @joshlf recommended I think I'll save it for another proposal later if this one is accepted.
I think @infogulch's idea boils down to introducing a DSL for sorting into the sort-package. I dislike the idea, because it can get out of hand quite quickly and I consider DSLs of this form a bad habit. They hide simple, readable code behind layers of abstractions that need to be unwound.
Also, it will, if used extensively, make code slower. While performance shouldn't be the ultimate goal, I think encouraging fast code by default is a good thing (and as I said, I believe it hurts readability too, though that's subjective).
@Merovius I agree with the DSL point; I hadn't thought of it that way.
Status update on this proposal:
@ianlancetaylor, @griesemer, and @robpike are all neutral. They wanted to wait for @rsc to get back from vacation and chime in.
Because the default rejection answer to standard library addition proposals is "put it elsewhere", I've preemptively put it elsewhere:
https://godoc.org/go4.org/sort
The go4.org/sort
package is now a pure superset of the standard library sort
package.
I figure people can get some experience with the proposal there in the meantime. The reflect-based Swapper
func creator is at https://godoc.org/go4.org/reflectutil and is used by go4.org/sort
.
I still think the standard library should make sorting slices easier and permit writing sort functions inline without declaring a type, though, similar to http.HandlerFunc
.
Ping @rsc.
TL;DR: Can we do just sort.Slice and not all the general stuff below it?
I looked at the go4 code and the pending CL 27321. I agree that it would be good to make sort easier to use. I see a few different parts here:
I would be happy to see sort.Slice in Go 1.8. It's a nice API and will be a significant improvement for the vast majority of sort.Sort uses.
The rest seem unfortunate. Specifically:
Stepping back a bit, I guess the approach here is a few steps:
The alternative is one step: "add sort.Slice". Unless there is evidence that the intermediate steps have compelling uses to clean up existing code other than getting to slices, I'd rather not commit to the details of those steps in public API. sort.Slice this way ends up being a helper like sort.Ints, not a gateway to a whole parallel API.
I would also like to see sort.SliceStable added at the same time. It needs to be just as easy to do a stable sort as an unstable sort. I don't care much one way or the other about sort.SliceIsSorted but I would lean toward including it by analogy with the Ints functions.
I considered suggesting that sort.Slice return a sort.Interface, so that you'd write:
sort.Sort(sort.Slice(x, func(i, j int) bool { return x[i] < x[j] }))
sort.Stable(sort.Slice(x, func(i, j int) bool { return x[i] < x[j] }))
sort.IsSorted(sort.Slice(x, func(i, j int) bool { return x[i] < x[j] }))
but I think that's overkill. sort.Slice is a helper, so it should be helpful. The fact that the comparison takes indexes instead of values means that in basically every call there's going to be a closure, so we should avoid piling on too much other stutter. These are clearer:
sort.Slice(x, func(i, j int) bool { return x[i] < x[j] })
sort.SliceStable(x, func(i, j int) bool { return x[i] < x[j] })
sort.SliceIsSorted(x, func(i, j int) bool { return x[i] < x[j] }))
Just a quick note. Isn't sort.IsSliceSorted
better? Or is it more valuable to have the same prefix for all these related methods?
@mibk From a clean slate, maybe, but we already have IntsAreSorted, Float64sAreSorted, StringsAreSorted, not AreIntsSorted etc.
Exposing reflect.Swapper may be a necessary evil. That's OK. Better to require other Go implementations to implement/port reflect.Swapper than to port unsafe code tucked into package sort.
So you're cool with 4 API additions: reflect.Swapper, sort.Slice{,Stable,IsSorted}?
So you're cool with 4 API additions: reflect.Swapper, sort.Slice{,Stable,IsSorted}?
SGTM
CL https://golang.org/cl/30088 mentions this issue.
@rsc considering that external packages may accept sort.Interface
values, such as container/heap
and presumably a subset of https://godoc.org/sort?importers (including a few of my own packages), I'd like to counter-argue that having no mechanism for creating a sort.Interface will be surprising and disappointing.
Even in the absence of such concerns, I'd like to argue that the proliferation of parallel functions (SliceStable) to be unfortunate -- I appreciate sort.StringSlice
and friends, but see no justifiable benefit to sort.Strings
and friends. Such functions bloat the package, saving users only a handful of keystrokes.
I feel that this may just boil down to a naming problem -- with a clearer name than MakeInterface, the usage of an interface builder may well become obvious.
sorry for bringing this up if this was resolved somewhere else, but any particular reason the sort.Search
like syntax was not preferred?
I think that the fundamental argument of making reflect.Swapper
function public is to allow users the option of using it or not, otherwise it can just be implemented inside the sort package itself.
another point is that getting the length of slice directly would _probably_ be faster than getting it through reflection, though not sure by how much.
also, all the previous discussion about not encouraging generic programming through interface{}
@extemporalgenome It's trivial to write a converter from function triple to interface if someone needs that. That's true already today.
type funcs struct { n int; swap func(i, j int); less func(i, j int) bool }
func (x funcs) Len() int { return x.n }
func (x funcs) Swap(i, j int) { x.swap(i, j) }
func (x funcs) Less(i, j int) bool { return x.less(i, j) }
func FuncInterface(n int, swap func(i, j int), less func(i, j int) bool) Interface {
return &funcs{n, swap, less}
}
If there is evidence that this comes up often in real programs, we can easily add it later. But we'd need to see that evidence first.
@suyash I assume that by sort.Search-like syntax you mean code like
sort.With(len(s), func(i, j int) { s[i], s[j] = s[j], s[i] }, func(i, j int) bool {
if s[i].Foo != s[j].Foo {
return s[i].Foo < s[j].Foo
}
return s[i].Bar < s[j].Bar
})
as in the comment above (Aug 21). The reason not to prefer this is that this entire proposal is about making sort easier to use. The above is not substantially less code than you have to write today. It does avoid the new type, but it is still full of boilerplate.
Compare to:
sort.Slice(s, func(i, j int) bool {
if s[i].Foo != s[j].Foo {
return s[i].Foo < s[j].Foo
}
return s[i].Bar < s[j].Bar
})
The new API is meant to be a helper, so it should be as helpful as possible - with as little boilerplate as possible - without restricting the scope of its help. My guess is that >99% of sorting happens on slices, in which case the generality added by the boilerplate serves no useful purpose. If someone has evidence otherwise, I'd be happy to see it.
CL https://golang.org/cl/30250 mentions this issue.
@rsc lets take an example that comes up fairly often (for me), lets say you want to sort a set of numbers but you also want to track the original positions of the numbers.
So you had a slice like []int{5, 2, 3, 1, 4}
and you sorted it to get []int{1, 2, 3, 4, 5}
but at the same time, you also wanted to track positions, so in a separate slice you would want []int{3, 1, 2, 4, 0}
.
With this addition, you would separately declare a pair
type and a new pair array and copy original array's contents into the new array and initialize positions and then sort them, while defining the appropriate less closure.Something like
a := []int{...}
type pair struct {
first, second int
}
p := make([]pair, len(a))
for i := 0 ; i < len(a) ; i++ {
p[i] = pair{a[i], i}
}
sort.Slice(p, func (i, j int) bool {
return p[i].first < p[j].first || (p[i].first == p[j].first && p[i].second < p[j].second)
})
while if we had the option to define a swapper, we can just define another slice, initialize that to positions (0, 1, 2...) and in the swap, swap in both slices.
something like
a := []int{...}
pos := make([]int, len(a))
for i := 0 ; i < len(a) ; i++ {
pos[i] = i
}
sort.Slice(len(a), func(i, j int) {
a[i], a[j] = a[j], a[i]
pos[i], pos[j] = pos[j], pos[i]
}, func(i, j int) bool {
return a[i] < a[j] || (a[i] == a[j] && pos[i] < pos[j])
})
sort.Search
like semantics is because sort.Search
itself is defined only for slices, but the best part about it is that it can do 6 different operations simply by passing the appropriate closure (https://go-review.googlesource.com/#/c/29131/6/src/sort/example_search_test.go@58). One issue that comes up with doing things differently with sort.Slice is the difference from sort.Search in terms of semantics.Again, the reason reflect.Swapper
came up in the first place was to support the general case, so you could do
sort.Slice(len(a), reflect.Swapper(a), func(i, j int) { ... })
otherwise I don't think there was any need to have it as a public library function.
swapper
snippet that on a tabstop expands to func(i, j int) { a[i], a[j] = a[j], a[i] }
sort.MakeInterface
solves this, but @bradfitz himself did genzfunc.go
for performance, the results of which are in the commit message at https://go-review.googlesource.com/#/c/27321/11. Also in case you didn't see this: https://github.com/nieksand/sortgenerics https://news.ycombinator.com/item?id=12602456@suyash For your example you could use rsc's FuncInterface
function above. It's marginally more typing for you, and no additional API in the standard library. To get this into the standard library there needs to be clear and common use cases.
@suyash What Ian said, but also "text editors can macro-generate your code for you" is a particularly bad rationale for an API decision. Code has to be read and maintained, not just written. Also, if you are OK with text editor macros, one could generate the old, more flexible sort.Interface type+methods for you if you insist.
The time it takes to compute the slice length is not going to show up.
Most helpful comment
In my experience, sort doesn't show up all that often, and it's easy to use as is. So I am not fussed about it.
As far as the proposal goes, it bothers me to add another function signature that takes interface{} to the standard library. It allows people to argue that we like them, and force people to use them and create them in their own work.
I do agree that the proposed function will be easy to use, although I'd like to see the implementation and its performance before deciding whether its benefits outweigh its costs.