I thought my comment https://github.com/JuliaLang/julia/pull/34744#issuecomment-588002666 may be distracting the original issue so I open it as a separate issue. My questions are:
isless
for everything?isless
?I think the answer to point 1 is yes, since I can do isless(a, b) = isless(objectid(a), objectid(b))
. This is stupid, but I think it's OK as a proof of possibility. In particular, this means that, if we define equivalence by mapping to an integer, we get _a_ total ordering "for free."
For point 2, I think there are cases where optimization is trivial and covers a wide range of practical use. For example, we can lower isless(a::T, b::T)
with struct type T
to isless
on the tuples of field values.
If the philosophy behind the compatibility between hasing and sorting is to support consistent behavior of hash-based and comparison-based containers, I think the answer to point 3 is yes too. But I don't think I have thought enough about it so maybe I'm missing something. In particular, it may be impossible to avoid ordering that changes julia
process to process (like how hash
behaves).
Why is it desirable?
I suppose this is required for storing arbitrary key/element in a comparison-based dictionary/set? This is required for sort-based group-by, too.
(Though I guess I can live without this. I may be getting too greedy here.)
I guess a practical "baby-step" request here is to define isless(::T, ::T)
when T
is a struct:
isless(a::T, b::T) where {T} =
if isstructtype(T)
isless(_fieldvalues(a), _fieldvalues(b))
else
throw(MethodError(isless, (a, b)))
end
_fieldvalues(x) = ntuple(i -> getfield(x, i), nfields(x))
Maybe even better definition is to be flexible for parametric types if isstructtype(typeof(a)) && isstructtype(typeof(b)) && typename(typeof(a)) === typename(typeof(b))
instead of if isstructtype(T)
.
Given that hash
is automatically defined for everything, being able to compare with the same type seems to be a reasonable extension.
Maybe even better definition is to be flexible for parametric types
if isstructtype(typeof(a)) && isstructtype(typeof(b)) && typename(typeof(a)) === typename(typeof(b))
instead ofif isstructtype(T)
.
Oh, no, wait. This does not work when there are type parameters that are not determined by the field types.
Not a fan. There are times when absence of an operation helps you think clearly: for example if your calculation results in the answer 3s + 4m
(s = seconds, m = meters) then you know you've made a mistake. If I'm working in a domain where I expect rotational invariance, then the only case in which a collection of points can be placed in a partial order is when they fall on a single line through the origin. If you suddenly give me an arbitrary way to order them anyway, you've made it harder for me to think clearly.
I agree that a lack of definitions helps to find bugs. But I think that's in direct tension to using isless
for containers where you just need _an_ ordering, not the one compatible with the mathematical objects you are modeling with the type system. I think this is more about <
falling back to isless
automatically. Though I guess it means any kind of automatic definition of isless
requires a breaking change in <
...
I guess the better approach would be to write a new dict type which accepts a type or function for ordering its elements. Well, I guess there's such thing already. Defining isless for all pairs explicitely is a bad idea IMO. If you are fine with an arbitrary ordering anyway just convert your objects to an arbitrary totally ordered space (e.g. hashing them, or taking objectid, though the latter changes between runs) and compare them there. That's the cleanest approach since it doesn't do weird default behaviour for incomparable objects.
In other words: Don't introduce arbitrary decisions into fallbacks. Keep the rule to only define sane defaults and error otherwise.
If you are fine with an arbitrary ordering anyway just convert your objects to an arbitrary totally ordered space (e.g. hashing them, or taking objectid, though the latter changes between runs) and compare them there.
I think I'm convinced this is the only reasonable approach at least within 1.x. Traits for querying the existence of isless
and isequal
are required for making this compatible for customized isless
and isequal
, though.
Regarding that I was recently (on slack) thinking about a more generic approach of isless
which also tracks a ComparatorContext{T1, T2} with it. Say, we'd get a 3-arg isless
. Then the following lines could be examples of what would be possible very concisely)
isless(::HashComparator, a, b) = isless(CanonicalComparator(UInt), hash(a), hash(b))
isless(::CanonicalComparator{UInt}, x::UInt, x::UInt) = #some native call
isless(a,b) = isless(CanonicalComparator(typeof(a), typeof(b)), a, b)
isless(::PropagateMissing, a::Missing, b) = missing
isless(::PropagateMissing, a, b::Missing) = missing
isless(x::PropagateMissing, a, b) = isless(CanonicalComparator(typeof(a), typeof(b)), a, b)
<(a,b) = isless(PropagateMissing(), a, b)
We can extend that behaviour to isequal
. Especially PropagateMissing would be concise aswell.
Then we'd also add a trait istotalordering(x)
which contracts that isless
and isequal
are both defined if at least one of them is defined and that known mathematical requirements for a total ordering hold. Each appropriate comparator could define that to be true. HashComparator could do that for example.
The semantics of ordering objects mathematically and ordering them for reproducibility are distinct. Trying to use isless
for both purposes would be confusing and burdensome.
I agree that it's sometimes useful to impose a total order on a collection of (or all) objects. For example, an order on dictionary keys, or the order in which terms are printed in a symbolic algebra program. I have to totally order terms in expressions printed (and stored) by Symata. In this case, trying to extend <=
is ambiguous and doesn't give you the lexicographical ordering you want anyway. I used a distinct comparison function for lexicographic ordering. I suspect that even defining a separate lexicographic ordering in Julia would be difficult, because different orderings would be better for different use cases.
Most helpful comment
Not a fan. There are times when absence of an operation helps you think clearly: for example if your calculation results in the answer
3s + 4m
(s = seconds, m = meters) then you know you've made a mistake. If I'm working in a domain where I expect rotational invariance, then the only case in which a collection of points can be placed in a partial order is when they fall on a single line through the origin. If you suddenly give me an arbitrary way to order them anyway, you've made it harder for me to think clearly.