Julia: Can isless be defined for everything?

Created on 19 Feb 2020 · 9Comments · Source: JuliaLang/julia

I thought my comment https://github.com/JuliaLang/julia/pull/34744#issuecomment-588002666 may be distracting the original issue so I open it as a separate issue. My questions are:

Is it possible to define isless for everything?
Can it have practically fast enough implementation?
Is it reasonable to use an _arbitrary_ ordering for isless?

I think the answer to point 1 is yes, since I can do isless(a, b) = isless(objectid(a), objectid(b)). This is stupid, but I think it's OK as a proof of possibility. In particular, this means that, if we define equivalence by mapping to an integer, we get _a_ total ordering "for free."

For point 2, I think there are cases where optimization is trivial and covers a wide range of practical use. For example, we can lower isless(a::T, b::T) with struct type T to isless on the tuples of field values.

If the philosophy behind the compatibility between hasing and sorting is to support consistent behavior of hash-based and comparison-based containers, I think the answer to point 3 is yes too. But I don't think I have thought enough about it so maybe I'm missing something. In particular, it may be impossible to avoid ordering that changes julia process to process (like how hash behaves).

Source

tkf

Most helpful comment

Not a fan. There are times when absence of an operation helps you think clearly: for example if your calculation results in the answer 3s + 4m (s = seconds, m = meters) then you know you've made a mistake. If I'm working in a domain where I expect rotational invariance, then the only case in which a collection of points can be placed in a partial order is when they fall on a single line through the origin. If you suddenly give me an arbitrary way to order them anyway, you've made it harder for me to think clearly.

timholy on 19 Feb 2020

👍8

All 9 comments

Why is it desirable?

StefanKarpinski on 19 Feb 2020

I suppose this is required for storing arbitrary key/element in a comparison-based dictionary/set? This is required for sort-based group-by, too.

(Though I guess I can live without this. I may be getting too greedy here.)

I guess a practical "baby-step" request here is to define isless(::T, ::T) when T is a struct:

isless(a::T, b::T) where {T} =
    if isstructtype(T)
        isless(_fieldvalues(a), _fieldvalues(b))
    else
        throw(MethodError(isless, (a, b)))
    end
_fieldvalues(x) = ntuple(i -> getfield(x, i), nfields(x))

Maybe even better definition is to be flexible for parametric types if isstructtype(typeof(a)) && isstructtype(typeof(b)) && typename(typeof(a)) === typename(typeof(b)) instead of if isstructtype(T).

Given that hash is automatically defined for everything, being able to compare with the same type seems to be a reasonable extension.

tkf on 19 Feb 2020

Maybe even better definition is to be flexible for parametric types if isstructtype(typeof(a)) && isstructtype(typeof(b)) && typename(typeof(a)) === typename(typeof(b)) instead of if isstructtype(T).

Oh, no, wait. This does not work when there are type parameters that are not determined by the field types.

tkf on 19 Feb 2020

timholy on 19 Feb 2020

👍8

I agree that a lack of definitions helps to find bugs. But I think that's in direct tension to using isless for containers where you just need _an_ ordering, not the one compatible with the mathematical objects you are modeling with the type system. I think this is more about < falling back to isless automatically. Though I guess it means any kind of automatic definition of isless requires a breaking change in <...

tkf on 19 Feb 2020

I guess the better approach would be to write a new dict type which accepts a type or function for ordering its elements. Well, I guess there's such thing already. Defining isless for all pairs explicitely is a bad idea IMO. If you are fine with an arbitrary ordering anyway just convert your objects to an arbitrary totally ordered space (e.g. hashing them, or taking objectid, though the latter changes between runs) and compare them there. That's the cleanest approach since it doesn't do weird default behaviour for incomparable objects.
In other words: Don't introduce arbitrary decisions into fallbacks. Keep the rule to only define sane defaults and error otherwise.

rapus95 on 19 Feb 2020

If you are fine with an arbitrary ordering anyway just convert your objects to an arbitrary totally ordered space (e.g. hashing them, or taking objectid, though the latter changes between runs) and compare them there.

I think I'm convinced this is the only reasonable approach at least within 1.x. Traits for querying the existence of isless and isequal are required for making this compatible for customized isless and isequal, though.

tkf on 19 Feb 2020

Regarding that I was recently (on slack) thinking about a more generic approach of isless which also tracks a ComparatorContext{T1, T2} with it. Say, we'd get a 3-arg isless. Then the following lines could be examples of what would be possible very concisely)

isless(::HashComparator, a, b) = isless(CanonicalComparator(UInt), hash(a), hash(b))
isless(::CanonicalComparator{UInt}, x::UInt, x::UInt) = #some native call
isless(a,b) = isless(CanonicalComparator(typeof(a), typeof(b)), a, b)
isless(::PropagateMissing, a::Missing, b) = missing
isless(::PropagateMissing, a, b::Missing) = missing
isless(x::PropagateMissing, a, b) = isless(CanonicalComparator(typeof(a), typeof(b)), a, b)
<(a,b) = isless(PropagateMissing(), a, b)

We can extend that behaviour to isequal. Especially PropagateMissing would be concise aswell.

Then we'd also add a trait istotalordering(x) which contracts that isless and isequal are both defined if at least one of them is defined and that known mathematical requirements for a total ordering hold. Each appropriate comparator could define that to be true. HashComparator could do that for example.

rapus95 on 19 Feb 2020

The semantics of ordering objects mathematically and ordering them for reproducibility are distinct. Trying to use isless for both purposes would be confusing and burdensome.

I agree that it's sometimes useful to impose a total order on a collection of (or all) objects. For example, an order on dictionary keys, or the order in which terms are printed in a symbolic algebra program. I have to totally order terms in expressions printed (and stored) by Symata. In this case, trying to extend <= is ambiguous and doesn't give you the lexicographical ordering you want anyway. I used a distinct comparison function for lexicographic ordering. I suspect that even defining a separate lexicographic ordering in Julia would be difficult, because different orderings would be better for different use cases.