Julia: should isequal(0.0, -0.0) == true?

Created on 13 Sep 2016  Â·  41Comments  Â·  Source: JuliaLang/julia

Came up in https://github.com/JuliaLang/julia/issues/9381 but deserves its own issue: it's been proposed that positive and negative zero should hash and compare as equal, but that has an unfortunate interaction with sorting and stability.

breaking decision

Most helpful comment

My current thinking is that we should _not_ make this change. Here's a run-down:

  • 0.0 and -0.0 give different results for some well-defined, perfectly valid operations such as 1/x and many complex operations involving branch cuts. So they're meaningfully different.
  • If a certain application wants to equate 0.0 and -0.0 for use as dict keys, it's easy to use x==0 ? abs(x) : x as the key. But if isequal equates the values and you want to distinguish them, that's harder to get.
  • If we're going to keep both == and isequal, we might as well make them consistent with IEEE equals and totalOrder, respectively. Keeping isequal just for NaNs is not a huge win. If the plan involved getting rid of isequal completely, it might be worth a bit of ugliness.

All 41 comments

Does it help or hinder to consider that in fact isequal might be two different ideas? Consider:

my_is_equivalent(2+2,4) is true (post evaluation)
my_is_the_same_as(2+2,4) is false (prior to evaluation)

It is perhaps unfortunate that multiplication propagates the negative for reals: 0.0 * -0.0 evaluates to -0.0; it is ok for integers.

With respect to 0.0 and -0.0 it is a similar problem to that of "is noon AM or PM?" In fact it is neither, however we often label noon pm because 11:59AM,12:00AM,12:01PM makes less sense than 12:00PM in that context.

I like this question.

# v0.5

nan1 = Inf/Inf;  nan1 = -0.0*Inf;  pz = 0.0; nz= -0.0;

nan1==nan2, nan1===nan2, isequal(nan1,nan2)
# (false, true, true)
pz==nz, pz===nz, isequal(pz,nz)
# (true, false, false)

# for a.b =Nan, NaN and for a,b = 0.0, -0.0
# (==)(a,b) = !(===)(a,b)  so  a == b  iff a and b differ

with (0.0 == -0.0) === true,
only NaNs have the property that equalsness is given with nonidentity
and inequalsness is given with identity

That is reason-about-able for entities that are NotNumber (excluded from numerosity).
And well avoided for entities that of numerousness or of absenct numerousness.

I favor that Julia specify isequal( 0.0, -0.0 ) is true.

@JeffreySarnoff Do you then agree that sorting of numbers doesn't distinguish between positive and negative zero any more, and that hashing returns the same value, and that dictionaries can hold only one key for zero, etc.?

@eschnett no.

branch-cut sanity carries utility and pervades reliability more than does zeroing the zeros

pz = 0.0; nz = -pz;

showcompact( (atan2(nz,nz), atan2(nz, pz), atan2(pz, pz), atan2(pz, nz)) )
# ( -3.14159,-0.0, 0.0, 3.14159 )

object_id( pz ) != object_id( nz ) # now, and my forward preference
with this, object_id dicts hold distinct keys for 0.0 and -0.0

@eschnett One cannot sort handedness nor parity, one may group like handednesses and same parity.

This is not a question:

taitjai

Which color precedes?

White.

The lower White or the left White?

That's a tricky one. Will have to come back to you on that.

@JeffreySarnoff Did you answer my question?

Let me rephrase things:

  • What should issorted([0.0, -0.0]) return?
  • Should hash(0.0) == hash(-0.0)?
  • Can Dict dictionaries hold different values for keys 0.0 and -0.0?

@eschnett alright

Ought isequal( 0.0, -0.0 ) be true? Yes.

Must object_id( 0.0 ) differ from object_id( -0.0 )? Yes.

Must some Dict dictionaries hold _different_ values for keys 0.0 and -0.0? Yes.
May other Dict dictionaries hold _the same_ values for keys 0.0 and -0.0? Yes.
Should we allow Dict dicts to hold _the same_ values for keys 0.0 and -0.0? Yes.

isequal(x,y) must imply that hash(x) == hash(y). (the docs)

Should hash(0.0) == hash(-0.0)? Yes.

Must some Dict dictionaries hold different values for keys 0.0 and -0.0? Yes.

That's fine for ObjectIdDict, but otherwise contradicts isequal(0.0,-0.0) being true. Unless we add a fourth standard equality predicate.

@JeffBezanson you intuited my intent. As long as ObjectIdDict is available, the _Must_ some _Dict dictionaries_ statement is satisfied, and happiness suffuses logic.

Dict is a type in Julia; if you use Dict to create a dictionary, you will not get an ObjectIdDict. Thus my expression "Dict dictionary", as I did not want to discuss ObjectIdDict.

Here's a crazier but more general idea: might it be possible to rename the current Dict type

immutable HashDict{K,V,Hft,Eft} <: Associative{K,V}
    hash::Hft
    isequal::Eft
    ...  # as present
end

then define

typealias Dict{K,V} HashDict{K,V,typeof(hash),typeof(isequal)}

and thus allow users to supply their own hash and equality functions, kind of like how C++ does it?

Yes, it's possible and works like you surmised. Here's a complete demo implementation I had toyed with a bit a couple months ago: https://github.com/JuliaLang/julia/commit/fce28b4a9e3e59dc3c9ab566b77d7bf63b6d7d91

I don't really want to do that – I think it's a case of serious over-parameterization of the type. If you want to use custom hashing for your dict, you should probably call a transformation function on your keys before inserting them; it's much simpler and easier to reason about. It also doesn't address this problem since we still need to provide an isequal function that works well.

One parameter could be removed by keeping only the hash function, and computing the isequal function from it. This would also prevent using a hash function with the incorrect isequal companion function.

Ok, but this is totally tangential since however Dict works, we still need to design the way the isequal function works in the first place. Saying "you can use whatever equality predicate you want" isn't solving anything – it's just sweeping the problem under the proverbial rug.

@StefanKarpinski Point understood. The Dict issue is orthogonal to what the best behaviour for isequal should be. I lean towards making 0.0 and -0.0 equal and hash equal — this seems far less surprising mathematically. Does sorting really require isless and isequal to be consistent? I believe many other languages make do with only a single comparison predicate.

Also, if things as different as 0 and 0.0 are going to be considered equal, then 0 and -0.0 ought to be also. Especially since -0.0 is a better "exact zero" than 0.0 is in IEEE floating point arithmetic.

julia> isequal(0, 0.0)
true

julia> isequal(0, -0.0)
false

Especially since -0.0 is a better "exact zero" than 0.0 is in IEEE floating point arithmetic

How so?

-0.0 is a better neutral element for summations. We have -0.0 + 0.0 = 0.0.

Perhaps it's an exaggeration to say that -0.0 is a better exact zero, since -1.0 + 1.0 = 0.0. But -0.0 is just as zero as 0.0 is, and shouldn't be !isequal to integer zero.

Consider that (-0.0)+(-0.0)-(-0.0)==0.0 To me this indicates that -0.0 really should equal 0.0, as otherwise addition that really should be commutative isn't. I would similarly suggest that hash(-0.0)==hash(0.0) and that dicts should treat them the same way.

My current thinking is that we should _not_ make this change. Here's a run-down:

  • 0.0 and -0.0 give different results for some well-defined, perfectly valid operations such as 1/x and many complex operations involving branch cuts. So they're meaningfully different.
  • If a certain application wants to equate 0.0 and -0.0 for use as dict keys, it's easy to use x==0 ? abs(x) : x as the key. But if isequal equates the values and you want to distinguish them, that's harder to get.
  • If we're going to keep both == and isequal, we might as well make them consistent with IEEE equals and totalOrder, respectively. Keeping isequal just for NaNs is not a huge win. If the plan involved getting rid of isequal completely, it might be worth a bit of ugliness.

this indicates that -0.0 really should equal 0.0, as otherwise addition that really should be commutative isn't

That is arguably handled by ==.

My point is just that given how many seeming tautologies applied to -0.0 result in 0.0 could cause some really confusing results (although I guess that mainly is because using floats as dict keys is an absolutely horrible idea)

Nevertheless, it is still confusing that isequal(0, 0.0) but not isequal(0, -0.0). If equality must be transitive, then perhaps !isequal(0, 0.0) is the better option.

it is still confusing that isequal(0, 0.0) but not isequal(0, -0.0)

Yes, I can kind of see that, but once you accept that isequal distinguishes negative zero it makes sense. 0 certainly isn't a "negative zero", but in fact could be with a signed-magnitude integer type, so this doesn't bother me so much.

but consider that isequal(-0, -0.0) is false.

But -0 === 0; the expression used to compute a value is immaterial.

I like this question.

  • 0.0 and -0.0 give different results for some well-defined, perfectly valid operations such as 1/x and many complex operations involving branch cuts. So they're meaningfully different.

Yes, and that is a [meaningful operational] difference without a [measure theoretic] distinction. That they are different hither and the same yon is meaningfuller.

   -  they differ as ObjectIDentifiable  symbols
    - additively, they are the same
    - multiplicatively, they differ
    - they are the same under linear ordering, 
           (without nonzero measure, parity  is as orientation -- intrinsically unorderable)

If a certain application wants to equate 0.0 and -0.0 for use as dict keys, it's easy to use x==0 ? abs(x) : x as the key. But if isequal equates the values and you want to distinguish them, that's harder to get.

Reading that standing on my head,: the branch cuts and the multiplictive resolution of odd signedness are handled for almost all applications that are not writing a floating point math library. The likelihood is members of the Julia community and a user of a Julia products for their own purposes consider them an incomplete sameness rather than as not completely different. Relying on the coherently consistent branch cuts that are availble with 0.0, -0.0 requires onlty that !(0.0 === -0.0), irrespective of the truth valueisequal( 0.0 , -0.0) yeilds. As one is of object non-identitty and iesequal(0.0, -0.) is is of the inststinguishability of lattice-free measures zero, they need [qua ought] not agree..

If we're going to keep both == and isequal, we might as well make them consistent with IEEE equals and totalOrder, respectively. ...

Doing so assigns truth to each of these expressions

  (0.0 == -0.0),  !( 0.0 === -0.0 ),  isequal(0.0, -0.0)

We are in agreement.

@TotalVerb: Does sorting really require isless and isequal to be consistent?

Yes, since otherwise if you change from a hash-based dictionary data structure to a tree-based one, you'll get different behavior, which is awful. Not having isless and isequal be consistent would be pretty bad and whether you have another name for it or not, it implicitly introduces yet another ordering.

I believe many other languages make do with only a single comparison predicate.

Lots of languages only have a few names for their orderings. But they usually have a lot of implicit orderings created by the rampant inconsistencies between the different orderings that they actually exhibit. There's a reason that equality is the first place to look for "wat" behavior in languages.

Given @JeffBezanson's arguments and the fact that no clever solutions to the sorting issues raised by isequal(-0.0, 0.0) have been forthcoming, I'm inclined to agree that we shouldn't do this.

That argument makes sense to me.

Actually, the sorting issue makes some sense to me as well.

import Base: unbox, slt_int, eq_float, isless, isequal


for (T,I) in [(:Float64, :Int64), 
              (:Float32, :Int32), 
              (:Float16, :Int16)]
    @eval begin
        isequal(a::$T, b::$T) = eq_float(unbox($T,a), unbox($T,b))
        function isless(a::$T, b::$T)
            ia = reinterpret($I, a)
            ib = reinterpret($I, b)
            ifelse( signbit(ia) & signbit(ib), 
                    slt_int(ib,ia), slt_int(ia,ib) )
        end
    end
end


floatvec = [-Inf,-100.0,-1.0,-0.5,-0.0,0.0,0.5,1.0,100.0,Inf];
floatvec == sort(floatvec)  # true
sort([NaN, Inf, -Inf])'     # [-Inf, Inf, NaN] (as now)

issorted([ -0.0, 0.0 ]), issorted([ 0.0, -0.0 ]) # (true, false)

-0.0 === 0.0, 0.0 === -0.0              # (false, false)

-0.0 == 0.0, 0.0 == -0.0                # (true, true)
-0.0 < 0.0,  0.0 < -0.0                 # (false, false)
-0.0 <= 0.0,  0.0 <= -0.0               # (true, true)

isequal(-0.0, 0.0), isequal(0.0, -0.0)  # (true, true)
isless(-0.0, 0.0), isless(0.0, -0.0)    # (true, false)

that's all -- thanks for the active discussion

Seems like this is resolved in favor of preserving the current behavior isequal(+0.0, -0.0) == false, in which case this issue can be closed.

This seems resolved, and (for once) conveniently in a way that requires no action.

ok. for my own edification: what was the drawback to the implementation I
posted responding to consistent sorting of negatives before nonnegatives?

On Thu, Oct 27, 2016 at 1:28 PM, Stefan Karpinski [email protected]
wrote:

This seems resolved, and (for once) conveniently in a way that requires no
action.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/JuliaLang/julia/issues/18485#issuecomment-256714087,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABmqxtvyRIfXyXnkfqa_BQeWj5ljuvsQks5q4N9MgaJpZM4J8CCv
.

That's a redefinition of what sorting means, not an implementation of stable sorting.

For newcomers who may be curious about this issue and finding this discussion, would it be accurate to say that, after applying type promotion rules:

  • isequal(x, y) assesses bit equality
  • x == y assesses arithmetic equality

?

Roughly, except that isequal considers all NaNs to be equal even though they can have many different bit patterns.

Was this page helpful?
0 / 5 - 0 ratings