@vtjnash recommended I write a bug report after encountering this behavior and looking into workarounds.
At present:
true == 1
# true
This seems fairly jarring given that the `Bool` type seems much more restrictive than in languages like Python, where implicit promotion or deference to parent methods for comparison is typical. For example, this:
if 1
println("Ok!")
end
# ERROR: TypeError: non-boolean (Int64) used in boolean context
Would also imply to me (not necessarily in any formal sense, though) that the prior equality test should also result in a `TypeError`.
In this case I want to be able to put both `1` and `true` in a `Set` of type `Any`:
Set([1, true, "true"])
# Set(Any["true",true])
But due to hash/equality semantics, only one will make it in. I could return an alternative implementation of `Set` backed by an `ObjectIdDict` for my use case, which is a serialization library that will convey and receive sets from other languages (whose Boolean equality/hash semantics are heterogeneous), but I wanted to check in to see whether this is the desired behavior or something falling out of an implementation detail.
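For reference, a minimal sketch of such an identity-keyed set, assuming a current Julia release where `ObjectIdDict` has been renamed `IdDict`. The `IdentitySet` type and its methods are hypothetical names, not part of Base:

```julia
# Hypothetical identity-keyed set: membership is decided by `===`,
# so 1, true, and 1.0 are kept as three distinct elements.
struct IdentitySet
    dict::IdDict{Any,Nothing}
end
IdentitySet() = IdentitySet(IdDict{Any,Nothing}())

Base.push!(s::IdentitySet, x) = (s.dict[x] = nothing; s)
Base.in(x, s::IdentitySet) = haskey(s.dict, x)
Base.length(s::IdentitySet) = length(s.dict)

s = IdentitySet()
for x in Any[1, true, 1.0, "true"]
    push!(s, x)
end

println(length(s))   # 4
println(true in s)   # true
println(2 in s)      # false
```

The trade-off is that identity is stricter than equality everywhere: two separately allocated arrays with the same contents would also count as different elements.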
It comes from the fact that `Bool` is essentially just `UInt1`, a one-bit unsigned integer. The `true == 1` part doesn't really bother me. The principle is that it's ok to use a boolean where a number is expected, but it's not ok to use a non-boolean where a boolean is expected. This is often handy, e.g. when counting the number of values that are true using `sum`. The `Set` business is more concerning. We could make `Bool` a non-numeric type, but that would have other consequences, e.g. it would no longer be possible to have `im = Complex(false,true)`, although we've discussed recently whether that's a good idea.
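A small illustration of that asymmetry, assuming a Julia 1.x session; the variable names are only for the example:

```julia
flags = [true, false, true, true]

println(sum(flags))                   # 3, each `true` counts as 1
println(true == 1)                    # true
println(Bool <: Integer)              # true
println(im === Complex(false, true))  # true

# The reverse direction is rejected: an Int is not accepted as a condition.
x = 1
try
    if x
        println("unreachable")
    end
catch err
    println(err isa TypeError)        # true
end
```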
In a set you'll also run into problems if you want to add `1` and `1.0` at the same time. You should be able to use `isequal` (i.e. `===`) instead of `==` to compare values in a set.
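A quick way to check how the three predicates treat these values, assuming Julia 1.x; `compare` is a hypothetical helper. Note that `isequal` and `===` are separate functions: `isequal` is what `Set` and `Dict` use for keys, while `===` tests whether two values are indistinguishable.

```julia
# Compare one pair of values under ==, isequal, and ===.
compare(a, b) = (a == b, isequal(a, b), a === b)

println(compare(1, true))    # (true, true, false)
println(compare(1, 1.0))     # (true, true, false)
println(compare(NaN, NaN))   # (false, true, true)
println(compare(0.0, -0.0))  # (true, false, false)
```

Only `===` separates `1` from `true` and from `1.0`, which is why an identity-keyed container can hold all three.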
How does `Set` handle `NaN`? Technically, `NaN != NaN`, and it should thus keep all of them?
@eschnett Just to be clear, since the `Set` implementation is a hash set backed by a `Dict`'s keys, with all values as `Void`, the important detail is how each value hashes:
julia> Set([NaN, NaN])
Set([NaN])
julia> hash(NaN)
0x15d7d083d04ecb90
julia> hash(1)
0x02011ce34bce797f
julia> hash(true)
0x02011ce34bce797f
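The same collapse can be reproduced directly on a `Dict`, which is a rough sketch of what happens inside `Set`. This assumes a current Julia release, where `Void` has been renamed `Nothing`; actual hash values vary between versions, so the example compares them instead of printing them:

```julia
# A Set stores its elements as Dict keys, so two values collapse into one
# element when they hash the same and are isequal.
println(hash(true) == hash(1))      # true
println(isequal(true, 1))           # true

d = Dict{Any,Nothing}()
d[1] = nothing
d[true] = nothing                   # lands in the slot already used by 1
println(length(d))                  # 1

println(length(Set(Any[1, true])))  # 1
println(length(Set([NaN, NaN])))    # 1
```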
The `Set` is the concerning part for me as well @StefanKarpinski (the impact is that it will require a workaround that hurts usability when the edge case is encountered). Maybe I should revise the issue title to reflect `Set` as the focus? The behavior of equality came out of the discussion on IRC.
I was confused; of course, `isequal(NaN, NaN)` is `true`.
Note that `Char` has a similar issue, and also fails transitive equality with non-integer types: `32.0 != ' ' == 32`.
"The `true == 1` part doesn't really bother me. The principle is that it's ok to use a boolean where a number is expected, but it's not ok to use a non-boolean where a boolean is expected."
In comparing `bool == foo`, I think it's arguably expected that `foo::Bool`.
Yeah, I think it's even more surprising with `Char` in the `Set` context:
> Set([32, ' '])
# Set(Any[' '])
Probably more plausibly encountered in a scenario like:
> x = " foo"[1]
# ' '
> Set([32, x])
# Set(Any[' '])
Since `Char` isn't a subtype of `Number` (unlike `Bool`), I agree `' ' == 32` doesn't seem right.
The char vs number thing is definitely a bug left over from when Char was a kind of integer. I'm working on a fix.
The Char vs. Int comparison thing is surprisingly annoying to fix: we assume that you can compare integers and chars all over the place. I've almost got it done, but it makes me wonder about the change.
I think it's still the right way to go. Unlike e.g. for real and integer numbers, there is no commonly used abstraction that makes characters a subset of integers. ASCII and UTF-8 are common, but they specify an encoding, not an equivalence. It's easy enough to convert between characters and integers, and if a certain piece of code requires this too often, then I wonder whether there's an abstraction missing.
Most other languages (Python, Fortran, Mathematica) don't allow such comparisons either. C and its descendants seem to be the exception.
"there is no commonly used abstraction that makes characters a subset of integers" .... Unicode?
In any case, they don't need to be `isequal`, which would fix the `Set` issue.
Indeed Unicode offers a standard mapping from integer to char, but comparison between these types is still quite confusing. +1 for deprecating it, and requiring people to write `Char(32) == ' '` when that's really what they want.
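For illustration, the explicit conversions look like this. The last two results assume Julia 1.x, where the mixed `Char`/integer comparison no longer holds:

```julia
c = " foo"[1]

println(Char(32) == c)            # true
println(Int(c) == 32)             # true
println(c == 32)                  # false on Julia 1.x
println(length(Set(Any[32, c])))  # 2 on Julia 1.x
```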
I don't know... what languages have true character types that don't allow such comparisons? Python doesn't have a character type per se, it only has length-1 strings, and Fortran is similar if I understand it correctly. I don't know about Mathematica, but Mathematica is not particularly well-known for its strength in string processing.
In any case, I feel like discussion of `Char == Integer` should be in a separate issue.
" it would no longer be possible to have im = Complex(false,true), although we've discussed recently whether that's a good idea."
Assuming changing that is a good idea, IMO the change should still allow `[U]Int +,-,* Bool` and `Bool +,-,* [U]Int`, with `true` working as 1 and `false` working as 0, by overloading those signatures (as they are so handy at times).
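For context, this is how the mixed arithmetic in question behaves today, assuming Julia 1.x where `Bool` is still a numeric type:

```julia
println(3 + true)             # 4
println(true * 5)             # 5
println(7 - false)            # 7
println(typeof(0x02 + true))  # UInt8
```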
This is decided and consistent in 1.0: yes, `true == 1`.