julia> 2 < missing < 3
ERROR: TypeError: non-boolean (Missing) used in boolean context
Stacktrace:
[1] top-level scope at none:0
Should be missing
, no?
The syntax has short-circuiting semantics [1] (i.e. the upper bound doesn't get evaluated if the first condition fails. Of course there could be some special exception for missing
, but it would certainly make the lowering significantly more complicated.
[1] i.e. a < b < c
is equivalent to a < b && b < c
.
It explains this:
julia> missing < 1 < 3
ERROR: TypeError: non-boolean (Missing) used in boolean context
Stacktrace:
[1] top-level scope at none:0
julia> 3 > 1 > missing
missing
I agree it would be nice if this worked. For example a typical pattern would be ~filter(x -> 30 < x < 60, age)
~ (or something similar on data frame rows) EDIT: this fails in the presence of missing values, it only makes sense with a special function which would skip missing values as in SQL or R.
But we probably don't want to special-case missing, so we would have to use &
instead of &&
in all cases, which isn't possible before 2.0.
FWIW this already works:
julia> 2 .< missing .< 3
missing
Triage discussed this for quite a while, and none of the options for implementing this are that appealing. We did not come to a conclusion in this discussion. Some of the options discussed were:
&
(integers have this return an integer, which is a problem)Speaking on behalf of myself (not of triage), I think I'm beginning to favor doing nothing here. This may be one of the cases where users should be wary of missing values and use explicit ismissing
checks as appropriate. It's not an ideal solution, but it's explicit.
I had proposed the following lowering of a < b < c
, which some people liked, but didn't get much love otherwise.
#1 = a
#2 = b
r = #1 < #2
if r === true
r = #2 < c
else
r = &(r, true)
end
return r
I didn't quite catch it on the call; why is the last r = &(r, true)
needed? Isn't the fact that <
should really only return Bool (or missing-like singletons) enough?
It at least requires the thing you return to support &
, so it's similar to switching the lowering to use &
except it's still lazy.
I get that, but it feels strange to me since we don't call &
in the other branch. If we're going to do anything at all here I'd favor the simpler:
#1 = a
#2 = b
r = #1 < #2
if r === true
r = #2 < c
end
return r
That said, I'm not wholly convinced we actually need to do something here. @nalimilan's filtering example would be compelling, except that filter
wouldn't even work if its function returns missing
(not even DataFrames' specialization). Typically if I do this as a scalar operation, I'll be branching on the result in short order. If I just want to store the result, I think it'd be _much_ more common to just broadcast — which lowers to (a .< b) .& (b .< c)
.
If I just want to store the result, I think it'd be _much_ more common to just broadcast
Not sure what you mean, could you elaborate?
I get that, but it feels strange to me since we don't call & in the other branch
Why? We force the LHS to return a boolean now, but not the RHS
If I just want to store the result, I think it'd be much more common to just broadcast
Not sure what you mean, could you elaborate?
There are two ways of using the result from a < b < c
.
missing
out of it, it would throw an error as soon as you tried to branch on if missing
. Implementing this would just move the error one step down the road in an if a < b < c
expression.Why? We force the LHS to return a boolean now, but not the RHS
It just gets back to my "what's the point" question. If it's to emulate &
semantics, then it seems like we should actually call &
on both branches. If it's to ensure we get a logicky value back from <
, then that seems like the kind of thing we tend to say "don't do that" instead of adding complications like this — particularly complications in lowering.
For the sake of concreteness: IntervalSets.jl didn't support missing
, and I found out that their methods:
in(v, I::TypedEndpointsInterval{:closed,:closed}) = leftendpoint(I) ≤ v ≤ rightendpoint(I)
in(v, I::TypedEndpointsInterval{:open,:open}) = leftendpoint(I) < v < rightendpoint(I)
in(v, I::TypedEndpointsInterval{:closed,:open}) = leftendpoint(I) ≤ v < rightendpoint(I)
in(v, I::TypedEndpointsInterval{:open,:closed}) = leftendpoint(I) < v ≤ rightendpoint(I)
would have worked fine already if this issue was fixed, but I had to add other methods to make it work:
in(::Missing, I::TypedEndpointsInterval{:closed,:closed}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:open,:open}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:closed,:open}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:open,:closed}) = !isempty(I) && missing
It's not the end of the world of course. It does raise the issue of whether 3 < missing < 1
should be missing
, or false
.
Sorry, my example with filter
was indeed incorrect. I had a SQL- or R-like behavior in mind, where missing values are skipped, and which we should probably support for DataFrames (e.g. via an argument -- I can't find the place where we discussed this before). @cstjean's example is much better since it's not hypothetical.
It does raise the issue of whether
3 < missing < 1
should bemissing
, orfalse
.
This is a really good point. If we're going to go out of our way to handle missing
cleverly, should we try to handle this? Is that even a reasonable thing to try to do?
I can't imagine returning anything other than missing
, as that's consistent with what we do everywhere else. Incidentally that seems to be what @cstjean does in his example.
Nope, he elected to return false
in the case akin to 3 < missing < 1
(isempty(I)
) — and I agree with that choice.
Ah, my bad, I hadn't realized the order of 3
and 1
was the question at hand here. Then I'd say either missing
or false
are reasonable. false
would be clever, and consistent with the fact that we use all available information when computing the return value in three-valued logic (e.g. false & missing === false
). But it would require a more complex lowering.
If we give up on the short-circuiting behaviour, couldn't a < b <= c
lower to something like compare(a, <, b, <=, c)
, with compare
being a generated function?
3 < missing < 1
evaluting to false is similar to the interesting cases from https://github.com/JuliaLang/julia/pull/31066#issuecomment-463576235
@test clamp(1, 52, missing) == 52
@test clamp(1000, missing, 101) == 101
Most helpful comment
For the sake of concreteness: IntervalSets.jl didn't support
missing
, and I found out that their methods:would have worked fine already if this issue was fixed, but I had to add other methods to make it work:
It's not the end of the world of course. It does raise the issue of whether
3 < missing < 1
should bemissing
, orfalse
.