Julia: `1 < missing < 3` fails

Created on 25 Feb 2019  Â·  20Comments  Â·  Source: JuliaLang/julia

julia> 2 < missing < 3
ERROR: TypeError: non-boolean (Missing) used in boolean context
Stacktrace:
 [1] top-level scope at none:0

Should be missing, no?

missing data

Most helpful comment

For the sake of concreteness: IntervalSets.jl didn't support missing, and I found out that their methods:

in(v, I::TypedEndpointsInterval{:closed,:closed}) = leftendpoint(I) ≤ v ≤ rightendpoint(I)
in(v, I::TypedEndpointsInterval{:open,:open}) = leftendpoint(I) < v < rightendpoint(I)
in(v, I::TypedEndpointsInterval{:closed,:open}) = leftendpoint(I) ≤ v < rightendpoint(I)
in(v, I::TypedEndpointsInterval{:open,:closed}) = leftendpoint(I) < v ≤ rightendpoint(I)

would have worked fine already if this issue was fixed, but I had to add other methods to make it work:

in(::Missing, I::TypedEndpointsInterval{:closed,:closed}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:open,:open}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:closed,:open}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:open,:closed}) = !isempty(I) && missing

It's not the end of the world of course. It does raise the issue of whether 3 < missing < 1 should be missing, or false.

All 20 comments

The syntax has short-circuiting semantics [1] (i.e. the upper bound doesn't get evaluated if the first condition fails. Of course there could be some special exception for missing, but it would certainly make the lowering significantly more complicated.

[1] i.e. a < b < c is equivalent to a < b && b < c.

It explains this:

julia> missing < 1 < 3
ERROR: TypeError: non-boolean (Missing) used in boolean context
Stacktrace:
 [1] top-level scope at none:0

julia> 3 > 1 > missing
missing

I agree it would be nice if this worked. For example a typical pattern would be ~filter(x -> 30 < x < 60, age)~ (or something similar on data frame rows) EDIT: this fails in the presence of missing values, it only makes sense with a special function which would skip missing values as in SQL or R.

But we probably don't want to special-case missing, so we would have to use & instead of && in all cases, which isn't possible before 2.0.

FWIW this already works:

julia> 2 .< missing .< 3
missing

Triage discussed this for quite a while, and none of the options for implementing this are that appealing. We did not come to a conclusion in this discussion. Some of the options discussed were:

  • Special-case missing in the front-end (not really a good option)
  • Use & (integers have this return an integer, which is a problem)
  • Do nothing

Speaking on behalf of myself (not of triage), I think I'm beginning to favor doing nothing here. This may be one of the cases where users should be wary of missing values and use explicit ismissing checks as appropriate. It's not an ideal solution, but it's explicit.

I had proposed the following lowering of a < b < c, which some people liked, but didn't get much love otherwise.

#1 = a
#2 = b
r = #1 < #2
if r === true
     r = #2 < c
else
     r = &(r, true)
end
return r

I didn't quite catch it on the call; why is the last r = &(r, true) needed? Isn't the fact that < should really only return Bool (or missing-like singletons) enough?

It at least requires the thing you return to support &, so it's similar to switching the lowering to use & except it's still lazy.

I get that, but it feels strange to me since we don't call & in the other branch. If we're going to do anything at all here I'd favor the simpler:

#1 = a
#2 = b
r = #1 < #2
if r === true
     r = #2 < c
end
return r

That said, I'm not wholly convinced we actually need to do something here. @nalimilan's filtering example would be compelling, except that filter wouldn't even work if its function returns missing (not even DataFrames' specialization). Typically if I do this as a scalar operation, I'll be branching on the result in short order. If I just want to store the result, I think it'd be _much_ more common to just broadcast — which lowers to (a .< b) .& (b .< c).

If I just want to store the result, I think it'd be _much_ more common to just broadcast

Not sure what you mean, could you elaborate?

I get that, but it feels strange to me since we don't call & in the other branch

Why? We force the LHS to return a boolean now, but not the RHS

If I just want to store the result, I think it'd be much more common to just broadcast

Not sure what you mean, could you elaborate?

There are two ways of using the result from a < b < c.

  • You can use it to make a decision. If we propagated a missing out of it, it would throw an error as soon as you tried to branch on if missing. Implementing this would just move the error one step down the road in an if a < b < c expression.
  • You can store the result or do some more computations on it later down the road (and _don't_ branch on it). In my mind, this is much more common to be done on many values at once — and thus you'd typically just use broadcast.

Why? We force the LHS to return a boolean now, but not the RHS

It just gets back to my "what's the point" question. If it's to emulate & semantics, then it seems like we should actually call & on both branches. If it's to ensure we get a logicky value back from <, then that seems like the kind of thing we tend to say "don't do that" instead of adding complications like this — particularly complications in lowering.

For the sake of concreteness: IntervalSets.jl didn't support missing, and I found out that their methods:

in(v, I::TypedEndpointsInterval{:closed,:closed}) = leftendpoint(I) ≤ v ≤ rightendpoint(I)
in(v, I::TypedEndpointsInterval{:open,:open}) = leftendpoint(I) < v < rightendpoint(I)
in(v, I::TypedEndpointsInterval{:closed,:open}) = leftendpoint(I) ≤ v < rightendpoint(I)
in(v, I::TypedEndpointsInterval{:open,:closed}) = leftendpoint(I) < v ≤ rightendpoint(I)

would have worked fine already if this issue was fixed, but I had to add other methods to make it work:

in(::Missing, I::TypedEndpointsInterval{:closed,:closed}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:open,:open}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:closed,:open}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:open,:closed}) = !isempty(I) && missing

It's not the end of the world of course. It does raise the issue of whether 3 < missing < 1 should be missing, or false.

Sorry, my example with filter was indeed incorrect. I had a SQL- or R-like behavior in mind, where missing values are skipped, and which we should probably support for DataFrames (e.g. via an argument -- I can't find the place where we discussed this before). @cstjean's example is much better since it's not hypothetical.

It does raise the issue of whether 3 < missing < 1 should be missing, or false.

This is a really good point. If we're going to go out of our way to handle missing cleverly, should we try to handle this? Is that even a reasonable thing to try to do?

I can't imagine returning anything other than missing, as that's consistent with what we do everywhere else. Incidentally that seems to be what @cstjean does in his example.

Nope, he elected to return false in the case akin to 3 < missing < 1 (isempty(I)) — and I agree with that choice.

Ah, my bad, I hadn't realized the order of 3 and 1 was the question at hand here. Then I'd say either missing or false are reasonable. false would be clever, and consistent with the fact that we use all available information when computing the return value in three-valued logic (e.g. false & missing === false). But it would require a more complex lowering.

If we give up on the short-circuiting behaviour, couldn't a < b <= c lower to something like compare(a, <, b, <=, c), with compare being a generated function?

3 < missing < 1 evaluting to false is similar to the interesting cases from https://github.com/JuliaLang/julia/pull/31066#issuecomment-463576235

@test clamp(1, 52, missing) == 52
@test clamp(1000, missing, 101) == 101
Was this page helpful?
0 / 5 - 0 ratings