Julia: `1 < missing < 3` fails

Created on 25 Feb 2019 · 20Comments · Source: JuliaLang/julia

julia> 2 < missing < 3
ERROR: TypeError: non-boolean (Missing) used in boolean context
Stacktrace:
 [1] top-level scope at none:0

Should be missing, no?

missing data

Source

cstjean

Most helpful comment

For the sake of concreteness: IntervalSets.jl didn't support missing, and I found out that their methods:

in(v, I::TypedEndpointsInterval{:closed,:closed}) = leftendpoint(I) ≤ v ≤ rightendpoint(I)
in(v, I::TypedEndpointsInterval{:open,:open}) = leftendpoint(I) < v < rightendpoint(I)
in(v, I::TypedEndpointsInterval{:closed,:open}) = leftendpoint(I) ≤ v < rightendpoint(I)
in(v, I::TypedEndpointsInterval{:open,:closed}) = leftendpoint(I) < v ≤ rightendpoint(I)

would have worked fine already if this issue was fixed, but I had to add other methods to make it work:

in(::Missing, I::TypedEndpointsInterval{:closed,:closed}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:open,:open}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:closed,:open}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:open,:closed}) = !isempty(I) && missing

It's not the end of the world of course. It does raise the issue of whether 3 < missing < 1 should be missing, or false.

cstjean on 8 Mar 2019

👍3 ❤2

All 20 comments

The syntax has short-circuiting semantics [1] (i.e. the upper bound doesn't get evaluated if the first condition fails. Of course there could be some special exception for missing, but it would certainly make the lowering significantly more complicated.

[1] i.e. a < b < c is equivalent to a < b && b < c.

Keno on 25 Feb 2019

It explains this:

julia> missing < 1 < 3
ERROR: TypeError: non-boolean (Missing) used in boolean context
Stacktrace:
 [1] top-level scope at none:0

julia> 3 > 1 > missing
missing

cstjean on 25 Feb 2019

I agree it would be nice if this worked. For example a typical pattern would be ~filter(x -> 30 < x < 60, age)~ (or something similar on data frame rows) EDIT: this fails in the presence of missing values, it only makes sense with a special function which would skip missing values as in SQL or R.

But we probably don't want to special-case missing, so we would have to use & instead of && in all cases, which isn't possible before 2.0.

nalimilan on 27 Feb 2019

😕1 👍1

FWIW this already works:

julia> 2 .< missing .< 3
missing

cstjean on 28 Feb 2019

Triage discussed this for quite a while, and none of the options for implementing this are that appealing. We did not come to a conclusion in this discussion. Some of the options discussed were:

Special-case missing in the front-end (not really a good option)
Use & (integers have this return an integer, which is a problem)
Do nothing

Speaking on behalf of myself (not of triage), I think I'm beginning to favor doing nothing here. This may be one of the cases where users should be wary of missing values and use explicit ismissing checks as appropriate. It's not an ideal solution, but it's explicit.

ararslan on 7 Mar 2019

I had proposed the following lowering of a < b < c, which some people liked, but didn't get much love otherwise.

#1 = a
#2 = b
r = #1 < #2
if r === true
     r = #2 < c
else
     r = &(r, true)
end
return r

Keno on 7 Mar 2019

I didn't quite catch it on the call; why is the last r = &(r, true) needed? Isn't the fact that < should really only return Bool (or missing-like singletons) enough?

mbauman on 7 Mar 2019

It at least requires the thing you return to support &, so it's similar to switching the lowering to use & except it's still lazy.

JeffBezanson on 7 Mar 2019

I get that, but it feels strange to me since we don't call & in the other branch. If we're going to do anything at all here I'd favor the simpler:

#1 = a
#2 = b
r = #1 < #2
if r === true
     r = #2 < c
end
return r

That said, I'm not wholly convinced we actually need to do something here. @nalimilan's filtering example would be compelling, except that filter wouldn't even work if its function returns missing (not even DataFrames' specialization). Typically if I do this as a scalar operation, I'll be branching on the result in short order. If I just want to store the result, I think it'd be _much_ more common to just broadcast — which lowers to (a .< b) .& (b .< c).

mbauman on 7 Mar 2019

If I just want to store the result, I think it'd be _much_ more common to just broadcast

Not sure what you mean, could you elaborate?

ararslan on 7 Mar 2019

I get that, but it feels strange to me since we don't call & in the other branch

Why? We force the LHS to return a boolean now, but not the RHS

Keno on 7 Mar 2019

If I just want to store the result, I think it'd be much more common to just broadcast

Not sure what you mean, could you elaborate?

There are two ways of using the result from a < b < c.

You can use it to make a decision. If we propagated a missing out of it, it would throw an error as soon as you tried to branch on if missing. Implementing this would just move the error one step down the road in an if a < b < c expression.
You can store the result or do some more computations on it later down the road (and _don't_ branch on it). In my mind, this is much more common to be done on many values at once — and thus you'd typically just use broadcast.

Why? We force the LHS to return a boolean now, but not the RHS

It just gets back to my "what's the point" question. If it's to emulate & semantics, then it seems like we should actually call & on both branches. If it's to ensure we get a logicky value back from <, then that seems like the kind of thing we tend to say "don't do that" instead of adding complications like this — particularly complications in lowering.

mbauman on 8 Mar 2019

👍1

For the sake of concreteness: IntervalSets.jl didn't support missing, and I found out that their methods:

in(v, I::TypedEndpointsInterval{:closed,:closed}) = leftendpoint(I) ≤ v ≤ rightendpoint(I)
in(v, I::TypedEndpointsInterval{:open,:open}) = leftendpoint(I) < v < rightendpoint(I)
in(v, I::TypedEndpointsInterval{:closed,:open}) = leftendpoint(I) ≤ v < rightendpoint(I)
in(v, I::TypedEndpointsInterval{:open,:closed}) = leftendpoint(I) < v ≤ rightendpoint(I)

would have worked fine already if this issue was fixed, but I had to add other methods to make it work:

in(::Missing, I::TypedEndpointsInterval{:closed,:closed}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:open,:open}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:closed,:open}) = !isempty(I) && missing
in(::Missing, I::TypedEndpointsInterval{:open,:closed}) = !isempty(I) && missing

It's not the end of the world of course. It does raise the issue of whether 3 < missing < 1 should be missing, or false.

cstjean on 8 Mar 2019

👍3 ❤2

Sorry, my example with filter was indeed incorrect. I had a SQL- or R-like behavior in mind, where missing values are skipped, and which we should probably support for DataFrames (e.g. via an argument -- I can't find the place where we discussed this before). @cstjean's example is much better since it's not hypothetical.

nalimilan on 8 Mar 2019

It does raise the issue of whether 3 < missing < 1 should be missing, or false.

This is a really good point. If we're going to go out of our way to handle missing cleverly, should we try to handle this? Is that even a reasonable thing to try to do?

StefanKarpinski on 8 Mar 2019

👍1

I can't imagine returning anything other than missing, as that's consistent with what we do everywhere else. Incidentally that seems to be what @cstjean does in his example.

nalimilan on 11 Mar 2019

Nope, he elected to return false in the case akin to 3 < missing < 1 (isempty(I)) — and I agree with that choice.

mbauman on 11 Mar 2019

Ah, my bad, I hadn't realized the order of 3 and 1 was the question at hand here. Then I'd say either missing or false are reasonable. false would be clever, and consistent with the fact that we use all available information when computing the return value in three-valued logic (e.g. false & missing === false). But it would require a more complex lowering.

nalimilan on 11 Mar 2019

If we give up on the short-circuiting behaviour, couldn't a < b <= c lower to something like compare(a, <, b, <=, c), with compare being a generated function?

cstjean on 11 Mar 2019

3 < missing < 1 evaluting to false is similar to the interesting cases from https://github.com/JuliaLang/julia/pull/31066#issuecomment-463576235

@test clamp(1, 52, missing) == 52
@test clamp(1000, missing, 101) == 101

oxinabox on 11 Mar 2019

👍1

Was this page helpful?

0 / 5 - 0 ratings

Related issues

add @callsuper macro

StefanKarpinski · 3Comments

Dates.format regression on master: Width of milliseconds field cannot be fixed anymore

helgee · 3Comments

+(x::T,y::T) where {T} not parsing correctly

musm · 3Comments

Getting Openssl error while building from source on ubuntu 14.04.3 cmake version 2.8.12.2

arshpreetsingh · 3Comments

Literal NaNs in ASTs

Keno · 3Comments