Julia: Interpolation of `nothing` should be an error

Created on 31 May 2018  ·  20Comments  ·  Source: JuliaLang/julia

Now that we're firming up our notion of what nothing and missing actually mean, we should probably semantically enforce them a little bit more. Example:

Sys.which(cmd) will return nothing if the given cmd cannot be found on the PATH. This makes it easy to define a hierarchy of commands: coalesce(Sys.which("curl"), Sys.which("wget")), etc... However, if the user then interpolates the result into a Cmd, it makes sense that we should error out.

Jeff suggested disallowing all kinds of print() or string() conversion of nothing, leaving only repr(). I tend to agree, despite that this has the potential to be a quite breaking change.

I would also like to suggest that missing get a similar treatment, as it is a subtly different but largely identical concept.

breaking missing data

Most helpful comment

I think it makes sense for missing to in general be a non-fatal absence with regards to construction of larger objects, whereas nothing should be the fatal absence. In that sense, I think the following should be true:

@test_throws ArgumentError "$(nothing)"
@test "$(missing)" == ""
@test join([1,2,missing,4], "") == "1,2,,4"

All 20 comments

I see the idea, but I can't help feeling that's not the right place to handle this. Forcing users to handle explicitly the possibility that a value is nothing is one of the two effects/goals of Some (the other being to distinguish nothing from Some(nothing)). Yet we have generally taken the stance that we don't wrap return values in Some, and instead expect users to check manually for nothing when appropriate. So I'm not sure why we should throw an error for some functions and not for others.

I guess the motivation in the present case is that Sys.which used to throw an error, so we are not used to it returning nothing, and we want to protect users from that. But isn't that just due to history? Also note that even if we don't throw an error on interpolation, the command will most likely fail trying to run nothing anyway, so it doesn't make a big difference.

What is the actual utility of allowing nothing to interpolate as a string into things? Put another way, where is this functionality being used that is not currently a bug?

I agree interpolation doesn't sound very useful nor legitimate. But string is defined as "Create a string from any values using the print function", so wouldn't it be weird that nothing would be the only type not to support it? That's something I've wondered for the case of missing, where we could also imagine returning missing rather than "missing".

Argument for the specialness of nothing: there are a lot of situations now where you can do some operation and expect some kind of value like an integer or a string and the _only_ other value you may get is nothing. Therefore we should be extra careful with nothing as a return value since it's special in the sense of potentially appearing when you may not have been expecting it.

Another argument is that printing of nothing is likely to occur upon writing to a CSV:

julia> b = IOBuffer(); writedlm(b, [1 2 nothing 4], ',')
       print(String(take!(b)))
1,2,nothing,4

Now perhaps this could be argued that it's the responsibility of the CSV writer to handle this case, but it is somewhat compelling to me. It's also interesting that the REPL doesn't even attempt to print anything when the result is nothing.

This is slightly off-topic, but writedlm also writes missing as "missing", which I found somewhat surprising earlier today when writing a CSV. Not sure what I expected, but the worrisome case is when the string "missing" is written into a String column that may also contain "missing" as a valid string value.

Maybe CSV writers should print missing as an empty field instead?

I think it makes sense for missing to in general be a non-fatal absence with regards to construction of larger objects, whereas nothing should be the fatal absence. In that sense, I think the following should be true:

@test_throws ArgumentError "$(nothing)"
@test "$(missing)" == ""
@test join([1,2,missing,4], "") == "1,2,,4"

@test "$(missing)" == ""
@test join([1,2,missing,4], "") == "1,2,,4"

This would imply that print(missing) wouldn't print anything, as opposed to show, which needs to print missing. That would mean that the empty string is the "canonical (un-decorated) text representation" of missing, while "missing" is its "informative text representation". Would that be OK? I'm never sure where print and show are used down the line.

Yes, along with repr, which should also print “missing”.

Seems reasonable to me and worth trying. I like the "fatal absence" versus "non-fatal absence" distinction.

Consider a new user trying out something like

julia> f(x) = x>0?nothing:-x
f (generic function with 1 method)

julia> a = f(1);

julia> print("I ran f and got $a")
I ran f and got nothing

You really want them to get an ArgumentError here? I imagine that would be very frustrating.

That case can have a very helpful, specific error message, which makes it not so frustrating since it immediately tells you how to fix the problem: do "I ran f and got $(repr(a))" instead, which is good advice in general. The case of getting an unexpected nothing and splicing it into a command or whatever and not being able to figure out what went wrong, on the other hand, seems like it would be truly frustrating and not just for novices.

Quite right --- getting an error (hopefully with a useful message) while trying things out is not a problem at all compared to silently getting nothing in your output stream, having it cause a problem much later, and then having to track down where it came from.

I guess with a good error message it wouldn't be so bad.

See https://github.com/JuliaLang/julia/issues/27729 which proposes pwd() == nothing when in a deleted directory but as an alternative returning some kind of NotFound object which could also potentially be returned by Sys.which to give a better error message upon usage. However that object would not be amenable to the coalesce / something code which is annoying.

FWIW, I'm not opposed to print(nothing) throwing an error and print(missing) printing nothing. The latter definitely makes sense for CSV, even if it's generally more complex than that (e.g. whitespace-delimited files are usually parsed by skipping contiguous spaces, so that a special text value like NA is needed to represent a missing value). Anyway I'm not completely clear on where print is/should be used rather than show.

Triage is in favor, noting that if we change our minds, we can but if we don't do this now, we can't change it in the 1.x timeline.

Any resolution wrt. #27650? Can logging of nothing still result in no message, or should it error as well? Or could we use missing for that, as @nalimilan proposed above?

I don't think logging should use missing for that. Having a special case for nothing there would probably be fine.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

iamed2 picture iamed2  ·  3Comments

Keno picture Keno  ·  3Comments

StefanKarpinski picture StefanKarpinski  ·  3Comments

sbromberger picture sbromberger  ·  3Comments

helgee picture helgee  ·  3Comments