Dotty: implicit conversions are not available on types involving UncheckedNull

Created on 21 Dec 2019 · 29 comments · Source: lampepfl/dotty

minimized code

/*
Using compiler flag -Yexplicit-nulls, the following snippet fails compilation with
[error] 4 |    "hello, world!".split(" ").map(_.length)
[error]   |    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[error]   |value map is not a member of Array[String | UncheckedNull] | UncheckedNull
*/

"hello, world!".split(" ").map(_.length)

expectation

Extension methods and implicit conversions should be available on unions involving UncheckedNull

nullability bug


All 29 comments

I had a look at this and found several fundamental problems with the way we do explicit nulls.
Here's a minimization of the problem:

  val x: Array[String | UncheckedNull] = ???
  x.map(_.length)

Without -Yexplicit-nulls, the last line compiles to

  refArrayOps[String](x).map(_.length)

With -Yexplicit-nulls the compiler tries this instead:

  genericArrayOps[String | Null](x).map(_.length)

and then it fails since _.length is not applicable to a value of type String | Null.

refArrayOps and genericArrayOps are defined in Predef like this:

  @inline implicit def refArrayOps[T <: AnyRef](xs: Array[T]): ArrayOps[T]    = new ArrayOps(xs)
  @inline implicit def genericArrayOps[T](xs: Array[T]): ArrayOps[T]          = new ArrayOps(xs)

1st Problem: Null cannot be a value type

The first thing that is wrong here is that refArrayOps is no longer applicable to String | UncheckedNull, since that type is not a reference type. The compiler is forced to choose genericArrayOps instead. Even if that succeeded (we'll get to that part), the difference would be unacceptable, since genericArrayOps operations are about an order of magnitude slower than refArrayOps operations.
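A minimal sketch of the dilemma, assuming -Yexplicit-nulls (the rejected call is shown as a comment, since the point is precisely which conversion the compiler can pick):

  val xs: Array[String | UncheckedNull] = ???

  // With explicit nulls, Null is no longer a subtype of AnyRef, so the union with
  // UncheckedNull fails the bound T <: AnyRef of refArrayOps:
  //   refArrayOps[String | UncheckedNull](xs)      // rejected: bound not satisfied
  val ops = genericArrayOps[String | Null](xs)      // chosen fallback: boxes, an order of magnitude slower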

We do not have the option of changing Predef, or any of the other libraries that make use of the concept of a reference type (most of them unknown to us). The results can range from type errors, to unexpected slow-downs, to changed behavior. So, I conclude that we unfortunately need to revise our choice that Null is a value type. It has to be a reference type instead.

I believe the most conservative option is to make Null a subtype of Object. Yes, that means that we cannot talk about the type of Objects without Null, but I don't think this is a big loss. There's little you can do with an Object anyway. And when you cast it to another type, the null gets stripped away in any case.

2nd Problem: UncheckedNull cannot be an alias of Null.

The 2nd problem is illustrated by the fact that the compiler inferred String | Null instead of String | UncheckedNull as the type argument for genericArrayOps. Inferring type parameters is quite a complicated business and alias types are slippery. The compiler is free to dealias for normalization purposes or to try some branch. This is pretty fundamental. The compiler will make a best effort to not dealias needlessly, but that's all. Even if we could tweak it to not dealias UncheckedNull to Null in this case, the next, slightly more complicated case where it would fail will be right around the corner.

So, we need to find another representation of UncheckedNull that is less readily replaced by Null.
UncheckedNull still has to be at the same time a supertype and a subtype of Null. We need

UncheckedNull <: Null

since we want to be able to assign a Java String result to a String | Null. We need

Null <: UncheckedNull

since we want to be able to pass a String | Null to a Java String parameter.

We could try an annotated type for UncheckedNull instead. If UncheckedNull = Null @HostNull, and HostNull was a refining annotation, it would not get stripped. But it would also mean that UncheckedNull is a true subtype of Null. To get back Null <: UncheckedNull, we'd have to add special rules to TypeComparer. This certainly does not look trivial, but I see no other way.

A more direct approach would be to make UncheckedNull its own class-type, and make it a subclass of Null. Then again, we need special rules to ensure that we can still have Null <: UncheckedNull.

3rd Problem: We need better test coverage

That such a simple code snippet can upset so much shows that we are only at the beginning of testing. I believe at the very least we need to be able to compile all our codebase with -Yexplicit-nulls. The next step would be to extend that to the community build. This will be a lot of work. We need more people to sign up for this.

/cc @olhotak

Another question: What is the least upper bound of Null and UncheckedNull? If they are mutual subtypes of each other, this is not well-defined. It could be either type. But since the two types have different behavior, that's a problem.
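For instance (a sketch), whichever type the compiler picks here is observable:

  def pick(b: Boolean, x: Null, y: UncheckedNull) = if (b) x else y
  // The result type is lub(Null, UncheckedNull). If the two are mutual subtypes,
  // either answer is "least", yet they behave differently: selections see through
  // UncheckedNull but not through Null.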

So, maybe it's better to make UncheckedNull a true subtype of Null, and turn the missing subtype relationships into implicit conversions. We should also investigate this option.

When we started this project, we considered making UncheckedNull a distinct type from Null. This causes lots of problems in addition to the ones that @odersky mentions. If the two are distinct types, then Array[String|Null] and Array[String|UncheckedNull] also become distinct types. So the decision was that UncheckedNull would be Null as far as the type system is concerned, just with an annotation that would be stripped whenever the type system decides to strip it. This supports the initial motivating use case of chaining Java calls (s.trim.toUpperCase), but there are many cases that it does not support. The two most common cases that we encountered are implicit conversions and generic types.

Since an UncheckedNull is unchecked by definition, it is expected to be converted at some point to a non-null type. That conversion is unsound. If such unsoundness is necessary for Java interop, we wanted it to be limited to the immediate boundary, to calls of Java-defined methods. Making UncheckedNull more magic than just Null with an annotation enables it to propagate away from the Java boundary, propagating the unsoundness throughout the code.

I would like to get rid of UncheckedNull altogether. The motivating use case s.trim.toUpperCase is unsound. If we want the final code to be sound, eventually we either 1) have to add annotations to the Java methods or 2) rewrite it as s.trim.nn.toUpperCase. The code is unsound, so eventually we need to make a fundamental choice: either accept unsoundness in Java interactions, or require modifications in code that interacts with Java. Code that interacts significantly with Java (especially Scala wrappers for Java libraries) has hundreds of places where the Java code could return null, and we eventually have to make a choice between 1) accepting unsoundness or 2) going through the hundreds of places individually and deciding what the Scala code should do if a null shows up in each place. The migration to 2) is tedious, but it needs to be done if soundness is what we want.

UncheckedNull was intended to ease the migration to either annotated Java methods or s.trim.nn.toUpperCase. To ease the migration, here is, I think, a better solution: within designated scopes (e.g. through a language import or something similar), we wrap every call to a Java-defined method with an unchecked .nn. For code that interacts heavily with Java, we can turn this on for easy migration. Then we can gradually go through the code and consider each call to a Java-defined method, turning off the insertion of the unchecked .nn as we do so.
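A rough sketch of what such a designated scope could look like; the import name and the exact elaboration are hypothetical:

  import scala.language.uncheckedJavaNulls   // hypothetical scope marker, not an existing import

  def shout(s: String): String =
    s.trim.toUpperCase                       // elaborated as s.trim.nn.toUpperCase.nn inside the scope

  // Once this code has been audited, the import is dropped and the handling made explicit:
  //   def shout(s: String): String = s.trim.nn.toUpperCase.nn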

On the 3rd problem, we _have_ ported the Dotty community build (except stdlib) to explicit nulls and are writing up the experience. That writeup and the diffs would inform this discussion, but at the moment, everyone is away for the holidays so the writeup won't be ready until the new year.

/cc @abeln @noti0na1

Since an UncheckedNull is unchecked by definition, it is expected to be converted at some point to a non-null type. That conversion is unsound. If such unsoundness is necessary for Java interop, we wanted it to be limited to the immediate boundary, to calls of Java-defined methods. Making UncheckedNull more magic than just Null with an annotation enables it to propagate away from the Java boundary, propagating the unsoundness throughout the code.

That's a good observation. The example in the concrete issue is misleading here, since it suggests that the distance from the Java boundary in x.map(_.length) is short, when in fact it is not. There's a lot going on before we can make this call to map! Still, it seems natural to want it to work. So, I am not sure about that aspect yet. If we keep UncheckedNull as an alias, we accept that it will be stripped away at points that look unpredictable and arbitrary to users. If we make it a true subtype of Null, we risk propagating unsoundness further than intended. Or, as you say, maybe we should drop it altogether. I am not sure yet how that would work in practice.

One thing I noted: If we keep UncheckedNull as an alias, we can make the map example work by adding an additional conversion for ArrayOps:

  @inline implicit def refNullArrayOps[T <: AnyRef](xs: Array[T | UncheckedNull]): ArrayOps[T | UncheckedNull] = new ArrayOps(xs)

That will fix just a single class of uses, albeit an important one.

What if we reinterpret references to AnyRef coming from Scala 2 as AnyRef | Null? That should let Array[T | UncheckedNull] pass through refArrayOps.

There are two issues mixed together in this example. The return type of split is Array[String | UncheckedNull] | UncheckedNull.

It's the outside UncheckedNull that prevents the implicit conversion to ArrayOps. One way to deal with this is to reverse the earlier decision to not try to search for implicit conversions on nullable types. Another way is the implicit unchecked .nn on the return value of every Java-defined method invocation, which would make the type of the split call Array[String | UncheckedNull].

Even with both of these solutions, the inside UncheckedNull is still there, so the map is still on an ArrayOps[String | UncheckedNull] and the function passed to it needs to deal with null values in the array. But this problem is separate from the issue of the implicit conversion to ArrayOps.
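To make the two layers concrete (a sketch, assuming -Yexplicit-nulls):

  val parts: Array[String | UncheckedNull] | UncheckedNull = "hello, world!".split(" ")

  // Outer UncheckedNull: the array reference itself may be null; this is what blocks
  // the implicit conversion to ArrayOps and produces the reported error.
  // Inner UncheckedNull: even with the outer layer stripped, each element may still be
  // null, so the function passed to map has to handle it, e.g.
  //   parts.nn.map(s => if (s == null) 0 else s.length)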

I think the "proper" way to fix the specific error reported here is to use Java annotations. With annotations, all the UncheckedNulls go away and everything works again. In order for this to work, we need two things:

  • the compiler needs access to a JDK with nullability annotations (in our prototype this was just a large text file indicating which JDK methods return non-nullable values: https://github.com/abeln/dotty/blob/dotty-explicit-nulls-evaluation/explicit-nulls-meta.txt)
  • we need to add support for recognizing Java-8 style annotations (which can appear anywhere within a type: e.g. Array[@NonNull String]). The code merged in Dotty only recognizes annotations at the "top level": e.g. @NonNull Array[String].

Regarding the more general question of "how unsound" UncheckedNull should be, I agree with @olhotak that, instead of adding additional cases for UncheckedNull, we should consider an "optimistic" mode where we assume all Java methods in a scope (be it via flag, import, or some other mechanism) return non-nullable values. This is supported by the results of our evaluation (see below).

Like Ondrej said, we just finished migrating most community build libraries to explicit nulls. The results can be found in the evaluation chapter of my master's thesis (evaluation.pdf), but the gist of it is below:

We created multiple "run configurations", where certain explicit nulls features were enabled/disabled:

[Screenshot: table of the run configurations and which explicit-nulls features each enables]

For example, checker has access to nullability annotations for the JDK (though not for all of it), which we got from the Checker Framework. optimistic has a mechanism similar to what Ondrej described, where we assume every Java method returns non-nullable values (but only at the top level, so split would return Array[String|UncheckedNull]).

We then ran Dotty on every (library, configuration) pair, and counted how many compiler errors were produced. The results are reported below in # errors / 1000 LOC. The idea is to use "# of compilation errors" as a proxy for how much effort it would be to migrate a specific library (there are many caveats to that statement, but it's as good a proxy as I can think of).

[Screenshot: table of errors per 1000 LOC for each (library, configuration) pair]

By comparing the naive and java-null columns, we can estimate how "useful" UncheckedNull is (how many fewer errors are generated). The results show that UncheckedNull does not seem to be very useful in reducing the number of errors. By contrast, the optimistic mode does help quite a lot, so it seems like a desirable feature for Scala libraries that are heavy on Java interop.

There are two issues mixed together in this example. The return type of split is Array[String | UncheckedNull] | UncheckedNull.

It's the outside UncheckedNull that prevents the implicit conversion to ArrayOps. One way to deal with this is to reverse the earlier decision to not try to search for implicit conversions on nullable types. Another way is the implicit unchecked .nn on the return value of every Java-defined method invocation, which would make the type of the split call Array[String | UncheckedNull].

I think this would be an appropriate thing to do, at least for conversions invoked when trying to resolve selections. Without that, the following doesn't work either:

val x: String = ...
val y = x.substring(3).length // ok
val y = x.substring(3).stripSuffix(".txt") // doesn't compile

How can we possibly explain to a newcomer that the former works but the latter doesn't? stripSuffix is an extension method on String, but it looks like a normal method selection, and selections are supposed to "see through" UncheckedNulls.

This and the original example (including the performance aspect) would be addressed if, when searching for extension methods on an x: T | UncheckedNull, we first used stripUncheckedNull and actually looked for extension methods of T.

Another way is the implicit unchecked .nn on the return value of every Java-defined method invocation, which would make the type of the split call Array[String | UncheckedNull].

I think we should try that. I.e.

  • Drop UncheckedNull entirely.
  • When typechecking a selection p.m: if the qualifier p has a Java-defined symbol and the type of p is of the form T | Null, rewrite to p.nn.m.

The scheme will also work for arrays. I.e. if we see p.apply(i).m where p has a Java-defined symbol returning an array, then we rewrite that to p.apply(i).nn.m. This can be achieved by tweaking the notion of "symbol" to skip array apply calls and look in the prefix instead.

Does anyone have an argument why this cannot be enough? If not I think we should go ahead and try it out.

IIUC, with that scheme, the following compiles:

val len = str.substring(3).length

but the following doesn't:

val sub = str.substring(3)
val len = sub.length // sub is String | Null and is not a Java-defined symbol

@sjrd Correct. The question is: Is that acceptable? Or do we need to refine the scheme?

A candidate refinement would be:

  • Do introduce UncheckedNull, but now as a subtype of Null. This can be achieved with the existing compiler machinery by defining

      type UncheckedNull = Null @uncheckedNull

    and making uncheckedNull a refining annotation.

  • Java methods that are declared to return type T are assumed to return T | UncheckedNull. Types nested inside the return type are expanded with | Null rather than | UncheckedNull. For instance, if a Java method is declared to return an ArrayList<String>, this is translated to

      ArrayList<String | Null> | UncheckedNull

    If I call get on the result, I would get a value of type (String | Null) | UncheckedNull, which is the same as String | UncheckedNull.

  • In a selection p.m, if p is of type T | UncheckedNull, then .nn (or something equivalent) is inserted.

This scheme would handle your example, since the type of sub would be inferred to be String | UncheckedNull.
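Concretely (a sketch of how the refined scheme would type the earlier example):

  val str: String = "hello.txt"
  val sub = str.substring(3)   // inferred as String | UncheckedNull; the refining annotation
                               // is not lost to dealiasing
  val len = sub.length         // selection on T | UncheckedNull inserts .nn, so this compiles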

Compared to the current version of UncheckedNull, there would (at least) be two differences.

  • UncheckedNull is a proper subtype of Null, not an alias
  • Nested types in Java results would be unioned with Null instead of UncheckedNull

Nested types in Java results would be unioned with Null instead of UncheckedNull

This works if we get a parameterized type directly from Java. But if I call java.util.List.of(someJavaMethod()), then I'll end up with a java.util.List<Foo | UncheckedNull> again, and now I can't pass it as a parameter to a Java method that expects a java.util.List<Foo | Null> because of invariance.

@smarter Good point. But maybe we can also always widen UncheckedNull to Null when instantiating a type variable? The viability of the scheme rests on the fact that UncheckedNull can only appear at the top level, so we have to make sure that's the case.

But maybe we can also always widen UncheckedNull to Null when instantiating a type variable?

Seems tricky. If we do that in general, then we lose the UncheckedNull as soon as we wrap a Java value in identity (or more realistically, if we call .ensuring(...) on it).
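A sketch of that concern, with the proposed widening applied when the type variable of identity is instantiated:

  val s = "abc".substring(1)   // String | UncheckedNull under the refined scheme
  val t = identity(s)          // type variable instantiated, UncheckedNull widened to Null
  // t.length                  // no longer compiles: the marker that made selections
  //                           // see through the null is gone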

Yes, but that is unavoidable and, I think, also acceptable.

If I call get on the result, I would get a value of type (String | Null) | UncheckedNull, which is the same as String | UncheckedNull.

This is unfortunately not correct: (String | Null) | UncheckedNull = String | Null

@noti0na1 proposed to not widen nested Java types with Null or UncheckedNull. I think that's worth trying out.

Do we consider it unacceptable to just give return values from Java methods a non-null type (at the outer level only)? So str.substring(3) would return String, not String|Null or String|UncheckedNull. Yes, that's unsound, but UncheckedNull is also unsound (though somewhat less so).

That would be unacceptable because it definitely means that you can receive a value that does not belong to its declared type, and you don't get a chance to "fix" it before you use it.

If you know that a particular Java method can return null, you can fix it with an ascription:
str.substring(3) : String|Null

Even so, there is this tiny "time" frame in which the result of substring is typed as a String, which is plain invalid. An ascription, even right away, does not save the day. For example, an optimizer typically kills type ascriptions as soon as it sees them, because they only make things worse for it. So an optimizer would rely on the result of substring being non-null, and would dead-code eliminate further tests such as if (s == null) as false.

I see.

In that case, how about the following rule? For a call o.m() where m is Java-defined:

  1. If the prototype of o.m() is nullable, keep o.m() as it is.
  2. Otherwise, change o.m() to o.m().nn

By default, Java-defined calls would get .nn added automatically, but you could prevent this with an ascription. .nn does a dynamic check, so its type is T|Null => T.
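For example (a sketch):

  val str: String = "file.txt"

  val a: String = str.substring(3)          // prototype String is not nullable: elaborated as str.substring(3).nn
  val b: String | Null = str.substring(3)   // prototype is nullable: kept as is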

Otherwise, change o.m() to o.m().nn

If I have some Scala code that does:

someJavaInstance.doSideEffects()

and doSideEffects happens to return null, then with that proposal, my code would suddenly start throwing NPEs at runtime.

Could we insert an intrinsic variant of .nn that does not do the dynamic check, but still has the (unsound) type T|Null => T ?
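A user-level approximation of such an intrinsic might look like this (the name is hypothetical, and a real intrinsic would be compiler-generated rather than an asInstanceOf):

  extension [T](x: T | Null)
    // Unsound by design: claims T without the runtime null check that .nn performs.
    inline def uncheckedNn: T = x.asInstanceOf[T]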

I'd like to explain the deeper reason guiding my suggestions. From our study of porting the community build programs, I drew the conclusion that in programs that use Java libraries, there are so many calls to Java code that trying to model Java soundly requires manual handling (e.g. casts) at many, many calls, even with something like UncheckedNull. Therefore I'm aiming for a design that allows two modes: 1) in code where you don't care much about soundness of Java calls, we don't bother trying to make them sound; this is the status quo before explicit nulls; 2) in other code you care enough about soundness that you're willing to do a lot of work to put in all the checks. (Switching between modes is open for discussion; it could be a compiler flag, language import, scoping construct, etc.) Rather than trying to design an in-between mode that is not sound but also still requires many casts, I would prefer to support both modes well.

The problem with UncheckedNull being an alias of Null is that we may lose the information (due to dealiasing) that some type was nullified by the compiler. By making UncheckedNull a true subtype of Null, we can always track the nullification.

The trade-off is that we have to add the following special rules to TypeComparer:

A | Null <: A | UncheckedNull 
M[A | UncheckedNull] <: M[A | Null]
M[A | Null] <: M[A | UncheckedNull]

Another problem mentioned by @olhotak is whether it is possible to create a space for users where we remove all UncheckedNulls from the types. This is important for migration or any unsafe interaction with Java code. However, it is difficult to achieve:

  • We can't just disable nullification in certain Scala files, as the Java class symbols are shared during compilation.

  • I was thinking about keeping two denot.infos for one symbol: one the regular type, the other the nullified type. But it is hard to let the compiler decide which info to use.

To summarize what we want to achieve (correct me if I am wrong):

  • Still nullify all Java definitions (don't change the current rules)
  • The nullified type should be equivalent to the type defined using | Null
  • Be able to track the nullification according to the type, and don't lose the information after any kind of normalization
  • Remove the half unsound UncheckedNull
  • Be able to create a space where Java definitions are considered not nullified (acceptable unsoundness)

I'd like to propose this scheme to replace UncheckedNull:

  1. Define a tag @Unchecked which can be added to an OrType:

       nullify(T) => T |@Unchecked Null
       T |@Unchecked Null =:= T | Null

     An OrType carrying @Unchecked will not be widened.
  2. By importing or setting some flag, the nullified type and the regular type can be freely converted between each other.

For example:

import scala.language.RemoveUncheckedOrNull
// which enables the following implicit conversion

// defined in compiler
given removeUncheckedOrNull[A, B] as  Conversion[A, B]
  where A != B && A.stripAllUncheckedOrNull =:= B.stripAllUncheckedOrNull = 
  _.asInstanceOf[B] 

I am still reading the source code of Implicits. I am not sure if this is possible.
