Dotty: implicit conversions are not available on types involving UncheckedNull

Created on 21 Dec 2019 · 29 comments · Source: lampepfl/dotty

minimized code

/*
Using compiler flag -Yexplicit-nulls, the following snippet fails compilation with
[error] 4 |    "hello, world!".split(" ").map(_.length)
[error]   |    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[error]   |value map is not a member of Array[String | UncheckedNull] | UncheckedNull
*/

"hello, world!".split(" ").map(_.length)

expectation

Extension methods and implicit conversions should be available on unions involving UncheckedNull

nullability bug


All 29 comments

I had a look at this and found several fundamental problems with the way we do explicit nulls.
Here's a minimization of the problem:

  val x: Array[String | UncheckedNull] = ???
  x.map(_.length)

Without -Yexplicit-nulls, the last line compiles to

  refArrayOps[String](x).map(_.length)

With -Yexplicit-nulls the compiler tries this instead:

  genericArrayOps[String | Null](x).map(_.length)

and then it fails since _.length is not applicable to a value of type String | Null.

refArrayOps and genericArrayOps are defined in Predef like this:

  @inline implicit def refArrayOps[T <: AnyRef](xs: Array[T]): ArrayOps[T]    = new ArrayOps(xs)
  @inline implicit def genericArrayOps[T](xs: Array[T]): ArrayOps[T]          = new ArrayOps(xs)

1st Problem: Null cannot be a value type

The first thing that is wrong here is that refArrayOps is no longer applicable to String | UncheckedNull, since that type is not a reference type. The compiler is forced to choose genericArrayOps instead. Even if that succeeded (we'll get to that part), the difference would be unacceptable, since genericArrayOps operations are about an order of magnitude slower than refArrayOps operations.
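A minimal sketch of the dilemma, assuming -Yexplicit-nulls (the rejected call is shown as a comment, since the point is precisely which conversion the compiler can pick):

  val xs: Array[String | UncheckedNull] = ???

  // With explicit nulls, Null is no longer a subtype of AnyRef, so the union with
  // UncheckedNull fails the bound T <: AnyRef of refArrayOps:
  //   refArrayOps[String | UncheckedNull](xs)      // rejected: bound not satisfied
  val ops = genericArrayOps[String | Null](xs)      // chosen fallback: boxes, an order of magnitude slower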

We do not have the option of changing Predef, or any of the other libraries that make use of the concept of a reference type (most of them unknown to us). The results can range from type errors, to unexpected slow-downs, to changed behavior. So, I conclude that we unfortunately need to revise our choice that Null is a value type. It has to be a reference type instead.

I believe the most conservative option is to make Null a subtype of Object. Yes, that means that we cannot talk about the type of Objects without Null, but I don't think this is a big loss. There's little you can do with an Object anyway. And when you cast it to another type, the null gets stripped away in any case.

2nd Problem: UncheckedNull cannot be an alias of Null.

The 2nd problem is illustrated by the fact that the compiler inferred String | Null instead of String | UncheckedNull as the type argument for genericArrayOps. Inferring type parameters is quite a complicated business and alias types are slippery. The compiler is free to dealias for normalization purposes or to try some branch. This is pretty fundamental. The compiler will make a best effort to not dealias needlessly, but that's all. Even if we could tweak it to not dealias UncheckedNull to Null in this case, the next, slightly more complicated case where it would fail will be right around the corner.

So, we need to find another representation of UncheckedNull that is less readily replaced by Null.
UncheckedNull still has to be at the same time a supertype and a subtype of Null. We need

UncheckedNull <: Null

since we want to be able to assign a Java String result to a String | Null. We need

Null <: UncheckedNull

since we want to be able to pass a String | Null to a Java String parameter.

We could try an annotated type for UncheckedNull instead. If UncheckedNull = Null @HostNull, and HostNull was a refining annotation, it would not get stripped. But it would also mean that UncheckedNull is a true subtype of Null. To get back Null <: UncheckedNull, we'd have to add special rules to TypeComparer. This certainly does not look trivial, but I see no other way.

A more direct approach would be to make UncheckedNull its own class-type, and make it a subclass of Null. Then again, we need special rules to ensure that we can still have Null <: UncheckedNull.

3rd Problem: We need better test coverage

That such a simple code snippet can upset so much shows that we are only at the beginning of testing. I believe at the very least we need to be able to compile all our codebase with -Yexplicit-nulls. The next step would be to extend that to the community build. This will be a lot of work. We need more people to sign up for this.

/cc @olhotak

Another question: What is the least upper bound of Null and UncheckedNull? If they are mutual subtypes of each other, this is not well-defined. It could be either type. But since the two types have different behavior, that's a problem.
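For instance (a sketch), whichever type the compiler picks here is observable:

  def pick(b: Boolean, x: Null, y: UncheckedNull) = if (b) x else y
  // The result type is lub(Null, UncheckedNull). If the two are mutual subtypes,
  // either answer is "least", yet they behave differently: selections see through
  // UncheckedNull but not through Null.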

So, maybe it's better to make UncheckedNull a true subtype of Null, and turn the missing subtype relationships into implicit conversions. We should also investigate this option.

When we started this project, we considered making UncheckedNull a distinct type from Null. This causes lots of problems in addition to the ones that @odersky mentions. If the two are distinct types, then Array[String|Null] and Array[String|UncheckedNull] also become distinct types. So the decision was that UncheckedNull would be Null as far as the type system is concerned, just with an annotation that would be stripped whenever the type system decides to strip it. This supports the initial motivating use case of chaining Java calls (s.trim.toUpperCase), but there are many cases that it does not support. The two most common cases that we encountered are implicit conversions and generic types.

Since an UncheckedNull is unchecked by definition, it is expected to be converted at some point to a non-null type. That conversion is unsound. If such unsoundness is necessary for Java interop, we wanted it to be limited to the immediate boundary, to calls of Java-defined methods. Making UncheckedNull more magic than just Null with an annotation enables it to propagate away from the Java boundary, propagating the unsoundness throughout the code.

I would like to get rid of UncheckedNull altogether. The motivating use case s.trim.toUpperCase is unsound. If we want the final code to be sound, eventually we either 1) have to add annotations to the Java methods or 2) rewrite it as s.trim.nn.toUpperCase. The code is unsound, so eventually we need to make a fundamental choice: either accept unsoundness in Java interactions, or require modifications in code that interacts with Java. Code that interacts significantly with Java (especially Scala wrappers for Java libraries) has hundreds of places where the Java code could return null, and we eventually have to make a choice between 1) accepting unsoundness or 2) going through the hundreds of places individually and deciding what the Scala code should do if a null shows up in each place. The migration to 2) is tedious, but it needs to be done if soundness is what we want.

UncheckedNull was intended to ease the migration to either annotated Java methods or s.trim.nn.toUpperCase. To ease the migration, here is, I think, a better solution: within designated scopes (e.g. through a language import or something similar), we wrap every call to a Java-defined method with an unchecked .nn. For code that interacts heavily with Java, we can turn this on for easy migration. Then we can gradually go through the code and consider each call to a Java-defined method, turning off the insertion of the unchecked .nn as we do so.
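A rough sketch of what such a designated scope could look like; the import name and the exact elaboration are hypothetical:

  import scala.language.uncheckedJavaNulls   // hypothetical scope marker, not an existing import

  def shout(s: String): String =
    s.trim.toUpperCase                       // elaborated as s.trim.nn.toUpperCase.nn inside the scope

  // Once this code has been audited, the import is dropped and the handling made explicit:
  //   def shout(s: String): String = s.trim.nn.toUpperCase.nn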

On the 3rd problem, we _have_ ported the Dotty community build (except stdlib) to explicit nulls and are writing up the experience. That writeup and the diffs would inform this discussion, but at the moment, everyone is away for the holidays so the writeup won't be ready until the new year.

/cc @abeln @noti0na1

Since an UncheckedNull is unchecked by definition, it is expected to be converted at some point to a non-null type. That conversion is unsound. If such unsoundness is necessary for Java interop, we wanted it to be limited to the immediate boundary, to calls of Java-defined methods. Making UncheckedNull more magic than just Null with an annotation enables it to propagate away from the Java boundary, propagating the unsoundness throughout the code.

That's a good observation. The example in the concrete issue is misleading here, since it suggests that the distance from the Java boundary in x.map(_.length) is short, when in fact it is not. There's a lot going on before we can make this call to map! Still, it seems natural to want it to work. So, I am not sure about that aspect yet. If we keep UncheckedNull as an alias, we accept that it will be stripped away at points that look unpredictable and arbitrary to users. If we make it a true subtype of Null, we risk propagating unsoundness further than intended. Or, as you say, maybe we should drop it altogether. I am not sure yet how that would work in practice.

One thing I noted: If we keep UncheckedNull as an alias, we can make the map example work by adding an additional conversion for ArrayOps:

  @inline implicit def refNullArrayOps[T <: AnyRef](xs: Array[T | UncheckedNull]): ArrayOps[T | UncheckedNull] = new ArrayOps(xs)

That will fix just a single class of uses, albeit an important one.

What if we reinterpret references to AnyRef coming from Scala 2 as AnyRef | Null? That should let Array[T | UncheckedNull] pass through refArrayOps.

There are two issues mixed together in this example. The return type of split is Array[String | UncheckedNull] | UncheckedNull.

It's the outside UncheckedNull that prevents the implicit conversion to ArrayOps. One way to deal with this is to reverse the earlier decision to not try to search for implicit conversions on nullable types. Another way is the implicit unchecked .nn on the return value of every Java-defined method invocation, which would make the type of the split call Array[String | UncheckedNull].

Even with both of these solutions, the inside UncheckedNull is still there, so the map is still on an ArrayOps[String | UncheckedNull] and the function passed to it needs to deal with null values in the array. But this problem is separate from the issue of the implicit conversion to ArrayOps.
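To make the two layers concrete (a sketch, assuming -Yexplicit-nulls):

  val parts: Array[String | UncheckedNull] | UncheckedNull = "hello, world!".split(" ")

  // Outer UncheckedNull: the array reference itself may be null; this is what blocks
  // the implicit conversion to ArrayOps and produces the reported error.
  // Inner UncheckedNull: even with the outer layer stripped, each element may still be
  // null, so the function passed to map has to handle it, e.g.
  //   parts.nn.map(s => if (s == null) 0 else s.length)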

I think the "proper" way to fix the specific error reported here is to use Java annotations. With annotations, all the UncheckedNulls go away and everything works again. In order for this to work, we need two things:

  • the compiler needs access to a JDK with nullability annotations (in our prototype this was just a large text file indicating which JDK methods return non-nullable values: https://github.com/abeln/dotty/blob/dotty-explicit-nulls-evaluation/explicit-nulls-meta.txt)
  • we need to add support for recognizing Java-8 style annotations (which can appear anywhere within a type: e.g. Array[@NonNull String]). The code merged in Dotty only recognizes annotations at the "top level": e.g. @NonNull Array[String].

Regarding the more general question of "how unsound" UncheckedNull should be, I agree with @olhotak that, instead of adding additional cases for UncheckedNull, we should consider an "optimistic" mode where we assume all Java methods in a scope (be it via flag, import, or some other mechanism) return non-nullable values. This is supported by the results of our evaluation (see below).

Like Ondrej said, we just finished migrating most community build libraries to explicit nulls. The results can be found in the evaluation chapter of my master's thesis (evaluation.pdf), but the gist of it is below:

We created multiple "run configurations", where certain explicit nulls features were enabled/disabled:

[Screenshot: table of the run configurations and which explicit-nulls features each enables]

For example, checker has access to nullability annotations for the JDK (though not for all of it), which we got from the Checker Framework. optimistic has a mechanism similar to what Ondrej described, where we assume every Java method returns non-nullable values (but only at the top level, so split would return Array[String|UncheckedNull]).

We then ran Dotty on every (library, configuration) pair, and counted how many compiler errors were produced. The results are reported below in # errors / 1000 LOC. The idea is to use "# of compilation errors" as a proxy for how much effort it would be to migrate a specific library (there are many caveats to that statement, but it's as good a proxy as I can think of).

[Screenshot: table of errors per 1000 LOC for each (library, configuration) pair]

By comparing the naive and java-null columns, we can estimate how "useful" UncheckedNull is (how many fewer errors are generated). The results show that UncheckedNull does not seem to be very useful in reducing the number of errors. By contrast, the optimistic mode does help quite a lot, so it seems like a desirable feature for Scala libraries that are heavy on Java interop.

There are two issues mixed together in this example. The return type of split is Array[String | UncheckedNull] | UncheckedNull.

It's the outside UncheckedNull that prevents the implicit conversion to ArrayOps. One way to deal with this is to reverse the earlier decision to not try to search for implicit conversions on nullable types. Another way is the implicit unchecked .nn on the return value of every Java-defined method invocation, which would make the type of the split call Array[String | UncheckedNull].

I think this would be an appropriate thing to do, at least for conversions invoked when trying to resolve selections. Without that, the following doesn't work either:

val x: String = ...
val y = x.substring(3).length // ok
val y = x.substring(3).stripSuffix(".txt") // doesn't compile

How can we possibly explain to a newcomer that the former works but the latter doesn't? stripSuffix is an extension method on String, but it looks like a normal method selection, and selections are supposed to "see through" UncheckedNulls.

This and the original example (including the performance aspect) would be addressed if, when searching for extension methods on an x: T | UncheckedNull, we first used stripUncheckedNull and actually looked for extension methods of T.

Another way is the implicit unchecked .nn on the return value of every Java-defined method invocation, which would make the type of the split call Array[String | UncheckedNull].

I think we should try that. I.e.

  • Drop UncheckedNull entirely.
  • When typechecking a selection p.m: if the qualifier p has a Java-defined symbol and the type of p is of the form T | Null, rewrite to p.nn.m.

The scheme will also work for arrays. I.e. if we see p.apply(i).m where p has a Java-defined symbol returning an array, then we rewrite that to p.apply(i).nn.m. This can be achieved by tweaking the notion of "symbol" to skip array apply calls and look in the prefix instead.

Does anyone have an argument why this cannot be enough? If not I think we should go ahead and try it out.

IIUC, with that scheme, the following compiles:

val len = str.substring(3).length

but the following doesn't:

val sub = str.substring(3)
val len = sub.length // sub is String | Null and is not a Java-defined symbol

@sjrd Correct. The question is: Is that acceptable? Or do we need to refine the scheme?

A candidate refinement would be:

  • Do introduce UncheckedNull, but now as a subtype of Null. This can be achieved with the existing compiler machinery by defining

      type UncheckedNull = Null @uncheckedNull

    and making uncheckedNull a refining annotation.

  • Java methods that are declared to return type T are assumed to return T | UncheckedNull. Types nested inside the return type are expanded with | Null rather than | UncheckedNull. For instance, if a Java method is declared to return an ArrayList<String>, this is translated to

      ArrayList<String | Null> | UncheckedNull

    If I call get on the result, I would get a value of type (String | Null) | UncheckedNull, which is the same as String | UncheckedNull.

  • In a selection p.m, if p is of type T | UncheckedNull, then .nn (or something equivalent) is inserted.

This scheme would handle your example, since the type of sub would be inferred to be String | UncheckedNull.
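Concretely (a sketch of how the refined scheme would type the earlier example):

  val str: String = "hello.txt"
  val sub = str.substring(3)   // inferred as String | UncheckedNull; the refining annotation
                               // is not lost to dealiasing
  val len = sub.length         // selection on T | UncheckedNull inserts .nn, so this compiles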

Compared to the current version of UncheckedNull, there would (at least) be two differences.

  • UncheckedNull is a proper subtype of Null, not an alias
  • Nested types in Java results would be unioned with Null instead of UncheckedNull

Nested types in Java results would be unioned with Null instead of UncheckedNull

This works if we get a parameterized type directly from Java. But if I call java.util.List.of(someJavaMethod()), then I'll end up with a java.util.List<Foo | UncheckedNull> again, and now I can't pass it as a parameter to a Java method that expects a java.util.List<Foo | Null> because of invariance.

@smarter Good point. But maybe we can also always widen UncheckedNull to Null when instantiating a type variable? The viability of the scheme rests on the fact that UncheckedNull can only appear at the top level, so we have to make sure that's the case.

But maybe we can also always widen UncheckedNull to Null when instantiating a type variable?

Seems tricky. If we do that in general, then we lose the UncheckedNull as soon as we wrap a Java value in identity (or more realistically, if we call .ensuring(...) on it).
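A sketch of that concern, with the proposed widening applied when the type variable of identity is instantiated:

  val s = "abc".substring(1)   // String | UncheckedNull under the refined scheme
  val t = identity(s)          // type variable instantiated, UncheckedNull widened to Null
  // t.length                  // no longer compiles: the marker that made selections
  //                           // see through the null is gone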

Yes, but that is unavoidable and, I think, also acceptable.

If I call get on the result, I would get a value of type (String | Null) | UncheckedNull, which is the same as String | UncheckedNull.

This is unfortunately not correct: (String | Null) | UncheckedNull = String | Null

@noti0na1 proposed to not widen nested Java types with Null or UncheckedNull. I think that's worth trying out.

Do we consider it unacceptable to just give return values from Java methods a non-null type (at the outer level only)? So str.substring(3) would return String, not String|Null or String|UncheckedNull. Yes, that's unsound, but UncheckedNull is also unsound (though somewhat less so).

That would be unacceptable because it definitely means that you can receive a value that does not belong to its declared type, and you don't get a chance to "fix" it before you use it.

If you know that a particular Java method can return null, you can fix it with an ascription:
str.substring(3) : String|Null

Even so, there is this tiny "time" frame in which the result of substring is typed as a String, which is plain invalid. An ascription, even right away, does not save the day. For example, an optimizer typically kills type ascriptions as soon as it sees them, because they only make things worse for it. So an optimizer would rely on the result of substring being non-null, and would dead-code eliminate further tests such as if (s == null) as false.

I see.

In that case, how about the following rule? For a call o.m() where m is Java-defined:

  1. If the prototype of o.m() is nullable, keep o.m() as it is.
  2. Otherwise, change o.m() to o.m().nn

By default, Java-defined calls would get .nn added automatically, but you could prevent this with an ascription. .nn does a dynamic check, so its type is T|Null => T.
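For example (a sketch):

  val str: String = "file.txt"

  val a: String = str.substring(3)          // prototype String is not nullable: elaborated as str.substring(3).nn
  val b: String | Null = str.substring(3)   // prototype is nullable: kept as is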

Otherwise, change o.m() to o.m().nn

If I have some Scala code that does:

someJavaInstance.doSideEffects()

and doSideEffects happens to return null, then with that proposal, my code would suddenly start throwing NPEs at runtime.

Could we insert an intrinsic variant of .nn that does not do the dynamic check, but still has the (unsound) type T|Null => T ?
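A user-level approximation of such an intrinsic might look like this (the name is hypothetical, and a real intrinsic would be compiler-generated rather than an asInstanceOf):

  extension [T](x: T | Null)
    // Unsound by design: claims T without the runtime null check that .nn performs.
    inline def uncheckedNn: T = x.asInstanceOf[T]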

I'd like to explain the deeper reason guiding my suggestions. From our study of porting the community build programs, I drew the conclusion that in programs that use Java libraries, there are so many calls to Java code that trying to model Java soundly requires manual handling (e.g. casts) at many, many calls, even with something like UncheckedNull. Therefore I'm aiming for a design that allows two modes: 1) in code where you don't care much about soundness of Java calls, we don't bother trying to make them sound; this is the status quo before explicit nulls; 2) in other code you care enough about soundness that you're willing to do a lot of work to put in all the checks. (Switching between modes is open for discussion; it could be a compiler flag, language import, scoping construct, etc.) Rather than trying to design an in-between mode that is not sound but also still requires many casts, I would prefer to support both modes well.

The problem with UncheckedNull being an alias of Null is that we may lose the information (due to dealiasing) that some type was nullified by the compiler. By making UncheckedNull a true subtype of Null, we can always track the nullification.

The trade-off is that we have to add the following special rules to TypeComparer:

A | Null <: A | UncheckedNull 
M[A | UncheckedNull] <: M[A | Null]
M[A | Null] <: M[A | UncheckedNull]

Another problem mentioned by @olhotak is whether it is possible to create a space for users where we remove all UncheckedNulls from the types. This is important for migration or any unsafe interaction with Java code. However, it is difficult to achieve:

  • We can't just disable nullification in certain Scala files, as the Java class symbols are shared during compilation.

  • I was thinking about keeping two denot.infos for one symbol: one the regular type, the other the nullified type. But it is hard to let the compiler decide which info to use.

To summarize what we want to achieve (correct me if I am wrong):

  • Still nullify all Java definitions (don't change the current rules)
  • The nullified type should be equivalent to the type defined using | Null
  • Be able to track the nullification according to the type, and don't lose the information after any kind of normalization
  • Remove the half unsound UncheckedNull
  • Be able to create a space where Java definitions are considered not nullified (acceptable unsoundness)

I'd like to propose this scheme to replace UncheckedNull:

  1. Define a tag @Unchecked which can be added to an OrType:

       nullify(T) => T |@Unchecked Null
       T |@Unchecked Null =:= T | Null

     An OrType carrying @Unchecked will not be widened.
  2. By importing or setting some flag, the nullified type and the regular type can be freely converted between each other.

For example:

import scala.language.RemoveUncheckedOrNull
// which enables the following implicit conversion

// defined in compiler
given removeUncheckedOrNull[A, B] as  Conversion[A, B]
  where A != B && A.stripAllUncheckedOrNull =:= B.stripAllUncheckedOrNull = 
  _.asInstanceOf[B] 

I am still reading the source code of Implicits. I am not sure if this is possible.
