It is frequently more readable to give a boolean expression a name for self-documentation. For example:
```dart
final bool isSingleElement = fields == null;
if (!isSingleElement) {
  fields.forEach(...);
}
```

However, this does not promote `fields` to non-null inside the `if` block. A workaround is to remove the variable:

```dart
if (fields != null) {
  fields.forEach(...);
}
```

However, this makes the code less readable, because the non-nullness of `fields` does not immediately imply that we're dealing with a "single element" mode of something.
Can the nullability inference be enhanced to infer using local booleans?
/cc @nturgut @leafpetersen
What is `fields`? A local variable? A field? Fields cannot be promoted.
Also, is this a bug with the migration tool (it should do something different)? Or with the null safety feature as implemented in the VM or something?
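For context: instance fields are never promoted by a null test, because a getter could return a different value on each access. A minimal sketch of the usual local-copy workaround, assuming `fields` is an instance field (the class and names below are hypothetical):

```dart
class Element {
  List<String>? fields;

  void visit() {
    // A null test on the instance field itself does not promote it:
    // `fields` could be a getter returning a new value each time.
    final localFields = fields; // copy into a local variable
    if (localFields != null) {
      // The local *is* promoted to List<String> on this branch.
      localFields.forEach(print);
    }
  }
}
```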
This isn't impossible, per se, but it's a pretty substantial lift. We'd need to track correlations between boolean valued variables and other pieces of our nullability state. I'm sympathetic to the request, but I don't think it's feasible in the time frame we have, and am not sure it would warrant the implementation complexity.
@leafpetersen did we decide to not do this? Then perhaps it should be turned into a potential, future language issue?
This is definitely not going to be in the feature as shipped. We could keep it as an option for future releases, but I'm pretty skeptical that this has the bang for the buck. This would require some pretty heavy machinery in the static analysis. I'll move it to the language repo for a round of discussion, but I think we may just want to close this out as unlikely to happen.
@lrhn @eernstg @stereotype441 @munificent @natebosch @jakemac53
Is this something we see potentially doing? Is there a generalization that would increase the bang/buck, given that I expect the denominator there to be large?
The idea here is to abstract a test. That does make some sense. It's still a matter of "if this value is true, it implies something for some variables", we just don't branch on the boolean immediately.
I do worry about making the promotion inference rules too opaque.
A story of "If you do `x != null` or `x == null` in a branch, it will promote the local variable `x` on one of the branches" is simple and understandable.
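In code, that existing rule is just (a minimal illustration):

```dart
void f(int? x) {
  if (x != null) {
    print(x + 1); // `x` is promoted to int on this branch
  } else {
    print(x); // `x` is known to be null here
  }
}
```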
We might be able to expand that to implications bound to boolean values stored in local variables. Whether the boolean is branched on immediately or not, we can still try to decide which code is dominated by the value being true vs. false, even if some initial code is neither. It would even allow for something like:
```dart
var hasFoo = foo != null;
if (hasFoo) {
  foo.start();
}
// more code
if (hasFoo) {
  foo.end();
}
```
without needing to test twice. We can lose the information if anything assigns to either `foo` or `hasFoo`, so it's like `hasFoo` was "promoted" to boolean+variable information, and we can invalidate either.
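Concretely, a sketch of that invalidation rule under the proposed semantics (this is not current Dart behavior; `Foo` and `makeFoo` are hypothetical):

```dart
class Foo {
  void start() {}
  void end() {}
}

Foo? makeFoo() => null; // hypothetical helper

void g(Foo? foo) {
  var hasFoo = foo != null;
  if (hasFoo) {
    foo.start(); // would be OK under the proposal: hasFoo implies foo != null
  }
  foo = makeFoo(); // assignment to `foo` invalidates the implication
  if (hasFoo) {
    foo.end(); // would NOT promote here: `foo` may be null again
  }
}
```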
As long as we can explain to users which tests promote which variables in a short and clear way, I'm for it.
I don't want to leave users guessing at whether one particular pattern is understood or not. It's too fragile if small changes to code structure make promotion stop working, even though they are obviously innocuous.
A fractal boundary is impossible to internalize; a straight line is not, even though it might cut off some correct implications.
To drill down a little further on the bang/buck question, I think the key implementation difficulty is that flow analysis models are snapshots of the state of the function at points in time; if we record a state at one time and then try to use it later, we need to somehow "rebase" that state to account for other promotions and demotions that have happened since we took the snapshot. Consider:
```dart
int? getPossiblyNullValue() => ...;

void f(int? x, int? y, int? z) {
  if (y == null) return;
  bool b = x == null; // (1)
  if (z == null) return;
  y = getPossiblyNullValue(); // (2)
  if (b) return; // (3)
}
```
At the time of (1), `y` has been promoted to non-nullable, but `z` remains nullable. So the `false` state we would store for `b` would be `{x is int, y is int, z is int?, b is bool}`. After (2), `y` has been demoted and `z` has been promoted, so the state is `{x is int?, y is int?, z is int, b is bool}`. Then, at (3), we need to take the former state and somehow rebase it to account for the state changes that have happened to `y` and `z`. There's not currently enough information in our data structures to do that.
I think we could fix the problem by doing some SSA-like bookkeeping: associate each variable with a unique integer representing its "SSA node id". Whenever SSA analysis would need to allocate a fresh node (either because of an assignment or because of a phi node at a join point), we allocate a fresh id. That way the rebase algorithm can distinguish demotions from promotions by seeing that the ids are different. The algorithm would be: if the ids match, we take the most promoted type; if they differ, we take the more recent information. So considering the example above again, we would be rebasing `{x₀ is int, y₁ is int, z₂ is int?, b₃ is bool}` on top of `{x₀ is int?, y₄ is int?, z₂ is int, b₃ is bool}`, and that would produce `{x₀ is int, y₄ is int?, z₂ is int, b₃ is bool}`, which is what we want.
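To make the rebase idea concrete, here is a toy sketch of the algorithm described above (all names are hypothetical, and strings stand in for real type models; this is not the analyzer's actual implementation):

```dart
/// Per-variable state: the SSA node the variable currently refers to,
/// plus the type it is currently promoted to.
class VarState {
  final int ssaId;
  final String type; // e.g. 'int' or 'int?'
  VarState(this.ssaId, this.type);
}

/// Toy "most promoted" choice: the non-nullable spelling wins.
String morePromoted(String a, String b) => a.endsWith('?') ? b : a;

/// Rebase a saved state on top of the current one: if the SSA ids
/// match, the variable was not reassigned in between, so keep the most
/// promoted type; if they differ, the more recent information wins.
Map<String, VarState> rebase(
    Map<String, VarState> saved, Map<String, VarState> current) {
  return current.map((name, cur) {
    final old = saved[name];
    final merged = (old != null && old.ssaId == cur.ssaId)
        ? VarState(cur.ssaId, morePromoted(old.type, cur.type))
        : cur;
    return MapEntry(name, merged);
  });
}

void main() {
  // The states from the example above: saved at (1), current at (3).
  final saved = {
    'x': VarState(0, 'int'),
    'y': VarState(1, 'int'),
    'z': VarState(2, 'int?'),
  };
  final current = {
    'x': VarState(0, 'int?'),
    'y': VarState(4, 'int?'), // reassigned at (2): fresh SSA id
    'z': VarState(2, 'int'),
  };
  rebase(saved, current)
      .forEach((name, s) => print('$name is ${s.type} (ssa ${s.ssaId})'));
  // Prints: x is int (ssa 0), y is int? (ssa 4), z is int (ssa 2).
}
```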
Another implementation difficulty is: at (3), when we recall the state we stored at (1), how do we make sure it hasn't been invalidated by intervening assignments to `b` or `x`? The rebase algorithm takes care of detecting intervening assignments to `x`; how about intervening assignments to `b`? We could address this by organizing our storage as a map from SSA node ids to true/false states. That way, if `b` gets assigned, it will have a new SSA node id, so we will ignore any true/false states we'd previously associated with it.
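And a toy sketch of storing true/false states keyed by SSA node id rather than by variable name (again, all names are hypothetical and strings stand in for full flow-analysis states):

```dart
/// Placeholder for the pair of flow-analysis states implied by a
/// boolean being true or false.
class ConditionStates {
  final String whenTrue;
  final String whenFalse;
  ConditionStates(this.whenTrue, this.whenFalse);
}

void main() {
  // Keyed by the boolean's SSA node id rather than its name.
  final statesBySsaId = <int, ConditionStates>{};

  // bool b = x == null; — suppose `b` is SSA node 3 at this point.
  statesBySsaId[3] = ConditionStates('{x is int?}', '{x is int}');

  // Later `if (b)`, with `b` still at node 3: the saved states apply.
  print(statesBySsaId[3]?.whenFalse); // {x is int}

  // If `b` is reassigned it becomes, say, node 5; the lookup misses,
  // so the stale states are correctly ignored.
  print(statesBySsaId[5]); // null — nothing to (mis)apply
}
```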
So that's the cost ("buck") part of the equation. What are the potential benefits ("bangs")? I can think of several:
- The feature requested in this issue: promoting a variable based on a boolean stored in a local variable.
- It would clean up how flow analysis handles `==` comparisons. Today we have to take the flow state from before the `==` comparison and update it to account for any promotions/demotions that happened during the RHS of the `==` comparison, so that we can compare the two states and figure out if a promotion is in order. The rebase algorithm handles that nicely.
- It would let definite assignment analysis recognize repeated conditions, reducing unnecessary `!`s, for example:

```dart
void foo(int? x) {
  int y;
  if (x != null) {
    y = x + 1;
    // Flow analysis state is implicitly saved here, associated with `x != null`
  }
  // ...other logic...
  if (x != null) { // SSA ids let us see that this is the same condition as above
    // `x != null` state is restored here (and rebased over any promotions/demotions
    // that have occurred in the meantime)
    print(y + 1); // Ok; we know that `y` is definitely assigned.
  }
}
```
I'm not entirely sure whether the benefits justify the costs, but it sounds like a lot of fun to work on. I will try to set aside a few days after the Beta release ships to see if I can make these ideas pan out.
One further thought. Extending my idea from yesterday, if we changed flow analysis so that it tracked promoted types via their SSA node id, then we could make promotion understand variables whose values are copies of other variables, e.g.:
```dart
void foo(int? x) {
  var y = x;
  if (x != null) {
    print(y + 1); // Ok because y and x refer to the same value, and x != null
  }
}
```
Not a super compelling use case, but it would be a little extra "bang" that could potentially be unlocked if we wanted to go this direction.