It is frequently more readable to give a boolean expression a name for self-documentation. For example:
```dart
final bool isSingleElement = fields == null;
if (!isSingleElement) {
  fields.forEach(...);
}
```

However, this does not promote `fields` to non-null inside the `if` block. A workaround is to remove the variable:

```dart
if (fields != null) {
  fields.forEach(...);
}
```

However, this makes the code less readable, because the non-nullness of `fields` does not immediately imply that we're dealing with a "single element" mode of something.
Can the nullability inference be enhanced to infer using local booleans?
/cc @nturgut @leafpetersen
What is `fields`? A local variable? A field? Fields cannot be promoted.
Also, is this a bug with the migration tool (it should do something different)? Or with the null safety feature as implemented in the VM or something?
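For context: instance fields are never promoted by a null test, because a getter could return a different value on each access. A minimal sketch of the usual local-copy workaround, assuming `fields` is an instance field (the class and names below are hypothetical):

```dart
class Element {
  List<String>? fields;

  void visit() {
    // A null test on the instance field itself does not promote it:
    // `fields` could be a getter returning a new value each time.
    final localFields = fields; // copy into a local variable
    if (localFields != null) {
      // The local *is* promoted to List<String> on this branch.
      localFields.forEach(print);
    }
  }
}
```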
This isn't impossible, per se, but it's a pretty substantial lift. We'd need to track correlations between boolean valued variables and other pieces of our nullability state. I'm sympathetic to the request, but I don't think it's feasible in the time frame we have, and am not sure it would warrant the implementation complexity.
@leafpetersen did we decide to not do this? Then perhaps it should be turned into a potential, future language issue?
This is definitely not going to be in the feature as shipped. We could keep it as an option for future releases, but I'm pretty skeptical that this has the bang for the buck. This would require some pretty heavy machinery in the static analysis. I'll move it to the language repo for a round of discussion, but I think we may just want to close this out as unlikely to happen.
@lrhn @eernstg @stereotype441 @munificent @natebosch @jakemac53
Is this something we see potentially doing? Is there a generalization that would increase the bang/buck, given that I expect the denominator there to be large?
The idea here is to abstract a test. That does make some sense. It's still a matter of "if this value is true, it implies something for some variables", we just don't branch on the boolean immediately.
I do worry about making the promotion inference rules too opaque.
A story of "If you do `x != null` or `x == null` in a branch, it will promote the local variable `x` on one of the branches" is simple and understandable.
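In code, that existing rule is just (a minimal illustration):

```dart
void f(int? x) {
  if (x != null) {
    print(x + 1); // `x` is promoted to int on this branch
  } else {
    print(x); // `x` is known to be null here
  }
}
```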
We might be able to expand that to implications bound to boolean values stored in local variables. Whether the boolean is branched on immediately or not, we can still try to decide which code is dominated by the value being true vs. false, even if some initial code is neither. It would even allow for something like:
```dart
var hasFoo = foo != null;
if (hasFoo) {
  foo.start();
}
// more code
if (hasFoo) {
  foo.end();
}
```
without needing to test twice. We can lose the information if anything assigns to either `foo` or `hasFoo`, so it's like `hasFoo` was "promoted" to boolean+variable information, and we can invalidate either.
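Concretely, a sketch of that invalidation rule under the proposed semantics (this is not current Dart behavior; `Foo` and `makeFoo` are hypothetical):

```dart
class Foo {
  void start() {}
  void end() {}
}

Foo? makeFoo() => null; // hypothetical helper

void g(Foo? foo) {
  var hasFoo = foo != null;
  if (hasFoo) {
    foo.start(); // would be OK under the proposal: hasFoo implies foo != null
  }
  foo = makeFoo(); // assignment to `foo` invalidates the implication
  if (hasFoo) {
    foo.end(); // would NOT promote here: `foo` may be null again
  }
}
```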
As long as we can explain to users which tests promote which variables in a short and clear way, I'm for it.
I don't want to leave users guessing at whether one particular pattern is understood or not. It's too fragile if small changes to code structure make promotion stop working, even though they are obviously innocuous.
A fractal boundary is impossible to internalize; a straight line is not, even though it might cut off some correct implications.
To drill down a little further on the bang/buck question, I think the key implementation difficulty is that flow analysis models are snapshots of the state of the function at points in time; if we record a state at one time and then try to use it later, we need to somehow "rebase" that state to account for other promotions and demotions that have happened since we took the snapshot. Consider:
```dart
int? getPossiblyNullValue() => ...;

void f(int? x, int? y, int? z) {
  if (y == null) return;
  bool b = x == null; // (1)
  if (z == null) return;
  y = getPossiblyNullValue(); // (2)
  if (b) return; // (3)
}
```
At the time of (1), `y` has been promoted to non-nullable, but `z` remains nullable. So the `false` state we would store for `b` would be `{x is int, y is int, z is int?, b is bool}`. After (2), `y` has been demoted and `z` has been promoted, so the state is `{x is int?, y is int?, z is int, b is bool}`. Then, at (3), we need to take the former state and somehow rebase it to account for the state changes that have happened to `y` and `z`. There's not currently enough information in our data structures to do that.
I think we could fix the problem by doing some SSA-like bookkeeping: associate each variable with a unique integer representing its "SSA node id". Whenever SSA analysis would need to allocate a fresh node (either because of an assignment or because of a phi node at a join point), we allocate a fresh id. That way the rebase algorithm can distinguish demotions from promotions by seeing that the ids are different. The algorithm would be: if the ids match, we take the most promoted type; if they differ, we take the more recent information. So considering the example above again, we would be rebasing `{x₀ is int, y₁ is int, z₂ is int?, b₃ is bool}` on top of `{x₀ is int?, y₄ is int?, z₂ is int, b₃ is bool}`, and that would produce `{x₀ is int, y₄ is int?, z₂ is int, b₃ is bool}`, which is what we want.
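To make the rebase idea concrete, here is a toy sketch of the algorithm described above (all names are hypothetical, and strings stand in for real type models; this is not the analyzer's actual implementation):

```dart
/// Per-variable state: the SSA node the variable currently refers to,
/// plus the type it is currently promoted to.
class VarState {
  final int ssaId;
  final String type; // e.g. 'int' or 'int?'
  VarState(this.ssaId, this.type);
}

/// Toy "most promoted" choice: the non-nullable spelling wins.
String morePromoted(String a, String b) => a.endsWith('?') ? b : a;

/// Rebase a saved state on top of the current one: if the SSA ids
/// match, the variable was not reassigned in between, so keep the most
/// promoted type; if they differ, the more recent information wins.
Map<String, VarState> rebase(
    Map<String, VarState> saved, Map<String, VarState> current) {
  return current.map((name, cur) {
    final old = saved[name];
    final merged = (old != null && old.ssaId == cur.ssaId)
        ? VarState(cur.ssaId, morePromoted(old.type, cur.type))
        : cur;
    return MapEntry(name, merged);
  });
}

void main() {
  // The states from the example above: saved at (1), current at (3).
  final saved = {
    'x': VarState(0, 'int'),
    'y': VarState(1, 'int'),
    'z': VarState(2, 'int?'),
  };
  final current = {
    'x': VarState(0, 'int?'),
    'y': VarState(4, 'int?'), // reassigned at (2): fresh SSA id
    'z': VarState(2, 'int'),
  };
  rebase(saved, current)
      .forEach((name, s) => print('$name is ${s.type} (ssa ${s.ssaId})'));
  // Prints: x is int (ssa 0), y is int? (ssa 4), z is int (ssa 2).
}
```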
Another implementation difficulty is: at (3), when we recall the state we stored at (1), how do we make sure it hasn't been invalidated by intervening assignments to `b` or `x`? The rebase algorithm takes care of detecting intervening assignments to `x`; how about intervening assignments to `b`? We could address this by organizing our storage as a map from SSA node ids to true/false states. That way, if `b` gets assigned, it will have a new SSA node id, so we will ignore any true/false states we'd previously associated with it.
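And a toy sketch of storing true/false states keyed by SSA node id rather than by variable name (again, all names are hypothetical and strings stand in for full flow-analysis states):

```dart
/// Placeholder for the pair of flow-analysis states implied by a
/// boolean being true or false.
class ConditionStates {
  final String whenTrue;
  final String whenFalse;
  ConditionStates(this.whenTrue, this.whenFalse);
}

void main() {
  // Keyed by the boolean's SSA node id rather than its name.
  final statesBySsaId = <int, ConditionStates>{};

  // bool b = x == null; — suppose `b` is SSA node 3 at this point.
  statesBySsaId[3] = ConditionStates('{x is int?}', '{x is int}');

  // Later `if (b)`, with `b` still at node 3: the saved states apply.
  print(statesBySsaId[3]?.whenFalse); // {x is int}

  // If `b` is reassigned it becomes, say, node 5; the lookup misses,
  // so the stale states are correctly ignored.
  print(statesBySsaId[5]); // null — nothing to (mis)apply
}
```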
So that's the cost ("buck") part of the equation. What are the potential benefits ("bangs")? I can think of several:
- The feature requested in this issue: promoting a variable based on a boolean stored in a local variable.
- It would clean up how flow analysis handles `==` comparisons. Today we have to take the flow state from before the `==` comparison and update it to account for any promotions/demotions that happened during the RHS of the `==` comparison, so that we can compare the two states and figure out if a promotion is in order. The rebase algorithm handles that nicely.
- It would let definite assignment analysis recognize repeated conditions, reducing unnecessary `!`s, for example:

```dart
void foo(int? x) {
  int y;
  if (x != null) {
    y = x + 1;
    // Flow analysis state is implicitly saved here, associated with `x != null`
  }
  // ...other logic...
  if (x != null) { // SSA ids let us see that this is the same condition as above
    // `x != null` state is restored here (and rebased over any promotions/demotions
    // that have occurred in the meantime)
    print(y + 1); // Ok; we know that `y` is definitely assigned.
  }
}
```
I'm not entirely sure whether the benefits justify the costs, but it sounds like a lot of fun to work on. I will try to set aside a few days after the Beta release ships to see if I can make these ideas pan out.
One further thought. Extending my idea from yesterday, if we changed flow analysis so that it tracked promoted types via their SSA node id, then we could make promotion understand variables whose values are copies of other variables, e.g.:
```dart
void foo(int? x) {
  var y = x;
  if (x != null) {
    print(y + 1); // Ok because y and x refer to the same value, and x != null
  }
}
```
Not a super compelling use case, but it would be a little extra "bang" that could potentially be unlocked if we wanted to go this direction.