Some rough notes from a conversation @ahejlsberg and I had earlier about trade-offs in the control flow analysis work based on running the real-world code (RWC) tests. For obvious reasons I'll be comparing what Flow does with similar examples to compare and contrast possible outcomes.
The primary question is: When a function is invoked, what should we assume its side effects are?
One option is to be _pessimistic_ and reset all narrowings, assuming that any function might mutate any object it could possibly get its hands on. Another option is to be _optimistic_ and assume the function doesn't modify any state. Both of these seem to be bad.
This problem spans both locals (which might be subject to some "closed over or not" analysis) and object fields.
The TypeScript compiler has code like this:
enum Token { Alpha, Beta, Gamma }
let token = Token.Alpha;
function nextToken() {
token = Token.Beta;
}
function maybeNextToken() {
if (... something ...) {
nextToken();
}
}
function doSomething() {
if (token !== Token.Alpha) {
maybeNextToken();
}
// is this possible?
if (token === Token.Alpha) {
// something happens
}
}
Optimistically assuming token
isn't modified by maybeNextToken
incorrectly flags token === Token.Alpha
as an impossibility. However, in other cases, this is a _good_ check to do! See later examples.
The RWC suite picked up a "bug" that looked like this:
// Function somewhere else
declare function tryDoSomething(x: string, result: { success: boolean; value: number; }): void;
function myFunc(x: string) {
let result = { success: false, value: 0 };
tryDoSomething(x, result);
if (result.success === true) { // %%
return result.value;
}
tryDoSomething(x.trim(), result);
if (result.success === true) { // ??
return result.value;
}
return -1;
}
The ??
line here is _not_ a bug in the user code, but we thought it was, because after the %%
block runs, the only remaining value in result.success
's domain is false
.
We found actual bugs (several!) in partner code that looked like this:
enum Kind { Good, Bad, Ugly }
let kind: Kind = ...;
function f() {
if (kind) {
log('Doing some work');
switch (kind) {
case Kind.Good:
// unreachable!
}
}
}
Here, we detected the bug that Kind.Good
(which has the falsy value 0
) is not in the domain of kind
at the point of the case
label. However, if we were fully pessimistic, we couldn't know that the global function log
doesn't modify the global variable kind
, thus incorrectly allowing this broken code.
A question on flowtype SO is a good example of this
A smaller example that demonstrates the behavior:
function fn(arg: { x: string | null }) {
if (arg.x !== null) {
alert('All is OK!');
// Flow: Not OK, arg.x could be null
console.log(arg.x.substr(3));
}
}
The problem here is that, pessimistically, something like this might be happening:
let a = { x: 'ok' };
function alert() {
a.x = null;
}
fn(a);
The TS compiler has code that looks like this (simplified):
function visitChildren(node: Node, visit: (node: Node) => void) {
switch(node.kind) {
case SyntaxKind.BinaryExpression:
visit(node.left);
visit(node.right); // Unsafe?
break;
case SyntaxKind.PropertyAccessExpression:
visit(node.expr);
visit(node.name); // Unsafe?
break;
}
}
Here, we discriminated the Node
union type by its kind
. A pessimistic behavior would say that the second invocations are unsafe, because the call to visit
may have mutated node.kind
through a secondary reference and invalidated the discrimination.
Flow does some assignment analysis to improve the quality of these errors, but it's obviously short of a full inlining solution, which wouldn't be even remotely practical. Some examples of how to defeat the analysis:
// Non-null assignment can still trigger null warnings
function fn(x: string | null) {
function check1() {
x = 'still OK';
}
if (x !== null) {
check1();
// Flow: Error, x could be null
console.log(x.substr(0));
}
}
// Inlining is only one level deep
function fn(x: string | null) {
function check1() {
check2();
}
function check2() {
x = null;
}
if (x !== null) {
check1();
// Flow: No error
console.log(x.substr(0)); // crashes
}
}
const
parametersA low-hanging piece of fruit is to allow a const
modifier on parameters. This would allow a much faster fix for code that looks like this:
function fn(const x: string | number) {
if (typeof x === 'string') {
thisFunctionCannotMutateX();
x.substr(0); // ok
}
}
readonly
fieldsThe visitChildren
example above might be mitigated by saying that readonly
fields retain their narrowing effects even in the presence of intervening function calls. This is technically unsound as you may have both a readonly
and non-readonly
alias to the same property, but in practice this is probably very rare.
Random ideas that got thrown out (will add to this list) but are probably bad?
pure
modifier on functions that says this function doesn't modify anything. This is a bit impractical as we'd realistically want this on the vast majority of all functions, and it doesn't really solve the problem since lots of functions only modify _one_ thing so you'd really want to say "pure
except for m
"volatile
property modifier that says this "this property will change without notice". We're not C++ and it's perhaps unclear where you'd apply this and where you wouldn't.pure is a bit impractical
with persistent data structures it makes perfect practical sense, amortized time/memory consumption for a typical application would be comparable, it take's some discipline and support from the language ( https://github.com/Microsoft/TypeScript/issues/6230, https://github.com/Microsoft/TypeScript/issues/2319, https://github.com/Microsoft/TypeScript/issues/1213), but it's worth the trouble
with readonly
for immutables ans pure
for functions it will be blazing fast compared to inlining, because modifiers only need to be checked once for each function/interface
please no more half measures based on assumptions for a couple of typical cases and endless exceptions that only worsen soundness
Looking through the RWC tests, I see only one case where the additional narrowing performed by #9407 causes unwanted errors. However, there were scores of real bugs and inconsistencies being caught, such as checking for the same value twice, assuming that enum members with the value 0 pass truthiness checks, dead branches in if
and switch
statements, etc. In aggregate, I think our optimistic assumption that type guards are unaffected by intervening function calls is the best compromise.
BTW, the compiler itself relies on side effecting changes to a token
variable in the parser. For example:
if (token === SyntaxKind.ExportKeyword) {
nextToken();
if (token === SyntaxKind.DefaultKeyword) {
// We have "export default"
}
...
}
This becomes an error with #9407 because the compiler continues to think token
has the value SyntaxKind.ExportKeyword
following the call to nextToken
(and thus reports an error when token
is compared to SyntaxKind.DefaultKeyword
).
We will instead be using a function to obtain the current token:
if (token() === SyntaxKind.ExportKeyword) {
nextToken();
if (token() === SyntaxKind.DefaultKeyword) {
// We have "export default"
}
...
}
where the function is simply:
function token(): SyntaxKind {
return currentToken;
}
Since all modern JavaScript VMs inline such simple functions there is no performance penalty.
I think this pattern of suppressing type narrowing by accessing mutable state using a function is a reasonable one.
is there any possibility of a const on an argument?
function fn(const x: string | number)
In swift/objc they added the "in", "out", and "inout" meta attributes to arguments so you could see if they meant for a value to be modified. i feel like this could be useful for annotating whether a function will modify a variable it received or not. Not sure if I'm missing something.
While pure
is all the "rage" these days, I think many JavaScript developers don't understand what it means and could easily lead to surprises and or large scale frustration. Because the inherit universal mutability of JavaScript I think pure
would generally be shackles that would potentially go unused because of the over strictness of the concept... It just seems an "all or nothing" concept, when most of JavaScript is shades of grey.
I think a argument by argument modifier makes the most sense. Of course const
and readonly
only imply disallowing reassignment. Personally, I don't find it confusing of expanding those to inform the flow control that property mutations are "unsafe" as that would be a design time only construct.
Is there any benefit in considering a deeply immutable design time keyword to avoid any confusion between const
and readonly
? Potentially even immutable
(though that is jargoney).
in my stubborn defense of pure
:
[1, 2, 3].map(x => x + 1 /* <-- hi, i am pure function */)
i really wish it was it was considered
other options may work out too and should be considered, just not a big fan of half-baked solutions
I understand your argument, although I think what you would cover with "obviously" pure functions are not ones that suffer from CFA challenges. You example of [1, 2, 3].map(x => x + 1)
does not suffer from CFA function boundary issues, and while it is logically pure, there is not benefit in denoting this at design time.
I think your argument might have more merit if you could think of some examples where there are current CFA issues that would be solved by a pure notation that benefitted the developer. My argument isn't against the concept of writing pure functions, it is more of the argument that an "all or nothing" solution likely won't be practical.
One of the biggest current issues with CFA in my opinion is in Promise callbacks. There are many situations where those might not be pure. That is why I think an argument by argument mutability notation would solve significantly more use cases and do so in a more end-developer controllable way.
although you are right that the example doesn't target CFA issues, the answer is rather evident:
this is one merit of the pureness, among many others
let me say it again, if a promise callback is pure, all reasoning that was done outside of it's scope or time is still valid inside of it
you might wonder why i am so certain, let me explain, there is no notion of time in math that stands behind the idea of pureness, time doesn't exist there, it's absolutely static, everything existed forever and forever will and cannot ever change (because changes need time) to the point where a pure function can be thrown away and replaced with its result (what's called referential transparency) which for the any given arguments will always be the same (hence a question why not just use a hash-table instead to map arguments to results), so you don't really need a function in the first place, armed with such super static equivalence between results that were calculated vs hardcoded the only merit in using a function is to save some memory that otherwise would be taken by a huge static table mapping all possible arguments to all results, since math is absolute abstraction we only care when it comes to program it
this is what a pure function is, this is why it will always work
i agree that it's rather 'all or nothing', but is the price you have to pay for the peace of mind because in its strictness it solves the CFA problems and many others simply by giving them no place to exist
think about it, a "pure" (aka persistent) data structure, say, a linked list, can be deeply copied just by... being assigned:
const one = toList();
const absolutelyAnotherListThatWeCanDoWhateverWeWant = one;
how cool is that?
in my humble opinion "pure" functions and data structures can easily cover 40% of typical everyday needs without even feeling restricted
as for the rest 60% (since we don't appreciate any sort of extremes) we can use good old mutable impure JavaScript
but mind those 40% percent of no problem ever! that's a darn bargain, just let it happen and enjoy it
totally agree with @aleksey-bykov arguments in favor of having "pure" introduced into the type checker.
Just my .02, I think "pure" would be too abstract and too confusing to understand.
I'm actually in favor of apple style semantics, "in", "out", "inout"
Also I think const could lead to confusion since they would be similar in usage but different concepts.
It might be easier for beginners to pick up and understand but the intent simply wouldn't be as clear.
@RyanCavanaugh The compiler should already be able to answer the question in the example of _Optimistic: Bad behavior on locals_ already with yes. If doSomething
is called when token
is Token.Alpha
, the condition token !== Token.Alpha
of the first if
statement is false. Hence, the body is skipped and nextToken
is never called. token
is Token.Alpha
still holds when the second if
statement is reached. Ergo the answer is yes.
We can make this question insteresting again by changing doSomething
like this:
function doSomething() {
if (token !== Token.Alpha) {
maybeNextToken();
// is this possible?
if (token === Token.Alpha) {
// something happens
}
}
}
Now it is only possible if maybeNextToken
changes token
to Token.Alpha
. If you are optimistic you would answer the question with no, since you assume that functions have no side effects like changing variables. However, if you change token
in nextToken
to Token.Alpha
instead of Token.Beta
function nextToken() {
token = Token.Alpha;
}
it would be possible if and only if ... something ...
is true in maybeNextToken
. If you are optimistic you would still answer the question with no even though the correct answer is yes, maybe.
How about tracking variable changes of functions?
The basic idea is that we know a constraint before a function call and that we can reason about a constraint after a function call.
Note: I write a new introduced term bold in the following when it is mentioned the first time. Jump back to this position to see the definition of this term.
if
statementsThis idea is quite similar to control flow base type analysis. So let's take a look at the if
statement in the example of control flow base type analysis first (before we discover constraints introduced by functions):
function foo(x: string | number | boolean) {
if (typeof x === "string") {
x; // type of x is string here
x = 1;
x; // type of x is number here
}
x; // type of x is number | boolean here
}
I rewrite the constraint annotations like this
function foo(x: string | number | boolean) {
// x :: string | number | boolean
if (typeof x === "string") {
// x :: string
x = 1;
// x :: number
}
// x :: number | boolean
}
As usual, a single colon like in x: string | number | boolean
means variable x
has type string | number | boolean
.
A double colon like in x :: string | number | boolean
describes a value-typed-binding, which means that the current value of x
is of type string | number | boolean
. The type of the value may be any subtype of the type of the variable. You can for example conclude from the check typeof x === "string"
is true
that we have x :: string
(the current value of x
is a string
).
The constraints of a simple if
statement can be described in more general as
// cBefore
if (condition) {
// c1
...
// c2
}
// cAfter
Here constraint is abbreviated as c (cBefore
stands for _constraint before_). I name our constraints in the example above like this:
function foo(x: string | number | boolean) {
// cBefore := (x :: string | number | boolean)
if (typeof x === "string") {
// c1 := (x :: string)
x = 1;
// c2 := (x :: number)
}
// cAfter := (x :: number | boolean)
}
I write c := e
to assign a constraint expression e
to the constraint called c
.
In the example, x = 1;
is a side effect that causes a change on the constraint we have before. Before x = 1;
we have c1 := (x :: string)
and afterwards we have c2 := (x :: number)
.
I write c with c'
for the constraint where all appearances of value-type-bindings x :: T
of c
are overridden with value-type-bindings x :: T'
of c'
. In this example c1 with c2
is (x :: string) with (x :: number)
which results in (x :: number)
, since the old constraint value-type-binding (x :: string)
in c1
is overridden by (x :: number)
of t1
.
I describe a side effect s
as a constraint transition c -> c'
. A constraint transition takes constraint c
(that holds before side effect s
) and gives you a new constraint c'
(that holds after side effect s
). Given a constraint transition t: c -> e
where e
is a constraint expression, I use function like notation t(cp)
to express constraint c'
that is equal to e
where all appearances of c
are replaced by cp
.
Using this notation, the example from above looks like this:
// c1 := (x :: string)
x = 1; // t1: c -> c with (x :: number)
// c2 := t1(c1)
You can describe reasoning about constraints of a simple if
statement in more general like this
// cBefore
if (condition) {
// c1 := cBefore & condition
... // t1
// c2 := t1(c1)
// = t1(cBefore & condition)
}
// cAfter := t1(cBefore & condition) | (cBefore & !condition)
I write c & condition
to describe the case that condition
is true
. Otherwise, I write c & !condition
to describe the case that condition
is false
. In both cases the constraint c
had been given before the check. From the check of condition
you can often reason about a new constraint c'
that holds. If condition
is true
implies that c'
holds, you can rewrite c & condition
as c with c'
. Similarly, if condition
is false
implies that c'
holds, you can rewrite c & !condition
as c with c'
. If no constraint follows, you just write c
instead.
In the example, condition
is typeof x === "string"
. If condition
is fulfilled the body is executed. Thereby, c1
is cBefore & condition
. From fulfilled condition
you know typeof x === "string"
holds. From that, you can infer the constraint transition x :: string
. As a result, cBefore & condition
equals cBefore with (x :: string)
. Since cBefore
is x :: string | number | boolean
you get (x :: string | number | boolean) with (x :: string)
that is reduced to x :: string
.
After the if
statement you know that
condition
had been fulfilled and it's body has been executed that causes side effects described by t1
t1(cBefore & condition)
and can be reduced to x :: string
(like described above)condition
had been unfulfilledcBefore & !condition
!condition
is !(typeof x === "string")
which is typeof x !== "string"
(x :: string | number | boolean) with !(x :: string)
which can be reduced to x :: number | boolean
From that you can conclude that cAfter
is t1(cBefore & condition) | (cBefore & !condition)
which is equal to (x :: string) | (x :: number | boolean)
and can be reduced to x :: number | boolean
.
Now we have warmed up, we take a look at constraints that are introduced by functions.
A type guard is a function whose return type is a _type predicate_:
function isFish(pet: Fish | Bird): pet is Fish {
return (<Fish>pet).swim !== undefined;
}
Those _type predicates_ are constraints that are introduced by a function (here isFish
) and are already part of TypeScript:
let pet: Fish | Bird
if (isFish(pet)) {
pet.swim();
}
else {
pet.fly();
}
Let's add constraints as comments:
let pet: Fish | Bird
// cBefore := (pet :: Fish | Bird)
if (isFish(pet)) {
// c1 := cBefore & isFish(pet)
// = (pet :: Fish | Bird) & (pet :: Fish)
// = (pet :: Fish)
pet.swim();
}
else {
// c2 := cBefore & !isFish(pet)
// = (pet :: Fish | Bird) & !(pet :: Fish)
// = (pet :: Bird)
pet.fly();
}
Before the if
statement you know that pet
is Fish | Bird
. If isFish(pet)
is fulfilled, the type predicate pet is Fish
of isFish
holds for the value of our local variable pet
. That means that it adds a constraint pet :: Fish
to c1
. Thus, it's ok to call pet.swim();
. In the else branch you know that the type predicate pet is Fish
is unfulfilled and you can add !(pet :: Fish)
to c2
, which results in pet :: Bird
.
In a similar way to type predicates of type guards I introduce constraints that can be automatically inferred from expressions and statements. I will add constraints to the functions one after another in my modified example (see comment above) of the example _behavior on locals_ (see first post) to answer the question is this possible?
in the comment in the function doSomething
:
enum Token { Alpha, Beta, Gamma }
let token = Token.Alpha;
function nextToken() {
token = Token.Alpha;
}
function maybeNextToken() {
if (condition) {
nextToken();
}
}
function doSomething() {
if (token !== Token.Alpha) {
maybeNextToken();
// is this possible?
if (token === Token.Alpha) {
// something happens
}
}
}
nextToken
function nextToken() {
token = Token.Alpha;
}
nextToken
assigns Token.Alpha
to the shared local variable token
. Like the assignment x = 1;
in the if
statement example, token = Token.Alpha
is a side effect that can be described as a constraint transition:
function nextToken() {
// cBefore
token = Token.Alpha; // t1: c -> c with (token :: Token.Alpha)
// cAfter := t1(cBefore)
// = cBefore with (token :: Token.Alpha)
}
Here cBefore
describes the constraints that we have before calling nextToken
and cAfter
describes the constraints that we have after calling nextToken
. Like an assignment, a function can be described as a constraint transition. Since nextToken
is doing just the same as the assignment token = Token.Alpha;
it describes the same constraint transition as t1
.
maybeNextToken
Let's have a look at the caller maybeNextToken
:
function maybeNextToken() {
if (condition) {
nextToken();
}
}
First you add the constraint transitions (here t1
for nextToken
) and afterwards the constraints of the if
statement:
function maybeNextToken() {
// cBefore
if (condition) {
// cBefore & condition
nextToken(); // t1: c -> c with (token :: Token.Alpha)
// t1(cBefore & condition)
}
// cAfter := t1(cBefore & condition) | (cBefore & !condition)
}
Since condition
is not specified in the original example you don't know its result. I assume that it is an expression without any side effect. To relax this assumption even further I consider condition
is a shared variable of type boolean
. Thereby, you can derive condition :: true
if condition
is fulfilled and condition :: false
otherwise. Thereof, cAfter
is
((cBefore with (condition :: true)) with (token :: Token.Alpha)) | (cBefore with (condition :: false))
Despite that simplification maybeNextToken
is described as the constraint transition cBefore -> cAfter
.
doSomething
function doSomething() {
if (token !== Token.Alpha) {
maybeNextToken();
// is this possible?
if (token === Token.Alpha) {
// something happens
}
}
}
Like in maybeNextToken
you first add constraint transitions and afterwards the constraints of the if
statement:
function doSomething() {
// cBefore
if (token !== Token.Alpha) {
// c1
maybeNextToken(); // t1: c -> ((c with (condition :: true)) with (token :: Token.Alpha)) | (c with (condition :: false))
// c2
// is this possible?
if (token === Token.Alpha) {
// c3
// something happens
// c4
}
// c5
}
// cAfter
}
Our goal is to answer the question if it is possible that token === Token.Alpha
is fulfilled in the condition of the second if
statement. This is only possible if c2
contains a constraint token :: T
where T
is a type that contains Token.Alpha
. Hence, you only look at c1
and c2
:
c1
is cBefore & token !== Token.Alpha
. Since token
is not Token.Alpha
it only can be Token.Beta
or Token.Gamma
. This results in cBefore with (token :: Token.Beta | Token.Gamma)
.c2
is derived by replacing c
in t1
with c1
:ts
c2 := ((c with (condition :: true)) with (token :: Token.Alpha))
| (c with (condition :: false))
= (((cBefore with (token :: Token.Beta | Token.Gamma)) with (condition :: true)) with (token :: Token.Alpha))
| ((cBefore with (token :: Token.Beta | Token.Gamma)) with (condition :: false))
Since the type of the value of token
in cBefore
is overridden by cBefore with (token :: ...)
operations, you can leave cBefore
out:
ts
= (((token :: Token.Beta | Token.Gamma) with (condition :: true)) with (token :: Token.Alpha))
| ((token :: Token.Beta | Token.Gamma) with (condition :: false))
I'm only interested in token
to answer the question. Hence, you can remove all with
operations of other variables (here condition
):
ts
= ((token :: Token.Beta | Token.Gamma) with (token :: Token.Alpha))
| (token :: Token.Beta | Token.Gamma)
= (token :: Token.Alpha)
| (token :: Token.Beta | Token.Gamma)
= token :: Token.Alpha | Token.Beta | Token.Gamma
= token :: Token
From c2 = token :: Token
you conclude that the answer of the question is:
Yes, token
might be Token.Alpha
in the condition of the second/inner if
statement.
@maiermic your post was pretty complex, so I may have misunderstood it, but it looks like you are describing what control flow anaylsis already does, plus extending it to 'look into' the functions being called within guarded blocks as well. If so, how is this different to the inlining approach mentioned by @RyanCavanaugh in the OP?
@yortus That's right, I describe what control flow analysis already does to introduce the notation I use afterwards to explain my idea. However, I'd like to clarify that I don't 'look into' a function every time it is called (if that wasn't clear so far). Instead I infer the constraint transition of a function when I look at it's definition. Therefore, I have to look into the function, but I only have to do this once, since I can carry the constraint transition in the type of the function. When I look at the function call, I can use the constraint transition that I inferred before to calculate the constraint that holds after the function call from the constraint that holds before the function call.
Mitigating with (shallow) inlining / analysis
I don't know how _(shallow) inlining/analysis_ works and @RyanCavanaugh doesn't explain it. Nevertheless, he shows two examples.
1. Example
// Non-null assignment can still trigger null warnings
function fn(x: string | null) {
function check1() {
x = 'still OK';
}
if (x !== null) {
check1();
// Flow: Error, x could be null
console.log(x.substr(0));
}
}
Flow claims that x could be null
after the call of check1
. Flow doesn't look into check1
. Otherwise, it would know that x
has type string
and cann't be null
.
In my approach, the constraint transition of check1
is inferred as t: c -> (x :: string)
. Since we checked that x !== null
is true
before and the type of the variable x
is string | null
we have x :: string
as constraint that holds before the call of check1
. Hence, we pass x :: string
to t
to get the constraint that holds after the call of check1
. t(x :: string)
results in x :: string
. As a consequence, x
cann't be null
in console.log(x.substr(0));
.
Note: You can even infer x :: 'still OK'
in check1
in that case. Thereby, you could even determine the type of the return value of x.substr(0)
which is 's'
. However, this would require further knowledge about the build-in method substr
. I didn't cover such an approach in my previous post.
2. Example
// Inlining is only one level deep
function fn(x: string | null) {
function check1() {
check2();
}
function check2() {
x = null;
}
if (x !== null) {
check1();
// Flow: No error
console.log(x.substr(0)); // crashes
}
}
To come straight to the point, the constraint transition
check2
is t2: c -> (x :: null)
andcheck1
is t1 = t2
Before the call of check1
we have x :: string
. Afterwards we have t1(x :: string)
, which results in x :: null
. Thus, my approach detects the error and flow doesn't.
Quite long to fully read and understand, but this seems very interesting. It makes me think of a kind of propositional logic solver like the Prolog language. Correct me if I'm wrong.
Not sure it would be easy to implement such a thing, but, this could be an evolution target for the TypeScript's Type Inference and Control Flow analysis.
@maiermic hmm yes interesting. I suppose two issues would be:
a) performance. @RyanCavanaugh mentions a "full inlining solution [...] wouldn't be even remotely practical", but what about this deep constraint analysis?
b) what about calls to third-party functions (e.g. calls into platform, frameworks, or node dependencies) where we have no function bodies to look into (only .d.ts declarations)?
might be related https://github.com/Microsoft/TypeScript/issues/8545#issuecomment-218835533
@aleksey-bykov #8545 (comment) looks like a similar approach for manually annotating constraints, even though you only refer to a specific case (object initializers)
@yortus good questions
a) performance. @RyanCavanaugh mentions a "full inlining solution [...] wouldn't be even remotely practical", but what about this deep constraint analysis?
@RyanCavanaugh May you please explain performace issues of a full inlining solution?
Without thinking too much about it, I guess deep constraint analysis might be doable in linear or linearithmic time complexity. Gathered constraints are seralizable and can be cached so that you can speed up analysis time. Manuall constraint annotations can further improve analysis time.
b) what about calls to third-party functions (e.g. calls into platform, frameworks, or node dependencies) where we have no function bodies to look into (only .d.ts declarations)?
If you don't know the code, you cann't do flow analysis. Without .d.ts declarations you couldn't even do any type analysis. You either include constraint annotaions in .d.ts files or you have to fall back to another approach. You might choose an optimistic approach if no constraint annotaions exist, since you can add constraint annotations to .d.ts if a function has a relevant side effect.
Initially I thought that the current unsound strategy would work, though in the previous months 1) narrowing works on more locations (with CFA) 2) narrowing works on more types (enums, numbers) and 3) narrowing works with more type guards. This has demonstrated several issues with the current strategy.
A solution that has been discussed was that the compiler should track where a variable might be modified through a function call. Since JavaScript is a higher-order language (functions can be passed as arguments, stored in variables and so on), it's hard/impossible to track where a function might be called. A reassignment of a variable can also happen without a function call, for instance by a getter or setter function, or by arithmetic that calls .valueOf()
or .indexOf()
. For a language that is not higher-order, the analysis can be done easily, since a function call will directly refer to the declaration of a function. Using the same analysis in a higher-order language will still be unsound; this will probably cause issues when an assignment happens in .forEach(() => { ... })
for instance.
My suggestion is basically that a variable may only be narrowed if its assignments can be tracked completely. A variable can only be narrowed if all reassignments (that is, assignments except the initializer) are inside the control-flow graph of the current function. That means that a variable declared with const
or one that has no reassignments can be narrowed in any function. A variable with reassignments in different functions can never be narrowed. The later is restrictive; a user should copy a value of such variable into a const
variable to narrow it.
To reduce this restriction, the control-flow graph of the function could be extended to an inter-procedural control flow graph. The analysis stays sound, but becomes more accurate. The graph can relatively easily be extended at call expression that directly refer to a function declaration. Assignments in functions that are not used in higher-order constructs can now be tracked too. It would be interesting to extend this to higher-order functions too, but that would be more complex. It would be necessary to annotate function definitions with the control flow of their arguments some way.
I'm not sure whether an abstraction over the control flow graph is needed. The graph could become a lot bigger with this inter-procedural analysis. However, the compiler already uses a compressed control flow graph, which basically contains only branch nodes and assignments. It might help to track which variables are referenced by a function to skip the whole function if the analyzed variable is not referenced there.
Thoughts? Let me know if something is not clear.
How is this code possible with strictNullChecks
on? I want to use strict null checks, but this control flow checking makes it impossible to write this kind of code efficiently:
let myValue: string
[1, 2, 3].forEach(x => {
if (x == 2) {
myValue = 'found it'
}
})
console.log(myValue) // error: myValue used before assignment
Since the callback is not guaranteed to be executed TS sees that not all code paths assign the value. The same would be true when using a for of
loop, because the array is not guaranteed to be non-empty. Even though I as a developer can guarantee for sure that [1, 2, 3]
is not empty and forEach
_will_ be called, there is no way for me to tell that TS, no /* ts ignore no-use-before-assignment */
or something like that. I have to disable strict null checks completely.
@felixfbecker
Either initialize it with a sentinel:
let myValue: string = ''
// Dangerous if myValue is never actually set to a real value
Or, if you're worried about the sentinel never getting overwritten, with undefined
. This more closely mimics the underlying JS but requires !
assertion on use:
let myValue: string | undefined = undefined;
...
console.log(myValue!);
// Still dangerous if myValue is never actually set to a real value, but less so since undefined is likely to fail earlier / harder
Or, have a check to narrow the type:
let myValue: string | undefined = undefined;
...
if (myValue === undefined) {
throw new Error();
}
console.log(myValue); // Guaranteed safety
@Arnavion thanks!
@RyanCavanaugh Following up here, from #11393, with a suggestion.
I would be nice to have an indication of whether the optimistic heuristic was used or not when reporting an error. Something like Operator '===' cannot be applied to types 'false' (likely) and 'true'
.
This will allow the developer to quickly figure out if the error is certain, or if it may just be the consequence of an optimistic assumption made by the compiler. It may also reduce the flow of non-bugs reported here 馃槃.
optimistic heuristic
Just wanted to point out that TypeScript has no concept of this at the moment :rose:
@basarat s/heuristic/assumption/
What I mean is the following:
function foo(arg: any) { arg.val = true; }
var arg = { val: true };
arg.val = false;
foo(arg);
if (arg.val === true) console.log('test is true!');
Error: Operator '===' cannot be applied to types 'false' (likely) and 'true'
Without foo(arg)
call:
Error: Operator '===' cannot be applied to types 'false' and 'true'
Would it be possible to have an affects ...
return type, in the same way as we have ... is ...
? Any functions containing assignments or invocations of functions returning affects ...
would implicitly have affects ...
intersected with the annotated return type. This could be used to do transitive flow analysis as we currently have for never
, and avoid some problems with excessively aggressive or permissive narrowing due to lack of information at function call boundaries.
Went through the thread and realized what I was proposing is a naive subset of what has already been fleshed out in @maiermic's amazing treatment.
I think it is worth opening a separate issue to discuss the merits of that approach and possibly get a feel for it by reimplementing the existing type guards feature in this fashion. A systematic approach to flow analysis would make reasoning about more advanced features much easier. E.g. if this was implemented, pure
could be implemented internally as cAfter := cBefore
. Even if this doesn't end up affecting the implementation in any way, it seems like a useful convention for discussing flow analysis in the compiler.
Didn't find similar case, could you check this out?
function render(): string | null {
const state: "a" | "b" | "c" = "a"
if (state === "a") {
return "a"
} else if (state === "b") {
return "b"
} else if (state === "c") {
return "c"
}
}
This code fails to compile on typescript 2.1 with strictNullChecks
option enabled. Is it covered by somw tradeof with control flow analysis or could be considered as a bug?
Here's a case that should be possible to figure out:
function f(a: number[] | undefined) {
if (a === undefined) {
return [];
}
// Works
a = a.map(x => x + 1);
return a.map(x =>
// Error: Object is possibly 'undefined'.
x + a.length);
}
The closure only exists in a scope where a
exists. There's no way to access it in the if
block.
@Strate That function will return undefined
, not null
.
@andy-ms Your example above works as long as there are no assignments to a
anywhere in the function. See #10357.
This below simple snippet demonstrates the problem again. (It fails in the if
statements with "Operator '==' cannot be applied to types 'Y.A' and 'Y.B'.")
Without proposing any changes to the language or heuristics here, what's an idiomatic way to make this code compile? Right now it's rejected by the compiler. Should we be inserting type coercions on each use of x
to force it back to type Y
?
enum Y {
A,
B
}
let x: Y = Y.A;
x = Y.A;
changeX();
if (x == Y.B) { ... }
if (x == Y.B) { ... }
function changeX() {
x = Y.B;
}
You could write const getX = () => x;
and use getX()
in place of x
.
(You could also change the design to something like x = changeX()
.)
Is there any options to disable this current optimistic behavior or disable cfa completely???
FWIW, I just stumbled over exactly the same problem, also in a tokenizer/parser, where a next_token() method is called which sets a class member to another token, and the calling code doesn't detect that there's a new token and tells me my comparison is impossible (because it thinks the token type must be LeftBracket):
Look, I get that there is a debate on how much CFA typescript should do. But the CFA errors are cryptic.
I just spent 20 minutes debugging this:
run() {
this.state = State.RUNNING;
while (this.state === State.RUNNING) {
const nextOp = this.code[this.pc];
if (nextOp === undefined) {
break;
}
nextOp.operation(this);
}
if (this.state === State.EXCEPTION) {
throw new Error(this.exceptionReason);
}
}
Which results in
Operator '===' cannot be applied to types 'State.RUNNING' and 'State.EXCEPTION'.
Presumably because CFA thinks that this.state
must always === state.RUNNING
. But nextOp.operation(this)
can change this.state
which CFA cannot deduce at compile time.
Of course, if you remove the break, all of a sudden the code works! :confounded:, because the CFA is no longer over-eager about determining the value of this.state
. (IMO this is a bug, but it seems that there is a discussion about this instead)...
Either way, I believe most programmers would be scratching their heads wondering what is going on. It would be helpful if the error message said something to the effect of: Error: CFA predicts that this path is not possible, try refactoring your code.
I think it might be a little bit easier to reason about this sort of thing if the treatment of effects (side effects like local mutation) and "coeffects" (requirements that certain effects have occurred) was formally specified. Then it would be clearer which weird behaviors are down to PEBKAC, which ones are due to bugs in the type checker implementation, and which ones are bugs in the specification.
Hi there! Sorry, but I lost the overview what this issue currently tracks and if there is a discussion to fix some flow related issues?
I came here to ask this question, because I saw two unrelated people stumbled over a destructing issue related to union types and they couldn't figure out the problem. I know _why_ this happens, but is there some discussion to maybe fix this behaviour to make some code more ergonomic?
type AorB = { type: 'a'; value: string } | { type: 'b'; value: number };
function fun({ type, value }: AorB) {
switch (type) {
case 'a':
return value.toLowerCase(); // throws: TS thinks it could be `number`
case 'b':
return value;
}
}
@donaldpipowitch to solve this kind of issue, there's a need in somekind of depended type expression, that the control flow analysis & destructuring will create on the fly
I've typed to make one explicitly with conditional mapped types, but its not working:
function fun<T extends 'a' | 'b'>(type: T, value: T extends 'a' ? string : number) {
switch (type) {
case 'a':
return value.toLowerCase(); // throws: TS thinks it could be `number`
case 'b':
return value.toExponential();
}
}
function alsoNotLikeThat<T extends 'a' | 'b'>(type: T, value: typeof type extends 'a' ? string : number) {
switch (type) {
case 'a':
return value.toLowerCase(); // throws: TS thinks it could be `number`
case 'b':
return value.toExponential();
}
}
If would be nice if my code above would have worked
I see there's few examples with synchronous side effects in functions here, but lots of linked issues around how this causes big problems with async/await, and no direct discussion of that that I can see, so it's worth highlighting. Here's an example of some async code where this issue causes problems:
async function test(p: Promise<any>) {
let x: 'a' | 'b' = 'a';
setTimeout(() => {
x = 'b';
}, 500);
await p;
// x is inferred as 'a' here, so this isn't allowed, but it could be 'b'
if (x === 'b') {
}
}
Note that the inference is wrong here despite explicit types of x
. This issue makes it very awkward to write a function using await
to wait for a side effect, which I think is a fairly common case.
Meanwhile in the equivalent promise-based code the types work totally fine:
function test(p: Promise<any>) {
let x: 'a' | 'b' = 'a';
setTimeout(() => {
x = 'b';
}, 500);
p.then(() => {
// x is 'a' | 'b' here, correctly
if (x === 'b') {
}
});
}
Had a similar problem strangely similar to @RyanCavanaugh's original post:
I just just experienced the exact problem that @pimterry explained in (https://github.com/Microsoft/TypeScript/issues/9998#issuecomment-439610488) and was a bit confused as why it complained to me that this wasn't allowed
+1 for https://github.com/Microsoft/TypeScript/issues/9998#issuecomment-439610488, this is really stupid, await is definitely not a pure statement.
I just wanted to document another workaround for this problem. Instead of using a getter function as described by @ahejlsberg, it's also possible to use a type assertion:
// instead of
// let value: 1 | 2 = 1;
let value = 1 as 1 | 2;
function changeValue() {
value = 2;
}
changeValue();
value; // `value` is correctly typed as 1 | 2
The reason I don't use TypeScript in certain projects where we are not forced to use TypeScript is due to this issue.
interface Something {
maybe?: () => void;
}
function run(isTrue: boolean): Something {
const object: Something = {};
if (isTrue) {
object.maybe = (): void => {
console.log('maybe');
};
}
return object;
}
run(true).maybe();
The code above is deterministic. There will always be one outcome. TypeScript should be able to follow this. I understand it will however need to follow this object specifically and every other object for that matter through their entire life in the code. Which would put a lot of stress on the compile time checking.
I don't understand why I am allowed to set a method as optional and use it in this way.
Personally I don't see the advantage of using TypeScript when this is the type of code I wish to write, hence why I use plain JavaScript.
Regarding how to implement this, a simple flag in the tsconfig could allow control flow analysis.
Sorry for bumping this, but this issue becomes quite old while there seem to be no plans in TS roadmap to do something with it.
The current situation with await
is not invalidating type narrowing is quite severe, as it allows to easily trigger runtime errors when there are many async functions dealing with the same object, as it's described in https://github.com/microsoft/TypeScript/issues/9998#issuecomment-439610488
Note that while invalidating each type after await
feels like "hard to use` for the end-users, the issue is easily solved by what Flow already proposes: if you need to work with some value of the object which might be invalidated, simply save the reference and be with it:
declare let test:
| { type: 'ONE'; value?: number }
| { type: 'TWO'; value: number }
const run = async (): Promise<void> => {
if (test.type !== 'TWO') {
throw null
}
const savedValue = test.value
await delay(1000)
test.value // should invalidate to be `number | undefined`
savedValue // should be type narrowed `number`
}
@basickarl You聽need to聽use聽generics for聽your聽case:
interface Something {
maybe?: () => void;
}
function run<T extends boolean>(isTrue: T):
T extends true
? Something & Required<Pick<Something, "maybe">>
: Something {
const object: Something = {};
if (isTrue) {
object.maybe = () => {
console.log('maybe');
};
}
// @ts-ignore
return object;
}
run(true).maybe();
// @ts-expect-error
run(false).maybe();
@ExE-Boss Thank you SO much! I posted a question on Stack Overflow but it was slaughtered, I will update with this code of yous, hopefully it will help future people!
@basickarl Would you mind posting a link to your Stack Overflow question here too?
not sure if it's related to this issue, anyway this is what happens with my code:
let resolver;
new Promise(resolve => (resolver = resolve));
console.log(resolver); // error: Variable 'resolver' is used before being assigned.ts(2454)
I post here an issue, which i think related to this topic. I Don't really understand why the compiler is not able to infer arg: c
in my last function :
(tell me if i should open a new issue instead)
type t = number[] | c;
class c {
constructor(public data: number[]) {}
m(n: number): void { console.log(n); }
}
function f(arg: t): void {
if (!(arg instanceof c)) {
arg = new c(arg);
}
arg.m(0); // expected behavior: arg is inferred as c
}
function g(arg: t): void {
if (!(arg instanceof c)) {
arg = new c(arg);
}
(() => {
arg.m(0); // expected behavior: arg is inferred as c
})();
}
function h(arg: t): void {
if (!(arg instanceof c)) {
arg = new c(arg);
}
for (let n = 0; n < 3; n++) {
arg.m(n); // expected behavior: arg is inferred as c
}
}
function i(arg: t): void {
if (!(arg instanceof c)) {
arg = new c(arg);
}
[0, 1, 2].forEach(n => {
arg.m(n); // actual behavior: arg is not inferred as c !
});
}
@basickarl hey, You can use overloading:
interface Something {
maybe?: () => void;
}
function run(isTrue: true): { maybe: () => void };
function run(isTrue: boolean): Something {
const object: Something = {};
if (isTrue) {
object.maybe = (): void => {
console.log('maybe');
};
}
return object;
}
run(true).maybe(); // no error here anymore
Since #40860 was closed automatically...
interface ValidationErrorMap {
readonly [errorCode: string]: string | ValidationErrorMap;
}
type ValidationResult = ValidationErrorMap | "success";
function combine(results: ReadonlyArray<ValidationResult>): ValidationResult {
let finalResult: ValidationResult = "success";
for(const result of results) {
finalResult = (result === "success") ? finalResult
: (finalResult === "success") ? result
: { ...finalResult, ...result }; // Spread types may only be created from object types.
}
return finalResult;
}
Intuitively, if the compiler is smart enough to recognize that finalResult
is initialized with a specific value of "success"
it should be smart enough to see that the value changes inside a loop scope, and can't be assumed.
Since #41045 was closed automatically...
type Thing = { data: any };
const things: Map<string, Thing> = new Map();
function add_thing (id: string, data: any) {
let thing = things.get(id);
if (typeof thing === 'undefined') {
thing = { data };
things.set(id, thing);
}
// Object is possibly 'undefined'
return () => thing.data;
}
It is not possible for "thing" to be undefined based on the above code.
From #41113, I'll add another duplicate to the pile:
function doSomething(callback: () => void) {
callback();
}
let result: 'foo' | 'bar' = 'bar';
doSomething(() => {
result = 'foo';
})
// This should work, but fails because type is always "bar"
if (result === 'foo') {
}
You can fix it by casting to the appropriate type:
let result: 'foo' | 'bar' = 'bar' as 'foo' | 'bar';
This can also happen with promises - although it may be possible to detect the case where the type needs to be expanded
Consider the following:
enum State {
Start = 0,
Foo = 1,
Middle = 2,
End = 3
}
class TestClass {
public state: State = State.Start;
async go(): Promise<void> {
this.state = State.End
const promise = new Promise<void>(resolve => {
// doSomething();
this.state = State.Middle;
resolve();
});
await promise;
if (this.state === State.Middle) { // Compiler Error
}
}
}
const testClass = new TestClass()
console.log(testClass.state);
testClass.go();
console.log(testClass.state);
Most helpful comment
I see there's few examples with synchronous side effects in functions here, but lots of linked issues around how this causes big problems with async/await, and no direct discussion of that that I can see, so it's worth highlighting. Here's an example of some async code where this issue causes problems:
(Playground link)
Note that the inference is wrong here despite explicit types of
x
. This issue makes it very awkward to write a function usingawait
to wait for a side effect, which I think is a fairly common case.Meanwhile in the equivalent promise-based code the types work totally fine: