Typescript: Thesis on improved type system

Created on 8 Oct 2015 · 19Comments · Source: microsoft/TypeScript

Hi everyone,

I'm currently starting my bachelor thesis, as part of the Honours Program at the University of Utrecht. The subject will be something with type inference in TypeScript. The final product - besides the written thesis - will be an implementation of the algorithm in TypeScript.

My plan is to fork the TypeScript compiler, and it'd be great if my changes would be merged into TypeScript. My subject is not yet fixed, so I'd like to discuss here which subject would be most suitable.

Here are the two ideas I've got:

1 - Control flow based typeguards.

Example:

let x: string | number = ...;
if (typeof x === number) {
    // x: number
    x = x.toExponentional();
    // x: string
} else {
    // x: string
    x = x.substring(1);
    // x: string
}
// x: string
console.log(x.substring(1));

A different kind of type guards could be implemented like this:

function expectString(x: any) x as string {
    if (typeof x !== 'string') {
        throw new Error('Expected string, got ' + typeof x);
    }
}
let x: string | number = ...;
// x: string | number
expectString(x);
// x: string

This typeguard can be useful when checking the provided arguments in for instance a nodejs server or a public api.

2 - Improved inference using function bodies

Example:

function square(x) {
    return x * x;
}
// Current: square(x: any): number
// Expected: square(x: number): number

function squareLength(x, y) {
    return square(x) + square(y);
}
// Current: squareLength(x: any, y: any): number
// Expected: squareLength(x: number, y: number): number

Dificulties are polymorphic functions (like the identity function (x) => x) and recursive functions. Algorithms that can handle these functions exist (Cartesian Product algorithm or (Extended) Monotone Framework). Such algorithm has been implemented for PHP (without type annotations, only the standard library had hard coded types). This would also be useful for editors which use the language services for JavaScript code, as a lot more types can be inferred. Support for objects will be limited (see below), given the time for my thesis. Limited support for objects is probably good for performance, too. Overall performance will depend on the precision, in some cases we can for example fallback to any to get reasonable performance.

Limited support for objects means that methods, that could be part of an object on the heap, won't be tracked, like the following example:

declare class A { foo(): string; }
declare class B { foo(): number; }
let x;
let y = x.foo();

The algorithm in the PHP implementation would infer y to string | number. Not supporting this means y is typed as any, which would have my preference, as it would fit better in TypeScript. I haven't chosen a specific algorithm yet, I'd like to get some 'requirements' to choose one, for instance, what should happen with recursive functions (a function might call itself, or a function f might call g, that might call f).

I'd like to get some feedback on these ideas. Would you be interested in merging these two, or one of these, into the compiler? Or do you have some related suggesions that I could implement?

Discussion

Source

ivogabe

👍3

Most helpful comment

Long time without an update, though I have some good news now. It took a bit longer than expected, but it is almost finished now. My supervisor and I have submitted a paper based on the thesis to a conference and they accepted it. We're now working to finish the paper, and we would like some feedback of yours. So is there someone of the TypeScript team that wants to read it? I can send the paper by email. We have to finish the paper on Monday, so it would be great if you could give some feedback this weekend.

Anyone interested?

ivogabe on 24 Nov 2016

🎉3

All 19 comments

The first idea is similar to something we've been thinking about for a while; see #2388. I think if we had a good implementation of type guards based on control flow, we'd be likely to merge it in. The x as T throwing pattern doesn't seem like something we would support, though perhaps sufficient data (e.g. this pattern is very common) could be presented here.

The second one, probably not. We definitely could have implemented a more sophisticated inference algorithm from the beginning (a la flowtype), but our inference algorithms are deliberately more straightforward. The reasons for this are performance and predictability -- inference from function implementations generally requires whole-program analysis (not cheap), and it's desirable to be able to follow along with the type inference without needing the entire program text to be able to make definitive statements.

RyanCavanaugh on 8 Oct 2015

I'd seen that issue too. About the x as T guards, personally I expect that I would be using that pattern a lot. When I write a nodejs server I use a lot of function like this:

function expectString(x: any): string {
    if (typeof x === 'string') return x;
    throw new Error('Invalid type, expected string');
}
// ...
const username = expectString(args.username);
const password = expectString(args.password);

I would prefer writing this using the as guards. I might be the only one who uses this, I will investigate what users are using most and I'll come with a more detailed proposal later.

I understand the concerns with the second idea. It might be possible to do some analysis locally in a partially typed program, but that would still be bad for the predictability. The first suggestion would probably be more suitable indeed.

ivogabe on 8 Oct 2015

One thing to consider is the tooling scenario. for instance JS tooling for instance. you could use some of the ideas in 2 to provide better suggestions in IDE's. this gives you a lower bar for correctness compared to issuing errors in user code, and a lower perf bar compared to a whole program typechecking as well.

I would be interested in seeing what possible function body inference techniques can be used and what is the value on the end user experience.

mhegazy on 8 Oct 2015

@mhegazy That sounds like an interesting suggestion. Do you see that as an addition to TypeScript / the JS language services, or more general as a separate project? For completions in the function body the call sites can be used to give completions.

An additional idea, also related to tooling and similar to the second idea, could be to suggest types. A new context menu option could be added that suggests a type for an untyped parameter. This could even be introduced as a 'quick fix' option for noImplicitAny errors in Visual Studio. Small example:

function square(x) { // Error: x has type 'any'
  return x * x;
}

A quick fix would be shown on the 'x' in the function header, saying 'Fix: add type annotation (x: number)'. This type can be determined by using all call sites or the function body. This will probably have no performance issues as only one variable, is inferred, not all (though the algorithm might infer other variables too, to get a good result). What do you think of such feature?

ivogabe on 9 Oct 2015

Sorry for the delay.

Do you see that as an addition to TypeScript / the JS language services, or more general as a separate project? For completions in the function body the call sites can be used to give completions.

I think there is room in the Language Service side to do more of this, either as you mention as quick fixes, or even as a way to provide better JS editing experience.

Here are issues we are actively pursuing to enable better JS tooling using the TS core analysis engine (type system + language service): https://github.com/Microsoft/TypeScript/issues/4793 and https://github.com/Microsoft/TypeScript/issues/4790; and here is a branch where this work is going on: https://github.com/Microsoft/TypeScript/pull/5044.

An additional idea, also related to tooling and similar to the second idea, could be to suggest types. A new context menu option could be added that suggests a type for an untyped parameter. This could even be introduced as a 'quick fix' option for noImplicitAny errors in Visual Studio. Small example:

:+1: :tada:

We have experimented with the quick fix hookup, but have not had the time to fully implement it. I think this is going to be done in the near to medium term.

mhegazy on 15 Oct 2015

I've discussed the ideas with my supervisor. I'll first take a look at the extended type guards and make two implementations: one "quick and dirty" implementation that fits good in the current compiler and one based on an existing theorem ("Monotone Framework"), and compare their performance and accuracy. The latter implementation can optionally be used to implement the quick fix idea or the tooling ideas, I need to decide whether I do that or just focus on the type guards later on.

ivogabe on 17 Oct 2015

@ivogabe You may want to look at:
http://pag-www.gtisc.gatech.edu/courses/cs6340/dataflow_analysis_part2.pdf

The Bit Vector framework is useful and performs well.

jbondc on 18 Oct 2015

+1 for the Control flow based typeguards.

falsandtru on 4 Dec 2015

I have experimented with flow based type guards. Basic functionality is working, but the compiler cannot compile itself yet. You can find my code here

I've extended the reachability checks, instead of passing through only the reachability state, I also keep track of a list of previous occurrences of 'flow markers' and type guards. Every flow marker gets a property with the previous flow markers and guards. With that information, the checker can find previous occurrences of an identifier. A flow marker is an identifier (used as an expression) or an iteration statement (to simplify binding of iterations). Narrowing with iteration statements is not fully working yet, as the checks that prevent infinite recursion (x has the type of x, which has the type of x...) are too strong for that. The checker takes the union of the types of the previous occurrences. Types are narrowed using the type guards, that already existed, and assignment.

I'd like to get some feedback on the following issues I had:

1 - User defined type guards

From checker.ts:

function getSymbolOfEntityNameOrPropertyAccessExpression(entityName: EntityName | PropertyAccessExpression): Symbol {
        if (isDeclarationName(entityName)) {
            return getSymbolOfNode(entityName.parent);
        }

        if (entityName.parent.kind === SyntaxKind.ExportAssignment) {
            return resolveEntityName(<Identifier>entityName,
                /*all meanings*/ SymbolFlags.Value | SymbolFlags.Type | SymbolFlags.Namespace | SymbolFlags.Alias);
        }

isDeclarationName is a type guard with return type name is Identifier | StringLiteral | LiteralExpression. Because of that, the cast to Identifier in the next if statement is disallowed since this entityName is narrowed as if it was in the else block of the first if. I get this error a lot in checker.ts. What would you think of 'weak' user defined type guards?

2 - Assignments

This code doesn't compile:

let x: { prop?: number; };
// x: { prop?: number; }
x = {};
// x: {}
x.prop = 42; // Error

I think that the best solution would be to prevent such narrowing. Only narrowing of union types (remove all parts to which the expression is not assignable to). I think subclasses can also be narrowed. Generics cannot be narrowed:

let x: {}[];
x = ['lorem']; // Don't narrow to x: string[]
x.push(42); // Should work

ivogabe on 6 Dec 2015

@jbondc The Bit Framework requires that L, the set of all type, is the power set of a finite set D. The power set of a set D is the set with all subsets. That's how I'll represent types too, as the union of all types in such subset. The set of types is not finite in TypeScript, you can for example create these types:
string[], string[][], string[][][], string[][][][], ...

ivogabe on 6 Dec 2015

What would you think of 'weak' user defined type guards?

I think that had user-defined type guards been flow-based from the beginning, we wouldn't have coded that that way. I'd favor that sort of breaking change, but it would definitely be frustrating.

If you're running into a lot of issues like that, and it's more trouble than it's worth, you can treat user-defined type guards as "weak" as a work-around.

Perhaps you should check for strict subtypes before narrowing in your cases. To quote the spec:

The assignment compatibility and subtyping rules differ only in that

the Any type is assignable to, but not a subtype of, all types,

the primitive type Number is assignable to, but not a subtype of, all enum types, and

an object type without a particular property is assignable to an object type in which that property is optional.

Which works out because

You'd never want to "narrow" to any.
You don't want to lose out on optional properties.
It's questionable whether you want to narrow from an enum to number, but for practical situations like var x = Flags.None; x = Flags.A | Flags.B;, it probably is more desirable.

DanielRosenwasser on 6 Dec 2015

If you're running into a lot of issues like that, and it's more trouble than it's worth, you can treat user-defined type guards as "weak" as a work-around.

It happens 9 times in checker.ts, other files don't seem to have such issues. I could also add a new type guard syntax, like x is? string that would keep the existing behavior of the normal type guard syntax. (Though flow based type guards would be breaking any way, I'm afraid)

Perhaps you should check for strict subtypes before narrowing in your cases. To quote the spec:

As far as I can see that wouldn't solve the array and generics issue. I could solve that by not narrowing generics.

ivogabe on 6 Dec 2015

Traditionally, we've considered assignments exempt to type guard narrowing (or rather that they disable narrowing within that branch), I believe in order to preserve the assignment restrictions of the original declaration. If we retain this semantic, your second problem doesn't occur, right?

weswigham on 7 Dec 2015

@weswigham Yes, that's right. Though it's relatively easy to add this with flow based type guards, and it will allow code like this:

function foo(x: string | string[]) {
  if (typeof x === 'string') {
    x = [x];
  }
  // x: string[]
}

ivogabe on 7 Dec 2015

I've opened a PR #6959. The compiler can compile itself, though one test hangs and the code needs to be cleaned up. Can someone take at look at it? I have two questions in the PR.

ivogabe on 7 Feb 2016

Anyone interested?

ivogabe on 24 Nov 2016

🎉3

@ivogabe I just emailed you so you can send me a copy.

DanielRosenwasser on 25 Nov 2016

November last year I finished the work on my thesis, and it was published in January. I got a 9.5 (out of 10) for it, and I'm really happy with that 😃. Thanks everyone here for the suggestions, help and feedback!

ivogabe on 4 Apr 2017

🎉2

@ivogabe Congratulations!

RyanCavanaugh on 4 Apr 2017

Was this page helpful?

0 / 5 - 0 ratings