Typescript: Type Inference on function arguments

Created on 11 Apr 2017  路  26Comments  路  Source: microsoft/TypeScript

TypeScript Version:
2.3.0

Code

function f(x: number){
   return x;
}

function g(x){ return x};

function h(x){ return f(x); };

Expected behavior:
I would expect the inferred type of g to be (x:any)=>any, and the infered type of h to be (x:number) => number (due to the restrictions placed by the call of f)

When compiling with noImplicitAny, I would only expect an error on g, not on h

Actual behavior:
Running tsc --declaration on this snippet gives me:

declare function f(x: number): number;
declare function g(x: any): any;
declare function h(x: any): number;

the x argument on h does not seem to use the f(x) call to tighten the definition.

Considering the simplicity of the example, I imagine I might be missing an important detail in the type system that makes this the proper inference result, but I haven't figured it out yet.

Duplicate

Most helpful comment

Soundness is mostly orthogonal.

I think the general problem is that global type inference in an imperative language is much more expensive than in a functional language. The difference between OCaml and Scala is instructive here. There's a reason Flow limits its "global" inference to file boundaries - it's not practical to scale up global inference to an entire large JS program.

Global inference also doesn't detect a lot of errors in imperative programs. Consider this code, which is a totally legal JavaScript program:

var x = { kind: 'a' };
if (Math.random() > 0.5) {
  x = { knd: 'b' };
}
x.type = "foo";

Flow says this program has one error, TypeScript says this program has two errors, H-M I believe says this program has zero errors (x ends up with as { kind: string, type: string } | { knd: string, type: string }). Who's right? I think one vs two is arguable, but it's almost certainly not zero (despite being a program that does not throw exceptions or observably mistype a value).

Global inference is also bad at producing sane errors

function doSomething(x) {
  console.log(x.y);
}

var arr = [{x: 100, y: 100}, {x: 200, z: 200}];
doSomething(arr.pop());

Flow helpfully says

2:   x.y;
       ^ property `y`. Property not found in
2:   x.y;
     ^ object literal

which does not help you at all at figuring out the typo elsewhere in the program.

So basically any sane programmer is going to put type annotations on parameters anyway (since you must at module boundaries anyway, and you must if you want sane errors). Once you have type annotations on type parameters, local inference is sufficient for the vast majority of cases.

All 26 comments

Typescript never infers function arguments. So, unless you specify them, they are always going to be any

Right, but is there a reason for them not to? It feels within reach, and not much different than other kinds of inference

It's very different and very complex. When you consider that any of those calls might have union types or overloaded signatures in their parameter types, it's not even clear what should happen for a single call (let alone multiple calls):

declare function f(n: number, s: string): void;
declare function f(s: string, n: number): void;
declare function f(x: boolean, y: boolean): void;

// g(a: ??, b: ??)
function g(a, b) {
    f(a, b);
}

Current inference is very straightforward: types almost always come from initializers or contextual types. Both of those are static one-pass things that you can follow back as a human. Trying to do inference from call sites looks neat in simple examples but can make it very hard to trace errors to their actual sources, and is unscalable to large programs.

@RyanCavanaugh Is there some sort of design doc for how TypeScript's type inferer works, and what kind of system it's based on?

Eg. I believe a Hindley-Milner inferer would give a: boolean | number | string, b: boolean | number | string for your example. This seems to be @rtpg's expected behavior.

That type would be incorrect because it allows string, string as an argument list.

The TypeScript spec https://github.com/Microsoft/TypeScript/blob/master/doc/spec.md defines how inference works at each declaration site. There is no global inference algorithm to describe because TS doesn't do global inference ala H-M.

I see. So we would have to infer the same signature as f() itself.

What was the rationale for not using global inference? Is it that TS tries to do the best it can in the absence of a sound type system?

Soundness is mostly orthogonal.

I think the general problem is that global type inference in an imperative language is much more expensive than in a functional language. The difference between OCaml and Scala is instructive here. There's a reason Flow limits its "global" inference to file boundaries - it's not practical to scale up global inference to an entire large JS program.

Global inference also doesn't detect a lot of errors in imperative programs. Consider this code, which is a totally legal JavaScript program:

var x = { kind: 'a' };
if (Math.random() > 0.5) {
  x = { knd: 'b' };
}
x.type = "foo";

Flow says this program has one error, TypeScript says this program has two errors, H-M I believe says this program has zero errors (x ends up with as { kind: string, type: string } | { knd: string, type: string }). Who's right? I think one vs two is arguable, but it's almost certainly not zero (despite being a program that does not throw exceptions or observably mistype a value).

Global inference is also bad at producing sane errors

function doSomething(x) {
  console.log(x.y);
}

var arr = [{x: 100, y: 100}, {x: 200, z: 200}];
doSomething(arr.pop());

Flow helpfully says

2:   x.y;
       ^ property `y`. Property not found in
2:   x.y;
     ^ object literal

which does not help you at all at figuring out the typo elsewhere in the program.

So basically any sane programmer is going to put type annotations on parameters anyway (since you must at module boundaries anyway, and you must if you want sane errors). Once you have type annotations on type parameters, local inference is sufficient for the vast majority of cases.

@RyanCavanaugh As always, thank you for the detailed and well thought out responses.

I think one vs two is arguable, but it's almost certainly not zero (despite being a program that does not throw exceptions or observably mistype a value).

This seems to be a philosophical question with tradeoffs.

1-2 errors is restrictive because it requires me to explicitly declare a type with 2 optional members before my program will compile, but as you said, the benefit is errors are local, which makes debugging fast and pleasant.

0 errors is more correct (the union type should be handled downstream, and if it's not, the compiler can warn about an unexhaustive match or possibly missing member), but the drawback is nonlocal errors when you're wondering "why is the compiler saying this a union type?".

[From an earlier comment] Trying to do inference from call sites looks neat in simple examples but can make it very hard to trace errors to their actual sources, and is unscalable to large programs.

How do Haskell, OCaml, and Elm engineers manage to use their languages at scale? From this SO post, it sounds like those languages are more restrictive than JS in what they can express (eg. no class extensions), which makes H-M usable.

@RyanCavanaugh relative to your initial example, where you call f (with 3 different type definitions):

I feel like in that case it would be reasonable to have this generate 3 different types for g. Maybe this leads to combinatorial explosion of types in some cases, but I imagine in most cases it won't because in real world examples function wrappers exist to make more specific calls to the inner functions.

TypeScript already has the control flow analysis in place to keep track of when x:string|number ends up actually being string or number, so it feels like you could do some of this work in reverse? Obviously easier said than done.

For full context: I _really_ want to use noImplicitAny, but our codebase includes a lot of wrapper functions of the form function(a){ return g(a,'some_literal')}, and these end up being any. If we have to explicitly declare the type signatures on all of these it adds a lot of busy work and makes the "strict type" pitch that much harder.

Is it not possible to do local inference based on the function body (that is to say: ignoring the function's calling context)?

Bringing back the flow example (which I agree is weird):

function doSomething(x) {
  console.log(x.y);
}

var arr = [{x: 100, y: 100}, {x: 200, z: 200}];
doSomething(arr.pop());

I feel like the "proper" set of errors that should happen would be:

  • doSomething gets an inferred type of (x:{y: any}):void
  • arr is a little weird, so two possibilities:

    • arr requires an explicit type, given that the dictionaries are very different

    • arr gets a type like ({x:number, y:number} | {x:number, z:number})[]

  • In the case of the union type, doSomething(arr.pop()) fails because the union type doesn't align with the type of {x: any}

I realize "well let's just have giant types" feels very hand-wavy...

I think a reasonable alternative would be "try to deduce argument types from usage within the function, and opt for any if all collected possibilities in the function body have any inconsistencies (for example number and string, but not {x:string, y:string} and {x:string}, which ends up with {x:string, y:string}). This would allow for small anonymous functions to get the "obvious" typing while still requiring to be explicit in cases where there's no good answer.

Purescript (and its record types) has some functionality close to this, so inference works pretty well (though you tend to lose type aliases. This is already the case in TypeScript though)

So basically any sane programmer is going to put type annotations on parameters anyway (since you must at module boundaries anyway, and you must if you want sane errors). Once you have type annotations on type parameters, local inference is sufficient for the vast majority of cases.

I fully agree with this. The reasoning is very straightforward. It is also worth noting that no type annotations need be placed on the parameters declared by function literals passed as callback or assigned to an already typed local.

I am glad that you referenced Scala because it also has essentially the same requirements for where type annotations must be placed as TypeScript does under --noImplicitAny, they are only required on the parameters of functions that are not callbacks and not a value assigned to a typed local.

You also mention OCaml which while it has much stronger type inference than TypeScript or Scala, exhibits some very counterintuitive behavior with respect to inference of function parameter types
Consider this OCaml:

let square x = x * x 
let i = square 4
(* 16 *)
let f = square 4.0
(* Error: This expression has type float but an expression was expected of type int *)

@aluanhaddad Nit: your example is an Ocamlism, and is not a statement about H-M in general. Eg. Haskell infers x as Num, which compiles:

let square x = x * x
let i = square 4 -- 16
let f = square 4.0 -- 16.0

@bcherny well, this works in Haskell because of type classes, but they are not a part of H-M

This is a duplicate of https://github.com/Microsoft/TypeScript/issues/15196, related discussion can be found in #1265, #15114 and #11440.

Please see my response in https://github.com/Microsoft/TypeScript/issues/15196#issuecomment-294618460

@mhegazy I don't want to litigate this too much, but what I'm asking for (using the function definition itself to infer parameter types) doesn't seem like non-local type inference? Since the function body is well scoped and doesn't belong to multiple files.

I see the argument for not having Haskell-style square inference, but what I'm talking about feels like it's much smaller and more well-defined in scope (see usages of parameters in the function body only, infer from that).

Maybe I'm missing a detail here or misunderstanding the meaning of "non-local" here.

@rtpg there's nothing in TypeScript today that infers information about a variable from its usage. I agree it's smaller than Haskell-style inference... but it would be a very large architectural change. And it's still nonlocal since presumably code like this would be expected to produce inferences.

function fn(x) {
  bar(x);
}
function bar(y) {
  baz(y);
}
function baz(z) {
  return Math.sqrt(z);
}

@RyanCavanaugh I thought type guards and flow analysis is doing exactly that - narrowing a type based on usage? Isn't the example above just an extension of that? At least for the simplest cases, it would be nice for TypeScript to achieve parity with Flow.

@joewood type guards / flow analysis are straightforward (so to speak...) because they're "top-down" - given a statement in a function, it's relatively easy to determine which control paths it's reachable from, because JavaScript doesn't have control flow structures that can go "up" (e.g. COMEFROM). This is very different from a function call - you can be in a function on line 1500 and the first call might be on line 4000. Or maybe you're indirectly called via some callback, etc..

The dream of inference from function calls is really not clear as some people would imply. Here's an example in Flow:

declare function swapNumberString(n: string): number;
declare function swapNumberString(n: number): string;

// Pass-through function
function subs(s) {
  return swapNumberString(s);
}

// This stops being an error if you comment
// out the call in g() below. ??
const s: string = subs(12);

function g() {
  subs("");
}

This is "spooky action at a distance" at its most maximal. You can have the call to subs in g, or the call in the s initializer, but not both. Huh? And why does the wrapper function change the behavior of this at all (replacing subs with swapNumberString makes the errors go away), if it's all inferential magic?

Realistically there are two cases that usually happen if you use inference from call sites / data flow analysis:

  • Your file typechecks
  • You get an error in a correctly-implemented function body due to a bad call

If your file typechecks, cool, no work required.

In the other case, the first thing you do to diagnose the problem is... drumroll... add a parameter type annotation so you can figure out where the bad call is coming from. If there's indirection, you'll probably have to do this in layers. You'll end up with a file full of parameter type annotations, which is good since you'll need them anyway for cross-file typechecks. So it's great for a single-file playground demo but for "real" software development, you'll end up with approximately the same number of parameter type annotations as if you used a more local inference algorithm.

Thanks @RyanCavanaugh for the extensive explanation. I agree, this could become a nightmare of different permutations with contradicting definitions. But, I'm wondering if this is all or nothing? Where a parameter is used definitively inside a function (e.g. against an explicit declared type with no overloads), can a deduction be made about its type? Then, if there's any whiff of ambiguity, including Unions and Overloads, then it backs off to the default any. I agree, the bottom approach is different to the type guard processing logic.

Another approach is to make this a tooling feature. An "autofix" feature could be added to fix untyped parameters based on their usage inside the function. This could speed up fixing "no implicit any" migrations.

An "autofix" feature could be added to fix untyped parameters based on their usage inside the function. This could speed up fixing "no implicit any" migrations.

That reminds me of Flow's existential types, which is a sort of opt-in H-M prover. I imagine it's a lot of work to build this if it's just some optional feature, though (as opposed to feature everyone would use). The very fact that it's opt-in (while the default type is still any) signals that it may not always be helpful.

That reminds me of Flow's existential types, which is a sort of opt-in H-M prover

It's not like H-M in any way

The very fact that it's opt-in (while the default type is still any)

This is not true

@vkurchatkin Can you explain? This is how I understand the docs I linked.

@bcherny If you omit type annotation it behaves kind of like *, i.e. type is inferred

@RyanCavanaugh There is a simple mechanism for producing sound (but rarely useful) signatures for intersections and unions of functions, described in https://github.com/Microsoft/TypeScript/issues/14107#issuecomment-296265761. It would work for your example with g, and would infer a as number | string | boolean and b as string & number & boolean. This is a totally valid use of fs supported API, but is unfortunately useless because in this instance no one can actually supply a string & number & boolean (except by casting).

In reality g should be inferred as having an identical overloaded signature to f, but it requires some thought to figure out some sensible rules from which this behavior emerges. I am confident it isn't impossible to do "intuitive" type inference, even in a type system with the constraints TypeScript is under.

@RyanCavanaugh For the following example:

declare function swapNumberString(n: string): number;
declare function swapNumberString(n: number): string;

// Pass-through function
function subs(s) {
  return swapNumberString(s);
}

// This stops being an error if you comment
// out the call in g() below. ??
const s: string = subs(12);

function g() {
  subs("");
}

The ask is for subs to be inferred as having the overloaded type ((s: string) => number) & ((s: number) => string), based on the usage of s within the body of subs.

If you only add this feature, and nothing else, how would this give rise to the "spooky action at a distance" problem you are describing? Both subs("") and subs(12) are valid uses of such an overloaded function.

Similarly, for the following example:

function fn(x) {
  bar(x);
}
function bar(y) {
  baz(y);
}
function baz(z) {
  return Math.sqrt(z);
}

The inference would be non-local only in the sense that the return type inference is non-local. Just as an inferred return type may affect the inferred return type of a caller, the inferred parameter type of a function may affect the inferred parameter type of a caller. The inference is in unidirectional (barring edge cases like recursion), and in inverse direction to the call graph.

To avoid complicated edge cases, you could start with inferring types for parameters that:

  • don't already have type constraints
  • are only used once
  • are used in positions that do not involve interaction with overloaded functions

It is fine to bail out and infer any for any cases for which a reasonable inference strategy is currently unknown, at which point a user who has noImplicitAny enabled will go through the usual rigmarole.

I've made a rudimentary attempt at this in https://github.com/Microsoft/TypeScript/compare/master...masaeedu:master that makes the case in the OP (and some others I am interested in) work. The compiler can build itself, although not all tests pass.

The algorithm is as follows. For all parameters that cannot otherwise be typed, an additional step is tried before bailing out and inferring any:

  1. The function body is searched for invocation expressions that pass the parameter as an argument
  2. When such an invocation is found, we attempt to determine the type of the corresponding parameter in the invoked function's signature (which may result in recursion)
  3. If a type is determined, we add said type to a list of "usage types" of the parameter
  4. Finally, we infer the parameter to have an intersection of all its "usage types".

Obviously this is nowhere near complete, and the implementation does not even fit with the codebase style, but it is something concrete to play with.

Some simple work to add to this:

  • Look for other kinds of "usage" of the parameter, such as assignment to well-typed references, property access, use as a function with well-typed parameters, etc.

Some hard work to add to this:

  • Work out all the inevitable hairy edge cases, none of which I've thought about, and bail out in some sensible and graceful way (usually just return undefined, which results in the parameter being anyly typed as usual)
  • Deal with cycles properly (the algorithm is identical to the stack based one for determining the function's return type, just need to work it into the currently naive implementation)
  • Deal with invocation expressions involving overloaded functions by inferring an overloaded signature for the current function with respect to the used parameter
Was this page helpful?
0 / 5 - 0 ratings

Related issues

jbondc picture jbondc  路  3Comments

dlaberge picture dlaberge  路  3Comments

Antony-Jones picture Antony-Jones  路  3Comments

manekinekko picture manekinekko  路  3Comments

seanzer picture seanzer  路  3Comments