Mypy: Proposal: use binder (or similar) to track which variables are defined.

Created on 27 Sep 2017  路  16Comments  路  Source: python/mypy

I couldn't find an issue for this, so here is an idea. Currently this passes mypy:

class C:
    x: int
c = C()
c.x

y: str
y

but obviously fails at runtime. Binder successfully tracks execution frames and type narrowing by assignments, it looks like it can also track each Var whether it is still just "declared" or already defined. So that we can show errors like Variable 'var' may be uninitialized here.

EDIT: updated first example

feature needs discussion priority-1-normal

All 16 comments

I can see how the second example can work, but how would the first one? It requires kind of type state or something sophisticated like that

but how would the first one? It requires kind of type state or something sophisticated like that

I think it will be not more complex than the second one. Every attribute has its Var in the class names symbol table. The state will be tracked per Var. IIUC currently binder tracks expressions, not Vars, but it seems not too hard to also track Vars.

I still don't get it. Maybe I misunderstood the example. How do you know that the field is not initialized?

Ah, sorry, the first example is indeed bad, since CallExpr is not bindable. It should look like this:

class C:
    x: int

c = C() # from analyzing the class body we know that 'C' instances are created
        # with field 'x' uninitialized. It is not always 100% clear, in uncertain cases we
        # assume it is initialized, but in this case we know it is not.

c.x  # this is an error therefore
c.x = 1  # now binder records that name 'x' of NameExpr('c') is initialized

It is almost always not 100% clear - method calls, any transfer of control, metaclasses a-la NamedTuple, etc.

It is almost always not 100% clear - method calls, any transfer of control, metaclasses a-la NamedTuple, etc.

My experience is quite opposite. Many classes have trivial __init__s (just look at mypy code, currently however many things are actually "initialized" to None because we can't use PEP 526, but this doesn't count, I propose to treat them as uninitialized), also we can special-case known popular synthetic types like NamedTuple and (in future) dataclasses. Also for method calls we can record which names are set by each method, so that we can track the following:

class C:
    x: int
    def start(self) -> None:
        self.x = 0

c = C()
if condition:
    c.start()
    # we know that 'c.x' is initialized here

# but not here, unless condition is know as always True statically.

We also will need to track function calls, this is true, but it is not super hard. Note, that at the end of the above example c.x may be initialized or may not, depending on condition, this is why the error says: "c.x" may be uninitialized here. The point is that if c.x is attempted to be read there, then it is not _always_ safe.

I think that this is a volatile path to take - tracking effects type-state using inferred procedure summarization (you will need syntax in order to check libraries; this will be similar to type guards); we don't infer return types, even though it will be more useful and will involve less specialized machinery.

To be clear, I do think it will be an interesting thing to try; it's just that the high level of added complexity (to parts that are already too complicated) and effort does not seem to be worth the relatively weak guarantees, the number of false positives, and the "action from distance" involved in tracking function calls.

In other words, tracking field initialization means adding typestate-checking to mypy. It looks like a big leap from what we have right now. If we choose to do it, I'd vote for adding the more general mechanism, so it can be instantiated with other analyzes such as resource handling.

Fine-grained tracking of whether attributes are defined goes against the philosophy of mypy. In particular mypy tries hard not to do non-local static analysis. I agree with @elazarg's reasoning, and it also makes incremental type checking harder. Doing this for regular variables is another thing entirely and doesn't have the same problems.

A more pragmatic approach would be to allow declaring that certain attributes (or all of them) must always be explicitly defined in __init__. Strawman example:

class A:
    x: int
    y: str

    def __init__(self) -> None:
        self.x = 0  # Complain about y not being defined here

    def f(self) -> None:
        self.z = 1  # Also complain about an attribute defined here

a = A()
del a.x  # Not allowed

We could probably implement this in a way that we'd have reasonable guarantees that attribute access can't fail for instances of A (usual caveats about introspection, Any types, etc. apply of course). Not sure how we'd enable this -- maybe through a config file option. I might want something like this in not-too-distant future.

OK, let us do this step by step. We can start from tracking global/local variables, then enable a flag to force all definitions in __init__, and then we will see.

Requiring definitions in __init__ is going to be behind an option, yes? I don't like the idea of it making it required by default.

@ethanhs Yes

I think it should be required by default. From PEP 526 (emphasis mine):

In particular, the value-less notation a: int allows one to annotate instance variables that should be initialized in __init__ or __new__.

So the PEP clearly intends that the annotated instance variables are initialised inside __init__ (or __new__). And I guess since this is fairly new syntax there wouldn't be much code in the wild that would break by requiring this by default.
The other thing, which I mention in the closed #4206, is that adding PEP 526 style instance variable annotations to an existing class can actually cause mypy to be silent where it would otherwise correctly throw a warning, making the annotations more dangerous than useful.

It would be much better to check if any instance variables are left uninitialized than to check if all are initialized in __init__ or __new__, as to me the PEP snippet reads as a recommendation.

Fair enough, that makes sense in the spirit of not wanting to warn about correct programs.

See https://github.com/python/mypy/issues/7756 for an important use case (problematic __init__).

this would be a very cool feature for mypy

Was this page helpful?
0 / 5 - 0 ratings