Mypy: Project ideas

Created on 5 Feb 2020 · 28 Comments · Source: python/mypy

Here are some ideas for larger mypy-related projects for contributors who want to tackle something fairly big (but also with a big potential impact).

Deep editor integrations

Currently it's possible to run the mypy daemon from an editor and display the list of errors within the editor, but we could go much further. Possible ideas include going to the definition of an arbitrary reference (such as a method, variable, type, etc.), and displaying the inferred type of an expression. IDEs such as PyCharm can do some of this already, but mypy could support these features more reliably in some cases, since it maintains a very detailed representation of the program internally. Also, this could be very helpful with editors that have no or limited built-in support for these features.
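As a rough illustration of the "display the list of errors" part, here is a minimal sketch of how an editor plugin might turn mypy-style output into structured records. The `Diagnostic` class, the `parse_diagnostics` function, and the sample output are hypothetical names and data made up for this example; the only assumption about mypy is its usual `path:line: severity: message` line format.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed line format: "path:line[:column]: severity: message".
# This simple pattern does not handle Windows drive letters in paths.
_LINE_RE = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):(?:(?P<column>\d+):)? "
    r"(?P<severity>error|warning|note): (?P<message>.*)$"
)


@dataclass
class Diagnostic:
    """One message that an editor could show next to a source line."""
    path: str
    line: int
    column: Optional[int]
    severity: str
    message: str


def parse_diagnostics(output: str) -> List[Diagnostic]:
    """Parse type checker output, skipping lines that are not diagnostics."""
    diagnostics = []
    for raw in output.splitlines():
        match = _LINE_RE.match(raw)
        if match is None:
            continue  # for example, a trailing summary line
        column = match.group("column")
        diagnostics.append(Diagnostic(
            path=match.group("path"),
            line=int(match.group("line")),
            column=int(column) if column is not None else None,
            severity=match.group("severity"),
            message=match.group("message"),
        ))
    return diagnostics


# Hypothetical sample output, written by hand for this example.
sample = (
    "app.py:3: error: Incompatible types in assignment\n"
    'app.py:7:5: note: Revealed type is "builtins.int"\n'
    "Found 1 error in 1 file (checked 1 source file)\n"
)
diags = parse_diagnostics(sample)
```

Features such as go-to-definition would need far richer data than this text output carries, which is the point of the project idea.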

Better decorator support

Mypy can't properly support decorators that change the return type of the decorated function, such as contextmanager (contextmanager is special-cased using a plugin, but this approach doesn't generalize to arbitrary functions). Adding support for the PEP 612 draft would improve this.

Generalize mypyc IR to allow non-C backends

Currently the IR of mypyc, the compiler we use to compile mypy, is tightly bound to C. This makes it impractical to experiment with alternative backends, such as an LLVM backend or a completely custom backend that directly generates assembly.

Related issue: https://github.com/mypyc/mypyc/issues/709
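The general idea of a backend-neutral IR can be sketched as follows. This is a toy illustration only: the `LoadInt`, `Add`, and `Return` ops, the `Backend` protocol, and both emitters are hypothetical and are not mypyc's actual IR or API.

```python
from dataclasses import dataclass
from typing import List, Protocol, Union


# Hypothetical IR ops that say nothing about the target language.
@dataclass
class LoadInt:
    dest: str
    value: int


@dataclass
class Add:
    dest: str
    left: str
    right: str


@dataclass
class Return:
    src: str


Op = Union[LoadInt, Add, Return]


class Backend(Protocol):
    def emit(self, ops: List[Op]) -> str: ...


class CBackend:
    """Emits C-like statements."""
    def emit(self, ops: List[Op]) -> str:
        lines = []
        for op in ops:
            if isinstance(op, LoadInt):
                lines.append(f"long {op.dest} = {op.value};")
            elif isinstance(op, Add):
                lines.append(f"long {op.dest} = {op.left} + {op.right};")
            elif isinstance(op, Return):
                lines.append(f"return {op.src};")
        return "\n".join(lines)


class LLVMLikeBackend:
    """Emits LLVM-flavoured text; a sketch, not verified LLVM IR."""
    def emit(self, ops: List[Op]) -> str:
        lines = []
        for op in ops:
            if isinstance(op, LoadInt):
                lines.append(f"%{op.dest} = add i64 0, {op.value}")
            elif isinstance(op, Add):
                lines.append(f"%{op.dest} = add i64 %{op.left}, %{op.right}")
            elif isinstance(op, Return):
                lines.append(f"ret i64 %{op.src}")
        return "\n".join(lines)


program: List[Op] = [LoadInt("r0", 1), LoadInt("r1", 2),
                     Add("r2", "r0", "r1"), Return("r2")]
c_code = CBackend().emit(program)
llvm_text = LLVMLikeBackend().emit(program)
```

The same `program` feeds both emitters because no op mentions C types or C helper functions, which is the property the project idea is after.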

Faster callables and nested functions in mypyc

Currently calling nested functions and variables with a callable type is pretty slow in compiled code. These limitations reduce the usefulness of mypyc significantly, especially when compiling code that wasn't originally written with mypyc in mind.

Related issues: https://github.com/mypyc/mypyc/issues/713, https://github.com/mypyc/mypyc/issues/712 (both of these would be implemented in a GSoC project)
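For reference, the two patterns in question look like this in ordinary Python. The function names are made up for this example, and it makes no claim about how much slower these calls are in compiled code.

```python
from typing import Callable, List


def make_scaler(factor: int) -> Callable[[int], int]:
    # A nested function that closes over a variable of the outer function.
    def scale(x: int) -> int:
        return x * factor
    return scale


def apply_all(func: Callable[[int], int], items: List[int]) -> List[int]:
    # 'func' has a callable type, so the call target is not known statically.
    return [func(item) for item in items]


double = make_scaler(2)
doubled = apply_all(double, [1, 2, 3])
```

Both patterns are common in code that was not written with a compiler in mind, such as callbacks and small helper closures.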

NumPy support

This is a big topic, but it can be approached one feature at a time. One of the main missing things is "shape types" -- there needs to be a way to express the number of dimensions in an array, at the very least.

Most helpful comment

To support NumPy, we'd need some form of "shape type" support. We want to specify the number of dimensions and the item type of an array, at least. Even better, it would often be useful to specify the exact size of an array, but this will be much harder to implement.

This example is from @ilevkivskyi's presentation from the Typing Summit at PyCon 2019:

from typing import Shape, IntVar, TypeVar

N = IntVar('N')
T = TypeVar('T')

def diff(a: ndarray[T, Shape[N]]) -> ndarray[T, Shape[N - 1]]:
    ...
def sum2d(a: ndarray[T, Shape[:, :]]) -> ndarray[T, Shape[:]]:
    ...

This means that diff takes a one-dimensional array of size N and item type T, and it returns a one-dimensional array of size N - 1 and item type T. sum2d, on the other hand, takes an arbitrary-sized two-dimensional array with item type T, and it returns a one-dimensional, arbitrary-sized array with item type T.

A reasonable goal for GSoC would be to support specifying the item type and the number of dimensions (e.g. ndarray[float, Shape[:, :]] for a two-dimensional array). This would involve at least these steps (I'm leaving a lot of details out):

  1. Add support for the new shape type syntax.
  2. Implement basic type operations involving shape types, such as displaying shapes in error messages and subtyping.
  3. Implement simple stubs for NumPy that use the shape types.
  4. If needed, add a mypy plugin to handle some common NumPy operations that can't be supported via existing type system features.

This is quite a challenging project and requires a deep understanding of mypy type checking internals, so it's really only well-suited for somebody who has substantial experience with working on mypy (or another type checker), or perhaps has completed a course on type theory (beyond a compilers course).
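The "item type plus number of dimensions" goal can be loosely approximated today with ordinary generics, as in the sketch below. This is not the proposed Shape syntax: the `Array`, `OneD`, and `TwoD` classes are hypothetical stand-ins, and the array is backed by plain lists rather than NumPy.

```python
from typing import Generic, TypeVar

T = TypeVar("T")  # item type
D = TypeVar("D")  # marker type standing in for the number of dimensions


class OneD:
    """Marker: one-dimensional."""


class TwoD:
    """Marker: two-dimensional."""


class Array(Generic[T, D]):
    """Toy array backed by (nested) lists; D exists only for the type checker."""
    def __init__(self, data: list) -> None:
        self.data = data


def sum2d(a: Array[float, TwoD]) -> Array[float, OneD]:
    # Sum each row, so a two-dimensional input yields a one-dimensional result.
    return Array([sum(row) for row in a.data])


matrix: Array[float, TwoD] = Array([[1.0, 2.0], [3.0, 4.0]])
row_sums = sum2d(matrix)
```

Marker classes cannot express sizes such as N - 1, which is one reason dedicated shape type support is needed.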

All 28 comments

I'd be interested in adding support for PEP 612. I'll take a look at the pyre implementation this weekend, re-read the PEP, and scope out a plan.

@hauntsaninja Great! If you have any questions, I'm happy to help.

Another idea:

Detecting potentially undefined or misspelled locals

Some uses of undefined variables are not caught by mypy:

def f() -> None:
    if foo():
        x = 0
    print(x)  # No error

It would be useful to catch these (#2400). A related issue is reporting locals that are never read, as this is often an error (#76).
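To see why this matters, here is a runnable variant of the example. The `condition` helper and the string return values are invented for illustration; the point is that the read fails at runtime whenever the branch is skipped.

```python
def condition() -> bool:
    # Hypothetical stand-in for foo() in the example above.
    return False


def f() -> str:
    if condition():
        x = 0
    try:
        return str(x)  # x is unbound when the branch above was skipped
    except UnboundLocalError:
        return "unbound"


def f_fixed() -> str:
    x = -1  # initialize on every path so the later read is always defined
    if condition():
        x = 0
    return str(x)


outcome = f()
fixed_outcome = f_fixed()
```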

The GSoC 2020 organizations have been announced, and mypy is one of them! See https://summerofcode.withgoogle.com/organizations/6527008550420480/

I am interested in "Generalize mypyc IR to allow non-C backends", and I will try to have a plan by the end of February.

Hi!
I'd be interested in working on detecting potentially undefined or misspelled locals.
I will work out a plan and get in contact soon...

Hi!!
I am Sarvesh. I am interested in "Faster callables and nested functions in mypyc" and in NumPy support.
I am working on faster callables and will get back soon with a solution.

Hello,
My name is Jonathan. I focus on data science, and I am really interested in working on NumPy support. I once asked about the unavailability of "shape types". Working on this project will help my data science work too.

I am going to read the documentation and try to find out which features can be improved for NumPy. Thanks

Cheers,
Jonathan

Hi, @edwinjon and @dubesar, glad to see someone interested in working on the mypy-NumPy topic!

As far as I know, there's similar work that may help you develop your ideas and plans; check https://github.com/numpy/numpy-stubs.

@JukkaL By NumPy support, I understand that code written in the standard format needs to be changed to the NumPy format, for example by changing arrays to NumPy arrays.
Also, by shapes, do you mean the shapes of the arrays?

Please elaborate on that part!

Also, for the faster callables part, I checked the running time of the code and have addressed it in the issue. Please guide me on what has to be done next, so that I can move further with the work.

Hi @JukkaL, my name is Eslam Genedy. I am in my third year in the Computer Engineering Department at ASU, and it is my pleasure to apply for mypy. I have good knowledge of OOP in Python and of NumPy. I hope to contribute to this project. Thanks in advance.

@JukkaL, I'd like to integrate PEP 612 and take on the better decorator support project.

@joybhallaa That would be a really useful project! I'd suggest starting by closing a few smaller issues first to get familiar with working on mypy. (The same advice applies to anybody who's interested in diving into a major project. It's best to gradually learn the codebase. Otherwise there's a big risk that there's too much to learn at once, and you'll get discouraged.)

Since PEP 612 involves type inference and type variables, tackling one (or some) of these issues could be a good option: https://github.com/python/mypy/issues?q=is%3Aopen+is%3Aissue+label%3Atopic-type-variables

If you find a promising issue but you are not sure where to start, you may want to ask for hints in the issue.

@JukkaL I've taken up an issue and after I'm finished with that I'll take up issues that you have mentioned. :+1:

Hi @JukkaL, I am interested in contributing to better decorator support. Right now I am working on the issues. Is there anything that should be done in parallel with this, and are there any potential mentors for this project idea?

As a heads up, we will probably have the bandwidth to mentor one or two GSoC projects. We won't make any final decisions on proposals until officially reviewing the applications.

@JukkaL I'm posting here to let you know that three other students from TU Delft (Netherlands) and I are currently analyzing mypy from a software architecture perspective (for a university course).

We are currently writing multiple essays, publicly visible at https://desosa2020.netlify.com/projects/mypy/
Our first essay (merely an introduction):
https://desosa2020.netlify.com/projects/mypy/2020/03/04/the-vision-behind-mypy
Three more will follow soon!

Now my question is:

  • Is there any low-hanging fruit for us to get familiar with the code-base?
  • Is there something we can contribute to this great project from a software architecture point of view? (Documentation, analysis, bugs or advice)
  • And finally, would you be interested to see the results of our architectural analysis?

@davidzwa For a list of good first issues to fix, you may want to check out https://github.com/python/mypy/issues?q=is%3Aissue+is%3Aopen+label%3Agood-first-issue+-linked%3Apr+no%3Aassignee

Also mypy is usually lowercase, unless it starts a sentence, FWIW. I'll leave the other questions to Jukka though :)

@JukkaL and @msullivan, I have fair knowledge of Python and have solved several problems using data structures and algorithms concepts. I'm new to GSoC and would need some input to kick-start contributing to this.

@JukkaL I would like to pick up NumPy support as my project. I have already made myself familiar with mypy and have decent experience working with NumPy. Besides creating a way to define array dimensions, we could also add checks for matrix and vector operations. I worked on issue #8344; the pull request is yet to be reviewed, though.

Hey guys, I'd like help setting up mypy for contributing.

Many people have expressed interest in the GSoC projects. Since we can't accept many candidates, we are setting some minimum requirements for candidates to streamline the process, and to set realistic expectations. These are the requirements (no exceptions, sorry):

  1. You have either a) completed a university course "Introduction to Compilers" (or similar), or b) made multiple contributions to a type checker or compiler project (this can be mypy or another open source project).
  2. You can demonstrate practical Python experience. For the mypyc-related projects, you have experience with C or C++ as well.
  3. You've been able to create a non-trivial mypy PR and respond to reviews. (It's not necessary that your PR is merged, though that's a plus.)
  4. Contact me ([email protected]) by email and give these details:

    • A summary of your compilers-related experience. If you finished a compilers course, give a summary of what you learned and a description of your programming project (if any). If you have contributed to a compiler or type checker project, provide links to your contributions and/or describe what you've done and what's the significance of your contributions. Give a link to your mypy PR or PRs.

    • A summary of your Python experience (and C/C++, if relevant). Provide a short summary of one or two projects you've worked on in the language(s). If you can give links to code, even better.

    • Tell us which of the project ideas you would like to work on, and why.

    • (Optional but recommended) Other supporting evidence, such as other courses you've finished that may be relevant, your grades, side projects you've worked on, web sites you've created, other languages you are skilled at, etc.

As I mentioned above, the NumPy project is more challenging than the others, and thus we'd need a candidate with substantial previous experience. A handful of mypy PRs won't be enough experience to work on NumPy with a good chance of success, if that's all the type checker related experience you have.

Notes:

  • Knowledge of algorithms and data structures is also necessary, but it's not sufficient.
  • We'll pick a few of the most promising candidates and schedule interviews and/or small coding tasks. These can happen over Skype or chat, for example -- we are quite flexible.
  • Unfortunately, we don't have bandwidth to personally help everybody get their mypy development environment set up. Getting started using the available documentation is one of the first tasks you need to be able to complete on your own.
  • Reading a book is not a substitute for attending a compilers course, since the coding assignments in typical courses are very important.
  • For generic questions about getting started, read this thread, our information at the GSoC site, and our readme (https://github.com/python/mypy/blob/master/README.md). We expect that suitable candidates have enough experience to be able to create a PR with the available help.
  • If you have specific questions, feel free to email me or post here.

Clarification: The "Faster callables and nested functions in mypyc" GSoC project includes implementing both of the linked issues.

Note that "Detecting potentially undefined or misspelled locals" feels too small for a full GSoC project. If somebody wants to work on this topic in the context of GSoC, we can expand the scope by also detecting (some) uninitialized attributes, and preventing unsafe attribute deletions. I can provide more context if there is interest.
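As a sketch of what the expanded scope could target, the snippet below shows an attribute that is declared but never initialized, and an attribute that is deleted and then read. The `Connection` class and the `read_attr` helper are hypothetical; the snippet only demonstrates the runtime failures, not how mypy would detect them.

```python
class Connection:
    host: str  # declared for the type checker, but never assigned a value

    def __init__(self) -> None:
        self.port = 8080

    def close(self) -> None:
        del self.port  # later reads of 'port' will fail


def read_attr(obj: object, name: str) -> str:
    """Return the attribute as a string, or 'missing' if it is not set."""
    try:
        return str(getattr(obj, name))
    except AttributeError:
        return "missing"


conn = Connection()
host_value = read_attr(conn, "host")    # uninitialized attribute
port_before = read_attr(conn, "port")
conn.close()
port_after = read_attr(conn, "port")    # read after deletion
```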

Hello ,

Can you give a few more details about NumPy support?

@shadaabghani1 It would be much easier for me to help you if you could ask more specific questions, as the topic is pretty wide-ranging. I already gave an overview in this comment: https://github.com/python/mypy/issues/8373#issuecomment-591581607

@JukkaL I don't know whether, for the NumPy support item, you might be interested in this thread: https://llvm.discourse.group/t/numpy-scipy-op-set/768

Currently I'm writing Sphinx docs like this, and I wish there was a Sphinx plugin or something that would ask mypy what the type of each attribute is:

    .. attribute:: callback
        :type: Callable[[], None]

        Blah blah. Boring text goes here.