Numba: Suggestion: read jit types from annotations

Created on 12 Apr 2017 · 19 Comments · Source: numba/numba

The @jit decorator allows supplying the expected types:

@jit(int32(int32, int32))
def f(x, y):
    return x + y

Function type annotations in Python 3.5+ look like a good fit for this use case. Annotations are easy to read from the decorated function's attributes, which would allow you to write:

@jit
def f(x: int32, y: int32) -> int32:
    return x + y

An added bonus is that the same type annotations could improve IDE support (e.g. completions and error checking in PyCharm) and mypy checks.

This might not cover every use case, but it could probably cover a lot of common ones, including functions that accept several types, expressed with Union[].
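As a rough illustration of how such annotations could be read, here is a minimal sketch that uses only the standard library. The helper name `signatures_from_annotations` is hypothetical and not part of Numba, and plain `int`/`float` stand in for Numba types. Each `Union[]` argument is expanded into one candidate signature per alternative.

```python
import inspect
import itertools
from typing import Union, get_args, get_origin, get_type_hints


def signatures_from_annotations(func):
    """Return a list of (arg_types, return_type), one entry per Union combination.

    Hypothetical helper for illustration only; not a Numba API.
    """
    hints = get_type_hints(func)
    return_type = hints.get("return")
    per_arg_choices = []
    for name in inspect.signature(func).parameters:
        if name not in hints:
            raise TypeError(f"parameter {name!r} has no annotation")
        hint = hints[name]
        # Union[a, b] expands to (a, b); any other hint is a single choice.
        choices = get_args(hint) if get_origin(hint) is Union else (hint,)
        per_arg_choices.append(choices)
    # Cartesian product: one signature per combination of alternatives.
    return [(combo, return_type) for combo in itertools.product(*per_arg_choices)]


def f(x: Union[int, float], y: int) -> float:
    return x + y


sigs = signatures_from_annotations(f)
```

Here `sigs` holds two candidate signatures, one for each alternative of `x`. The return annotation is passed through unchanged, which is a simplification: a real implementation would have to decide how the return type varies with the argument types.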

feature_request

All 19 comments

This will definitely be a post 1.0 feature.

@seibert I'd be willing to help give this a shot. It looks like it'd just be a matter of parsing the signature like @mangecoeur said, which I guess would happen right in here: https://github.com/numba/numba/blob/master/numba/decorators.py#L154-L170

One thing I'm curious to know is how we should represent NumPy array types in annotations. Numba specializes the code it generates on 3 array attributes:

  • Number of dimensions of the array
  • Element layout: C order (right most index has stride 1), FORTRAN order (leftmost index has stride 1), "any" order (arbitrary strides)
  • Element dtype

In order for Numba to compile the most efficient code at module load time (which I assume is the goal here), that level of detail should be specified for the input types. We could skip the element layout attribute and accept some loss of performance and compile assuming the "any" layout. (For example, I'm pretty sure that SIMD instruction usage is severely limited unless we know we have C or FORTRAN order arrays.)
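One way to picture the three attributes as a single annotation value is sketched below. `ArraySpec` is a hypothetical class made up for this illustration (it is neither a Numba nor a NumPy type), and the dtype is kept as a plain string to avoid any dependency. The layout defaults to "A" ("any"), matching the fallback described above.

```python
from dataclasses import dataclass

_LAYOUTS = ("C", "F", "A")  # C order, FORTRAN order, "any" order


@dataclass(frozen=True)
class ArraySpec:
    """Hypothetical annotation object; not a Numba or NumPy type."""
    dtype: str
    ndim: int
    layout: str = "A"  # assume "any" layout when none is given

    def __post_init__(self):
        if self.ndim < 0:
            raise ValueError("ndim must be non-negative")
        if self.layout not in _LAYOUTS:
            raise ValueError(f"layout must be one of {_LAYOUTS}")


def f(x: ArraySpec("float64", 2, "C")) -> ArraySpec("float64", 1):
    ...


specs = f.__annotations__
```

Because the dataclass is frozen, instances are hashable and compare by value, so two specs that differ only in layout remain distinct keys for dispatch. Numba's own slice syntax already encodes the same information (for example `float64[:, ::1]` for a C-contiguous 2D array), so a real implementation would more likely reuse that than introduce a new class.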

I think it would also be important to be able to tell Numba to ignore the annotations for the purposes of compilation. That way they could still be used for static type checking tools and IDEs.

I was envisioning a new kwarg signature, and the developer would have to explicitly opt into signature checking with jit(signature=True). That would also prevent any kind of new inference method from breaking legacy annotated code. So you'd have to write (forgive any off-by-one errors, I'm doing this on my phone):

from typing import Tuple, Union, Iterable
import numba as nb
import numpy as np


Matrixlike = Union[np.ndarray, np.matrix, Iterable[Iterable[float]]]


@nb.jit(signature=True)
def _row_similarity(X: nb.float64[:, :]) -> nb.types.Tuple((nb.float64[:], nb.float64[:])):
    n = X.shape[0]
    norms = np.empty((n,))
    for i in range(n):
        norms[i] = np.sum(X[i]**2)

    sims = np.empty((n * (n - 1) // 2,))  # integer division: the shape must be an int
    k = 0
    for i1 in range(n):
        for i2 in range(i1+1, n):
            sims[k] = np.dot(X[i1], X[i2]) / norms[i1] / norms[i2]
            k += 1

    return sims, norms


def row_similarity(X: Matrixlike) -> Tuple[np.ndarray, np.ndarray]:
    if not isinstance(X, np.ndarray):
        X = np.array([list(row) for row in X])
    X = np.ascontiguousarray(X, np.float64)
    return _row_similarity(X)

It still makes sense to use Numba types, rather than Numpy types. You can't yet specify Numpy array dimensions in annotations. AFAICT that is an unsolved problem.

The problem is that Numba types don't inherit from builtin types _or_ Numpy types, so Mypy (and every type checker that depends on it) would be very unhappy with these signatures.

Nevertheless, it does lead to much cleaner code in my opinion. Since it's opt-in anyway, would there be any harm in doing it, knowing that actual type checkers would not be able to handle it?
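To make the opt-in idea concrete, here is a minimal pure-Python sketch. `jit_sketch` and its `declared_types` attribute are hypothetical stand-ins (no compilation happens, and this is not how Numba's decorator is implemented). The point is only that annotations are inspected when `signature=True` is passed and ignored otherwise.

```python
import functools
from typing import get_type_hints


def jit_sketch(func=None, *, signature=False):
    """Hypothetical stand-in for a jit decorator with an opt-in `signature` flag."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        # Annotations are only inspected when the caller opted in.
        wrapper.declared_types = get_type_hints(fn) if signature else None
        return wrapper
    # Support both @jit_sketch and @jit_sketch(signature=True).
    return decorate if func is None else decorate(func)


@jit_sketch
def legacy(x: "documentation only") -> "free-form text":
    return x


@jit_sketch(signature=True)
def typed(x: int, y: int) -> int:
    return x + y
```

The `legacy` function carries annotations that are not valid types at all, yet decorating it still works because nothing reads them. That is the backward-compatibility property the opt-in flag is meant to preserve.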

The truly correct answer, of course, would be to write a PEP 544 Protocol class for the Numpy ndarray API, make the Numba types inherit from that Protocol, and then convince people to use that in their annotations. That's a bigger issue than Numba, but personally I think it would be a good thing even for non-Numba users, and I don't think there'd be much adoption friction.

I don't want to encourage users to annotate function types. I think the current numba type system is very limiting and it should be replaced. With type annotation, there is an implicit cast that is very unpythonic. Users can always put explicit casts inside the function body. That way, the function definition will still work without numba.

The only benefit function type annotation provides is forced compilation at decoration time. However, if controlling and reducing compilation overhead is important, users can always enable caching.

@sklam What are your goals/vision for the new type system? Will it integrate with function/variable/class annotations?

Would we then be able to type functions with abstract types for multiple dispatch?

Cython has done a similar thing: https://github.com/cython/cython/issues/1672. They also discuss the tradeoffs of using the type annotations to change behavior (which is what PEP 484 says).

Eventually, they decided to use type annotations to compile functions (which would change Python's behavior), but I'm not sure if their motivation applies here. Their discussion starts here: https://github.com/cython/cython/issues/1672#issuecomment-295919502

The same applies to @jitclass:

Instead of

@jitclass([('bar', int32)])
class Foo:
    pass

you could have

@jitclass
class Foo:
    bar: int32

Now that @dataclass is a part of Python 3.7, this would make even more sense consistency-wise.
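A minimal sketch of how a decorator could derive the spec list from the class body is shown below. `spec_from_annotations` is a hypothetical helper, not a Numba function, and `int`/`float` stand in for Numba types such as int32.

```python
def spec_from_annotations(cls):
    """Build a jitclass-style spec list [(field, type), ...] from class annotations.

    Hypothetical helper for illustration only; not a Numba API.
    """
    # Classes without annotated fields yield an empty spec.
    annotations = getattr(cls, "__annotations__", {})
    return list(annotations.items())


class Foo:
    bar: int      # stand-in for a Numba type such as int32
    baz: float


spec = spec_from_annotations(Foo)
```

The result has the same shape as the list passed to @jitclass today, so the annotated form could in principle be translated into the existing one.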

This would also be useful for variable annotations (PEP 526, part of Python 3.6+), so that the locals= dictionary could be avoided. That dictionary is ugly, and it is necessary if you are using float32s.

I did some checking on this last week and discovered (to my disappointment) that variable annotations inside of a function are effectively comments. They don't show up in any of the function attributes (unlike the argument and return type annotations) and they do not show up in the bytecode. As a result, the Numba compiler can't see them unless we start trying to locate function source code and parse it again (which has its own issues).

If there is a way to obtain the local variable annotations programmatically that I've missed, please let me know!

@seibert MyPy might have some magic that could help.

I feel like Python ought to have programmatic access in this case. Maybe a PEP is in order? I could see some sort of data structure in which type declarations/annotations are indexed by their equivalent positions in bytecode, but that would only apply for CPython AFAIK.

For what it's worth I feel like reading this information from in-line type annotations (as opposed to function signatures) is out of scope here. It's not like you can currently declare a typed variable in the middle of a Numba method, right?

The current behavior seems intentional, based on this discussion: https://github.com/python/typing/issues/258

Also, our reluctance to base a Numba feature on source code comes from the difficulties in obtaining function source (or equivalent ASTs) in all of the situations where Numba can be used:

  • Python source files (easy, inspect.getsource())
  • Jupyter Notebooks (less easy, and sometimes fragile)
  • Functions constructed from string templates that are exec'ed on the fly
  • Cloudpickled functions sent over the network
  • Functions imported from .pyc files (not that common, but currently works)
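For the cases where source text is available, the re-parsing approach can be sketched with the standard `ast` module. `local_annotations_from_source` is a hypothetical helper, and the source is passed in as a string so the example does not depend on `inspect.getsource()` succeeding. It assumes Python 3.9+ for `ast.unparse`.

```python
import ast
import textwrap


def local_annotations_from_source(source, func_name):
    """Map annotated local variable names to their annotation source text.

    Hypothetical helper for illustration; requires Python 3.9+ (ast.unparse).
    """
    tree = ast.parse(textwrap.dedent(source))
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            # Argument annotations are ast.arg nodes, so only statements of the
            # form `name: type` or `name: type = value` are collected here.
            return {
                stmt.target.id: ast.unparse(stmt.annotation)
                for stmt in ast.walk(node)
                if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)
            }
    raise LookupError(f"no function named {func_name!r} in source")


SOURCE = '''
def test(foo: float):
    bar: int
    total: float = 0.0
    return foo + total
'''

found = local_annotations_from_source(SOURCE, "test")
```

The annotations come back as source text rather than objects, so they would still need to be resolved against the function's globals. The sketch also walks into nested functions, and it does nothing for the harder cases in the list above, where no source text exists to parse.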

@seibert As I understand it, Numba does not currently support any kind of variable type annotation inside a function. In that case, the function argument/return types and class variable types could still be equivalent to using signature/class type definitions in the jit decorator, since those annotations are included in the __annotations__ attribute. It's only variable annotations inside functions/methods that don't appear anywhere.

>>> def test(foo: float):
...     bar: int
...
>>> test.__annotations__
{'foo': float}

Yes, it is correct that Numba could use the type information from the function signature.

We haven't devoted any effort to Numba support for function argument / return type annotations in the @jit decorator for several reasons. We find that describing the types with the level of detail Numba requires is (1) quite tedious for users, (2) almost always not necessary due to type inference, and (3) very easy to get wrong.

In particular, the complexity comes from NumPy array types, which are a lot like C++ template types as far as Numba is concerned. Since Numba uses types for dispatch to different implementations, it captures the element dtype, number of dimensions, and the array layout (C, FORTRAN or "any/unknown") in the Numba type for the array. That lets Numba generate the most optimized code, and it also ensures that the type inference algorithm can figure out the return types from array slicing. Manually putting type declarations on functions is likely to lead Numba astray, or unnecessarily limit the compiler when generating customized implementations.

Currently, the only cases where explicit signatures are necessary are for ahead of time (AoT) compilation, and for @guvectorize because of implementation limitations that we want to eliminate soon. For AoT cases, we often need multiple type signatures, and it is also not clear how to express that with Python type annotations.

The ugliness of @jit(locals) (needed to override the Numba type-casting rules, usually for performance reasons) is one situation where type annotations on variables would be a perfect replacement. Unfortunately, that's also the one situation where we can't use them at the moment...

@seibert what about function type annotations for overloads?

It's not like you can currently declare a typed variable in the middle of a Numba method, right?

See the numba example here: https://github.com/henryiii/framework_compare/tree/master/lin_regression

If you don't set the locals=..., it's much slower due to mixed doubles and floats.

If there is a way to obtain the local variable annotations programmatically that I've missed, please let me know!

I played around with this for a while and realized that this was not available; I think it's only available at class level and module level. It seems like this should have been added to the function attributes.

For python 3.6 and 3.7, class-level annotation shows up in the cls.__annotations__

In [1]: class A:
   ...:     bar: float
   ...:

In [2]: A.__annotations__
Out[2]: {'bar': float}

For python 3.6 and 3.7, class-level annotation shows up in the cls.__annotations__

And that's what collides with the current implementation of jitclass by the way! (see #2947)

Class-level annotations are easily accessible, yeah. Functions and methods too:

>>> def f(x: int): pass

>>> f.__annotations__
{'x': int}

It's the locals that aren't.
