Dependent types have been brought up before in a few places: #366, this python-ideas thread, and most recently, #3062. The latter limits itself to one specific feature of a dependent type system, namely, support for simple literal types.
Another limited feature which could prove helpful is support for _integer generics_: that is, the ability to parametrize types by integers as well as by other types. For example, the length of a homogeneous tuple could be parametrized:
# Currently supported:
Tuple[int, int, int] # A tuple of 3 ints
Tuple[int, ...] # A tuple of any number of ints
# Proposed:
Tuple[int, 256] # A tuple of 256 ints
One real-world use-case for this came up in python/typing#221:
# Currently supported:
SboxType = Tuple[
Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int],
Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int],
Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int],
Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int],
Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int],
Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int],
Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int],
Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int],
]
# Proposed:
SboxType = Tuple[Tuple[int, 16], 8] # A tuple of 8 tuples of 16 ints each
Integer genericity could also be used for things like statically checking vector or matrix dimensions:
N = TypeVar('N', numeric=True)  # one idea for denoting type-level integers

class Vector(Generic[N]):
    def __init__(self, values: Iterable[float, N]) -> None:
        self.values = tuple(values)

    def __add__(self, other: Vector[N]) -> Vector[N]:
        return Vector(a + b for a, b in zip(self.values, other.values))
    # ...more code...
def matmul(a: Matrix[N, M], b: Matrix[M, D]) -> Matrix[N, D]: ...
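Under this hypothetical syntax, a checker could then reject dimension mismatches right at the call site. A sketch of what that might look like (none of this is valid today; inference of the length parameter from the literal is assumed):

v3: Vector[3] = Vector([1.0, 2.0, 3.0])
v4: Vector[4] = Vector([1.0, 2.0, 3.0, 4.0])
v3 + v3  # OK: both operands are Vector[3]
v3 + v4  # error: __add__ expects Vector[3], got Vector[4]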
Note that the Rust community has been discussing this for some time (as well as the more general _const generics_) and has collected a lot of material on the motivation, challenges, and details of such a feature.
Whether or not it's a good fit for Python is an open question, of course, but I figured I should at least bring it up for discussion! (Especially since it could interact with #3062 in interesting ways.)
_(Even nicer would be to support simple type-level expressions so we could do things like this..._
# Vector concatenation (relevant: http://stackoverflow.com/q/43292197/1974654)
def concat(a: Vector[N], b: Vector[M]) -> Vector[N+M]: ...
# The `zip` builtin (based on the typeshed stub)
@overload
def zip(iter1: Iterable[_T1, N1],
        iter2: Iterable[_T2, N2]) -> Iterator[Tuple[_T1, _T2], min(N1, N2)]: ...
# HDL-like bit vector operations
def add(a: BitVec[N], b: BitVec[M]) -> BitVec[max(N, M) + 1]: ...
_...but that may be taking it too far. To avoid running arbitrary user code, the expression language would have to be explicitly defined and simple enough to implement in languages other than Python; it may be feasible to support something like `+`, `-`, `*`, `//`, `%`, `**`, `max()`, `min()`, and `abs()`, though.)_
Thank you for the interesting proposal and for the research! I really like the idea of supporting fixed-size arrays/tuples/matrices. I have seen many people already playing with combining mypy and numpy, and in my experience many errors in numpy are array shape mismatches, something like:
ValueError: operands could not be broadcast together with shapes (1000,) (999,)
or
ValueError: operands could not be broadcast together with shapes (250, 10) (10, 250)
The point is that in most cases the array shapes/sizes are known _statically_, so that all these errors can be caught by simple integer dependent types.
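For instance, both errors above are trivial to reproduce, and in both cases the shapes are fully visible in the source. A minimal reproduction (the exact message text may vary across numpy versions; each line raises on its own):

import numpy as np

np.zeros(1000) + np.zeros(999)             # ValueError: ... shapes (1000,) (999,)
np.zeros((250, 10)) + np.zeros((10, 250))  # ValueError: ... shapes (250,10) (10,250)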
Tuples could easily be special cased in mypy so they aren't enough justification by themselves for this feature. I agree with @ilevkivskyi that numpy could be an interesting use case. The mypy core team currently is not planning to work on numpy support, however -- though I understand that many potential users could find it useful.
A possible strategy would be to start from a relatively small project (if/when someone has time for this). For example, have some special support for numpy.ndarray (or maybe a special mypy_extensions.ndarray type). We don't need to support _user-defined_ integer dependent types. (Similar to how we have the variadic types Callable and Tuple, but we still don't allow user-defined variadic generics.)
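To make that concrete, a special-cased stub for 2-D arrays could look roughly like this. Everything here is hypothetical: mypy_extensions has no such type, and ordinary TypeVars stand in for the type-level integers this issue proposes:

from typing import Generic, TypeVar

N = TypeVar('N')  # would really need to be a type-level integer
M = TypeVar('M')
K = TypeVar('K')

class ndarray(Generic[N, M]):
    # Elementwise ops require identical shapes...
    def __add__(self, other: 'ndarray[N, M]') -> 'ndarray[N, M]': ...
    # ...while matrix multiplication checks the inner dimension.
    def __matmul__(self, other: 'ndarray[M, K]') -> 'ndarray[N, K]': ...

Even this fixed-rank sketch would catch both ValueErrors quoted above at check time.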
What goes for numpy goes for tensorflow: it'd be disheartening to find out that special support for numpy.ndarray allowed you to catch shape-mismatch errors at compile time, but that support was not available for tensorflow.Tensor, which has near-identical semantics.
One way to think about a multidimensional array is as a function from indices to a value. The "shape" really just constrains the possible inputs to that function. I am playing with implementing the Mathematics of Arrays in Python, and I can represent an array object as a tuple of an indexing function and a shape tuple. But this doesn't capture the relationship between these two parts at the type level.
If we also want to track the inner types of the arrays, this requires a generic array object. So at the type-system level, I want to be able to represent an array object of type T and shape S as "a callable with integer args <= S and output type T".
It looks like this could be accomplished with generic protocols, like this:
from typing import List, TypeVar
from typing_extensions import Protocol

T = TypeVar('T')

class Array(Protocol[T]):
    shape: List[int]

    def index(self, *ix: int) -> T: ...
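For concreteness, a minimal class that would satisfy this protocol structurally might look like the following (TupleBackedArray is just an illustrative name). Note where the shape constraint ends up being enforced:

from typing import Tuple

class TupleBackedArray:
    # A 1-D float array; satisfies Array[float] structurally.
    def __init__(self, values: Tuple[float, ...]) -> None:
        self.values = values
        self.shape = [len(values)]

    def index(self, *ix: int) -> float:
        # The protocol can't express the shape constraint, so it is
        # checked here, at runtime.
        if len(ix) != 1 or not 0 <= ix[0] < self.shape[0]:
            raise IndexError(ix)
        return self.values[ix[0]]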
An alternative way of doing this is to just use my own type annotations that can't be checked statically and use them only at runtime. This is what the Relay project does, which is part of the TVM ML stack from U of Washington:
@relay_model
def lenet(x: Tensor[Float, (1, 28, 28)]) -> Tensor[Float, 10]:
    conv1 = relay.conv2d(x, num_filter=20, ksize=[1, 5, 5, 1], no_bias=False)
    tanh1 = relay.tanh(conv1)
    pool1 = relay.max_pool(tanh1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1])
    conv2 = relay.conv2d(pool1, num_filter=50, ksize=[1, 5, 5, 1], no_bias=False)
    tanh2 = relay.tanh(conv2)
    pool2 = relay.max_pool(tanh2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1])
    flatten = relay.flatten_layer(pool2)
    fc1 = relay.linear(flatten, num_hidden=500)
    tanh3 = relay.tanh(fc1)
    return relay.linear(tanh3, num_hidden=10)

@relay
def loss(x: Tensor[Float, (1, 28, 28)], y: Tensor[Float, 10]) -> Float:
    return relay.softmax_cross_entropy(lenet(x), y)

@relay
def train_lenet(training_data: Tensor[Float, (60000, 1, 28, 28)]) -> Model:
    model = relay.create_model(lenet)
    for x, y in training_data:
        model_grad = relay.grad(model, loss, (x, y))
        relay.update_model_params(model, model_grad)
    return relay.export_model(model)

training_data, test_data = relay.datasets.mnist()
model = train_lenet(training_data)
print(relay.argmax(model(test_data[0])))
They just disable static type checking for those types.
This is a bit of a tangent, but I attended some of PLDI this year and there was a ton of talk there about the Python type system and language. Lots of people were building these computational frameworks in Python for probabilistic programming or deep learning, coming from other languages like Haskell, Rust, or Java. I am not that familiar with the community there, but it seems like it would be good if some of that energy could be focused on research into Python's core, so that these kinds of extensions to the language could be better supported (reasonable error messages, static type checking, etc.).
OT: please don't use images of text. There's no reason you couldn't have copy-pasted the original code.
Duly noted. I updated the post with code instead of the image.
> It looks like this could be accomplished with generic protocols
Could you explain a bit more about how this would work? I'm afraid I don't see how generic protocols could handle shape/size constraints at the type level... out-of-bounds checking would still need to be handled at runtime in the index method, no? (In contrast, integer generics would allow something much closer to what the Relay project is doing.)
For Rust the accepted RFC is:
https://github.com/rust-lang/rfcs/pull/2000
And further improvements:
https://github.com/rust-lang/rust/issues/76560
Currently, the typing problem above could be handled like this:
T16 = Tuple[int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int]
SboxType = Tuple[T16, T16, T16, T16, T16, T16, T16, T16]
Is a case like this considered as well?
from typing import Sequence

def get_ngrams(tokens: Sequence[str], n: int = 2):  # slicing below needs Sequence, not Iterable
    """
    >>> list(get_ngrams(["my", "name", "is", "archibald"], n=2))
    [('my', 'name'), ('name', 'is'), ('is', 'archibald')]
    """
    grams = zip(*[tokens[i:] for i in range(n)])
    return grams
PyCharm infers Union[Iterator[Tuple[Any]], Iterator[Tuple[Any, Any]], Iterator[Tuple[Any, Any, Any]], Iterator[Tuple[Any, Any, Any, Any]], Iterator[Tuple[Any, Any, Any, Any, Any]]] as the return type (for some reason stopping at five elements).
As it is now, I don't think I can do better than Iterator[Tuple[str, ...]] which leaves out the length information.
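A partial workaround available today is to hand-enumerate overloads on literal types, which is essentially what PyCharm's inference is doing. A sketch using Literal from typing_extensions, covering only the spelled-out values of n:

from typing import Iterator, Sequence, Tuple, overload
from typing_extensions import Literal

@overload
def get_ngrams(tokens: Sequence[str], n: Literal[2] = ...) -> Iterator[Tuple[str, str]]: ...
@overload
def get_ngrams(tokens: Sequence[str], n: Literal[3]) -> Iterator[Tuple[str, str, str]]: ...
@overload
def get_ngrams(tokens: Sequence[str], n: int) -> Iterator[Tuple[str, ...]]: ...

def get_ngrams(tokens: Sequence[str], n: int = 2) -> Iterator[Tuple[str, ...]]:
    return zip(*[tokens[i:] for i in range(n)])

Integer generics with type-level expressions would replace this hand-written ladder with a single signature.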
Hi! I have been working on a proposal for adding support for type arithmetic and some other useful features for tensor typing. I will present it next Monday in the next chapter of the Tensor Typing meetings that we have been organising, so if anyone is interested feel free to attend: link
Looking forward to it!
@fylux Is there a write-up of your proposal available anywhere?
@luphord The write-up is not ready yet but the slides of the presentation can give you a good idea: link
@luphord I hope eventually some of this stuff will also make it into statically typed languages such as Rust :-) Rust "const generics" aren't enough.
@fylux Oh wow, that is a lot more advanced than what I expected, thanks!
I was merely thinking of simple use cases such as (in "numpy terminology")
import numpy as np

# n, k generic integers
def f(A: np.ndarray[n, k], x: np.ndarray[k]) -> np.ndarray[n]:
    return A @ x
and then mypy would recognize a type error in
A = np.random.random((3,4))
x = np.random.random((5,))
f(A, x) # <- would cause a type error in mypy
However, I can clearly see the benefits of the type arithmetic that you presented. Is there a roadmap for your proposal? I understand that at least parts of it are already implemented in Pyre?
Regarding support for "simple cases", that is already possible in Pyre thanks to variadics (e.g. XLM example), and there is currently an open proposal for standardising it in Python (https://mail.python.org/archives/list/[email protected]/thread/SQVTQYWIOI4TIO7NNBTFFWFMSMS2TA4J/).
About type arithmetic: +, -, //, *, Length, and Product are all already supported in Pyre (e.g. the MaxPool2D example).
The final proposal for type arithmetic is not yet written, mostly because type arithmetic will benefit greatly from support for variadics, so until the variadic PEP is merged there is not much sense in discussing a type arithmetic PEP.
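For reference, the variadics proposal mentioned above (the work later standardised as PEP 646) lets shapes be tracked as a variadic list of dimension types. A rough sketch of that style, with Batch/Height/Width as illustrative dimension names:

from typing import Generic, NewType
from typing_extensions import TypeVarTuple, Unpack

Shape = TypeVarTuple('Shape')

class Tensor(Generic[Unpack[Shape]]):
    ...

Batch = NewType('Batch', int)
Height = NewType('Height', int)
Width = NewType('Width', int)

def transpose(t: Tensor[Batch, Height, Width]) -> Tensor[Batch, Width, Height]:
    ...

Variadics track how many dimensions there are and how they move around; arithmetic on the sizes themselves is exactly what the type-arithmetic proposal adds on top.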