We don't have a complete story for typed dicts when some keys may be missing. Support for get (#2612) is necessary, but it may not be sufficient. For example, consider code like this:
A = TypedDict('A', {'x': int, 'y': str})
a: A = {'x': 1}
Should it be possible to make this type check without a cast?
There are many reasonable use cases where some typed dict keys are only defined some of the time:
- [...] TypedDict as an argument type, later API versions may support additional keys that are optional to retain backward compatibility.
A simple approach would be to have a way to annotate potentially missing keys. By default, a key would still be required. We can't use Optional[...] for this since it has another meaning. Some ideas below:
A = TypedDict('A', {'x': OptionalKey[int]})
A = TypedDict('A', {'x': MayExist[int]})
A = TypedDict('A', {'x': NotRequired[int]})
class A(TypedDict):
    a: OptionalKey[int]
    b: MayExist[int]
This doesn't work:
A = TypedDict('A', {'x?': int}) # doesn't work with the alternative syntax
If some key is optional, it must be accessed using get() or guarded with an in check:
if 'x' in d:
    d['x']  # okay if 'x' is optional
d.get('x')  # okay if 'x' is optional
d['x']  # failure if 'x' is optional
Maybe we also need to track setting optional keys in the conditional type binder:
if 'x' not in d:
    d['x'] = 1
d['x']  # okay even if 'x' is optional, since we set it above
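For reference, here is a minimal runnable sketch of the three access patterns above. It assumes the `total=False` class keyword that this thread eventually settles on; `Point`, `read_x` and `ensure_x` are hypothetical names used only for illustration. It shows runtime behaviour only; whether a type checker accepts or rejects each access depends on the checker.

```python
from typing import TypedDict


class Point(TypedDict, total=False):
    # Hypothetical example type: with total=False every key may be missing.
    x: int


def read_x(d: Point) -> int:
    # Guarded access: the 'in' check proves the key exists before indexing.
    if 'x' in d:
        return d['x']
    # Fallback: get() never raises for a missing key.
    return d.get('x', 0)


def ensure_x(d: Point) -> int:
    # Setting the key first makes the later d['x'] safe at runtime.
    if 'x' not in d:
        d['x'] = 1
    return d['x']
```

An unguarded `d['x']` on an empty dict would raise `KeyError`, which is the failure the proposed checks are meant to catch.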
When receiving JSON objects from an untrusted source, we'd want to check that it conforms to our schema before using it as a typed dict. Otherwise bad things could happen. If optional key constraints are available for introspection, we could implement a helper that checks whether a runtime dictionary object is compatible with a typed dict type. This wouldn't be part of mypy or typing, but it could be a useful separate module.
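A rough sketch of what such a helper might look like, assuming Python 3.9+ where TypedDict classes expose `__required_keys__`. The names `conforms`, `_MovieBase` and `Movie` are made up for this example, and the check is deliberately shallow: it only handles plain classes as value types, not nested or generic types.

```python
from typing import Any, TypedDict, get_type_hints


class _MovieBase(TypedDict):
    # Hypothetical schema: 'title' must always be present.
    title: str


class Movie(_MovieBase, total=False):
    # 'year' may be missing.
    year: int


def conforms(obj: Any, td: type) -> bool:
    """Shallow check of a runtime object against a TypedDict class."""
    if not isinstance(obj, dict):
        return False
    hints = get_type_hints(td)
    # __required_keys__ is a frozenset of key names (Python 3.9+).
    if not td.__required_keys__.issubset(obj.keys()):
        return False
    for key, value in obj.items():
        if key not in hints:
            return False  # unknown key
        expected = hints[key]
        # Only plain classes are checked; generics such as List[int] are skipped.
        if isinstance(expected, type) and not isinstance(value, expected):
            return False
    return True
```

With this sketch, `conforms({'title': 'Alien'}, Movie)` is accepted, while a dict without `'title'` or with a string `'year'` is rejected.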
FWIW, Hack conflates nullable fields with optional keys: https://docs.hhvm.com/hack/shapes/introduction#accessing-fields__nullable-fields-are-optional
I wouldn't worry too much about schema validation in mypy. Marshmallow is already great at that (and the hypothetical plugin system would allow them to play nice together 😄)
@rowillia An approach similar to Hack sounds pretty reasonable for mypy as well. Here's how it could work:
- d['x'] works no matter what type 'x' has.
- d.get('x') is only allowed if 'x' has an optional type.
This would clearly be unsafe, though. d['x'] could fail at runtime, in case 'x' is optional. Also, if some external library or service actually expects an explicit None value, it's easy to accidentally omit a required key and cause a runtime error.
Maybe mypy should be able to help catch these issues. Some ideas:
1. [...] get (perhaps the default). 2) Optional keys can be left out, but enforce using get with optional keys. 3) Keys with optional types cannot be left out.
2. [...] get checks for all typed dicts. When this option is enabled, get must be used with optional keys. This is a pretty coarse-grained tool, but it might still work reasonably well.
3. [...] {'x': Required[Optional[int]]}. Such a key must always be present, and using get is not allowed. By default we could fall back to the Hack-like approach. Alternative spelling: ValueRequired[Optional[...]].
4. [...] get with optional keys (no d['x']).
I'm leaning towards (4), since it seems to provide sufficient flexibility with reasonable defaults, and it would also be safe. Also, I'd probably allow get to be used with arbitrary keys to make checking legacy code easier -- after all, the get call won't fail at runtime even if a key is required.
@davidfstr @gvanrossum What do you think?
A few quick thoughts:
I don't have a lot of first-hand experience with JSON blobs that have omittable keys, so I don't have especially strong opinions on the semantics for working with them.
Okay, here's another proposal:
- Optional[...] as a value type is not special.
- get can be used with arbitrary keys, even those that are declared to always exist. This is for convenience and it doesn't affect safety. The return type for the single-argument version of get is always an optional type, even if the key is supposed to always exist.
- [...] OptionalItem[t] for keys that may be missing. This is unrelated to Optional[...].
- [...] OptionalItem[Optional[t]] for an optional item that may be None.
- OptionalItem[...] is only valid if it wraps a typed dict value type. It's an error to use it in any other context.
I chose the name OptionalItem for a few reasons:
More rationale:
- [...] None value, even though the dict.get method somewhat conflates these through the default None value for missing keys.
- OptionalItem is a very searchable name and we can make sure all official documentation about it mentions how it's different from Optional.
Additional notes:
- [...] {'x': int} and {'x': int, 'y': str} probably should be {'x': int, 'y': OptionalItem[str]} for convenience. This is not safe, but I doubt that it matters in practice. I can create a separate issue for this with some rationale if we decide to move forward with the rest of this proposal.
Hm, that's pretty verbose, and many people probably don't even know whether their code cares about explicit None values vs. missing keys. How about intentionally conflating the two? The rules would be:
- [...] Optional are mandatory; d[k] is okay and has the stated type; d.get(k) is okay and has an Optional type
- [...] Optional may be None or missing; d[k] is not okay for these; d.get(k) must be used and has an Optional type
- [...] Union[None, T]
- [...] d.get(k, default) the return type is a union of None, the type of default, and the nominal type of d[k]
The one use case that this doesn't cover is when you want to allow the key to be missing, but when present you don't want the value to be None. Is that an important use case? If it is, I will withdraw this and then I am okay with the OptionalItem proposal.
My gut feeling is that a missing key with the value never being None is not rare. For example, some code I looked at today had code like d.get('x', {}).get('y', 1) which would fail if the return type of get would always include None.
Schema evolution or versioning is a typical case where it seems natural to have missing keys with non-Optional values. If we add a new item to a JSON dictionary, but we need to be prepared to accept older objects with a missing key (see above for a discussion about potential use cases), the most likely result seems to be a missing key with a non-optional type.
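To make the schema-evolution case concrete, here is a small sketch of the chained `get` pattern. The `retries` function and the `'network'` / `'retries'` keys are invented for illustration; the assumption is that newer payloads add a nested `'network'` dict whose values are never None, while older payloads omit it.

```python
from typing import Any, Dict


def retries(config: Dict[str, Any]) -> int:
    # Older payloads may lack 'network' entirely, so fall back to an empty dict.
    # When 'retries' is present it is assumed to be an int, never None.
    return config.get('network', {}).get('retries', 1)
```

If the first `get` were typed as possibly returning None even when a default is supplied, the second `.get` call would be rejected, which is why conflating missing keys with None values is awkward here.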
OK, I withdraw my proposal. IIRC @rowillia also posted some example code that used d.get(k1, {}).get(k2) so apparently this is a common assumption that we should be able to express. And for that case we would just write OptionalItem[...] so that's quite fine. +1!
I like the OptionalItem proposal, mainly on the rationale that explicit is better than implicit.
@gvanrossum I think that this would be nice to have before we make TypedDict official. I looked at Dropbox internal codebases and calls like d.get('foo', ...) were very common (over 10k hits), and I'd expect that they often imply an optional TypedDict item.
@JukkaL
Sorry for the off-topic question, but there is a very similar question for protocols. We decided to omit this for now, but I am curious: how frequent is something like if hasattr(...) in Dropbox internal codebases?
@ilevkivskyi We have around 1k instances of if ... hasattr, but some of them probably are unrelated to protocols.
@JukkaL
Thanks! This is already much less than for TypedDicts, so it looks like the decision to postpone this for protocols is justified.
We had an offline discussion about the syntax and @ddfisher wasn't happy with OptionalItem[...] because it's so close to Optional[...]. I came up with another alternative, Checked[...]. The idea behind the name is that the user must check for the existence of the item before accessing it (or do it indirectly through get). It would look like this:
A = TypedDict('A', {'x': Checked[int]})
class A(TypedDict):
    a: Checked[int]
Pros:
- [...] Optional[...]. It's easy to talk about checked vs. optional dictionary items as separate things in docs, error messages and such.
- [...] Optional[...] (unlike OptionalItem[...] which is a noun phrase).
Cons:
- Checked is a less commonly used term than Optional (though there are precedents, such as checked exceptions in Java). However, the use case is also less common/typical than union-with-None.
Hm, Checked[int] looks weird to me. I wonder if we could just have a flag that makes all of a given TypedDict's items optional? The app should be allowed to write td[key] if they're certain that the key exists, or td.get(key [, default]) if they're not.
Having a per-TypedDict flag might be a reasonable compromise. (Note that in my current implementation td.get('key'[, default]) is always valid, but there's no way to require it to be always used.)
Here are possible semantics in more detail:
If the flag is false (this is the default), both td[key] and td.get(key[, default]) are always accepted.
If the flag is true, then td['key'] would always be rejected without an explicit 'key' in td check. td.get(key[, default]) is always valid.
Possible syntax:
T = TypedDict('T', {'x': int}, partial=True)
def f(t: T) -> None:
    t['x']  # Invalid
    t.get('x')  # Ok
    assert 'x' in t
    t['x']  # Ok
f({})  # Ok
f({'x': 2})  # Ok
Other ideas for the name of the flag:
- allow_partial
- allow_missing
- allow_missing_keys
- allow_missing_items
- missing_keys
- missing_items
Not sure what's the best way to support the class-based syntax. We could perhaps have __partial__ = True in the class body. Alternatively, we could use a different base class such as PartialTypedDict, or another base class such as Partial. Finally, we could use a class decorator such as @partial. My current favorite is @partial:
@partial
class T(TypedDict):
    x: int
Since the class based syntax works only in Python 3.6 anyway, I could propose another option (more similar to the functional syntax):
class T(TypedDict, partial=True):
    x: int
I'm not sure I like "partial" (possible confusion with functools.partial)
but I like the class decorator. It even lets users specify a series of
required and a series of optional fields, using subclassing (only one would
use the class decorator).
What about 'incomplete' instead of 'partial'? Or total=False or complete=False instead of partial=True? Another option is 'checked', but it doesn't sound quite right in this context.
I kind of like the idea of using the class T(..., keyword=value) syntax -- I had forgotten about it.
I like total=False, using the class keyword.
Ok I'll go with total=False as a class keyword for now. This is something we can still iterate on if we ultimately aren't happy with it.
This was implemented in #3558.
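A short sketch of how the `total=False` class keyword can be combined with subclassing to get a mix of required and optional keys, as suggested earlier in the thread. `_UserRequired` and `User` are hypothetical names, and the introspection attributes assume Python 3.9+.

```python
from typing import TypedDict


class _UserRequired(TypedDict):
    # Keys declared here must always be present.
    id: int


class User(_UserRequired, total=False):
    # Keys declared here may be missing.
    nickname: str


full: User = {'id': 1, 'nickname': 'jl'}
minimal: User = {'id': 2}  # 'nickname' left out

# The required/optional split is available for introspection at runtime.
required_keys = User.__required_keys__
optional_keys = User.__optional_keys__
```

Only the subclass carries `total=False`, so `id` stays required while `nickname` may be omitted.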