Mypy: Typed dicts with missing keys

Created on 3 Jan 2017  路  24Comments  路  Source: python/mypy

We don't have a complete story for typed dicts when some keys may be missing. Support for get (#2612) is necessary, but it may not be sufficient. For example, consider code like this:

A = TypedDict('A', {'x': int, 'y': str})
a: A = {'x': 1}

Should it be possible to make this type check without a cast?

priority-0-high topic-typed-dict

Most helpful comment

There are many reasonable use cases where some typed dict keys are only defined some of the time:

  • If we have a JSON config file, it's reasonable to have some optional keys there, and it's also reasonable that later versions of a program add additional keys that may be missing if an earlier program version created the config file.
  • Similarly, if JSON is used in a network protocol, some keys may be optional to save some bandwidth. Also, older versions of a client may be missing some keys introduced in later clients.
  • If JSON is used for serialization, different versions of the serialized object may have different attributes, and we may want to omit some keys with default values to save space.
  • If we use TypedDict as an argument type, later API versions may support additional keys that are optional to retain backward compatibility.

A simple approach would be have a way to annotate potentially missing keys. By default, a key would still be required. We can't use Optional[...] for this since it has another meaning. Some ideas below:

A = TypedDict('A', {'x': OptionalKey[int]})
A = TypedDict('A', {'x': MayExist[int]})
A = TypedDict('A', {'x': NotRequired[int]})

class A(TypedDict):
    a: OptionalKey[int]
    b: MayExist[int]

This doesn't work:

A = TypedDict('A', {'x?': int})  # doesn't work with the alternative syntax

If some key is optional, it must be accessed using get() or guarded with an in check:

if 'x' in d:
    d['x']   # okay if 'x' is optional
d.get('x')  # okay if 'x' is optional
d['x']  # failure if 'x' is optional

Maybe we also need to track setting optional keys in the conditional type binder:

if 'x' not in d:
    d['x'] = 1
d['x']  # okay even if 'x' is optional, since we set it above

All 24 comments

There are many reasonable use cases where some typed dict keys are only defined some of the time:

  • If we have a JSON config file, it's reasonable to have some optional keys there, and it's also reasonable that later versions of a program add additional keys that may be missing if an earlier program version created the config file.
  • Similarly, if JSON is used in a network protocol, some keys may be optional to save some bandwidth. Also, older versions of a client may be missing some keys introduced in later clients.
  • If JSON is used for serialization, different versions of the serialized object may have different attributes, and we may want to omit some keys with default values to save space.
  • If we use TypedDict as an argument type, later API versions may support additional keys that are optional to retain backward compatibility.

A simple approach would be have a way to annotate potentially missing keys. By default, a key would still be required. We can't use Optional[...] for this since it has another meaning. Some ideas below:

A = TypedDict('A', {'x': OptionalKey[int]})
A = TypedDict('A', {'x': MayExist[int]})
A = TypedDict('A', {'x': NotRequired[int]})

class A(TypedDict):
    a: OptionalKey[int]
    b: MayExist[int]

This doesn't work:

A = TypedDict('A', {'x?': int})  # doesn't work with the alternative syntax

If some key is optional, it must be accessed using get() or guarded with an in check:

if 'x' in d:
    d['x']   # okay if 'x' is optional
d.get('x')  # okay if 'x' is optional
d['x']  # failure if 'x' is optional

Maybe we also need to track setting optional keys in the conditional type binder:

if 'x' not in d:
    d['x'] = 1
d['x']  # okay even if 'x' is optional, since we set it above

When receiving JSON objects from an untrusted source, we'd want to check that it conforms to our schema before using it as a typed dict. Otherwise bad things could happen. If optional key constraints are available for introspection, we could implement a helper that checks whether a runtime dictionary object is compatible with a typed dict type. This wouldn't be part of mypy or typing, but it could be a useful separate module.

FWIW hacklang conflates keys with Nullable with Optional Keys - https://docs.hhvm.com/hack/shapes/introduction#accessing-fields__nullable-fields-are-optional

I wouldn't worry too much about schema validation in Mypy. Marshmallow is already great at that (And the hypothetical plugin system would allow them to play nice together 馃槃)

@rowillia An approach similar to Hack sounds pretty reasonable for mypy as well. Here's how it could work:

  • d['x'] works no matter what type 'x' has.
  • d.get('x') is only allowed if 'x' has an optional type.
  • Keys with an optional type can be left out when constructing a typed dict instance.

This would clearly be unsafe, though. d['x'] could fail at runtime, in case 'x' is optional. Also, if some external library or service actually expects an explicit None value, it's easy to accidentally omit a required key and cause a runtime error.

Maybe mypy should be able to help catch these issues. Some ideas:

  1. Have a flag for each typed dict specifying how optional types work. There could be three options (or maybe just two of these): 1) Optional keys can be left out, but do not enforce get (perhaps the default). 2) Optional keys can be left out, but enforce using get with optional keys. 3) Keys with optional types cannot be left out.
  2. Have a config file and command line option for enabling strict get checks for all typed dicts. When this option is enabled, get must be used with optional keys. This is a pretty coarse-grained tool, but it might still work reasonably well.
  3. Let users mark fields with optional types as required. The syntax could be {'x': Required[Optional[int]]}. Such a key must always be present, and using get is not allowed. By default we could fall back to the Hack-like approach. Alternative spelling: ValueRequired[Optional[...]].
  4. Like (3), but enforce using get with optional keys (no d['x']).

I'm leaning towards (4), since it seems to provide sufficient flexibility with reasonable defaults, and it would also be safe. Also, I'd probably allow get to be used with arbitrary keys to make checking legacy code easier -- after all, the get call won't fail at runtime even if a key is required.

@davidfstr @gvanrossum What do you think?

A few quick thoughts:

  • Agreed that enabling external format checkers to check JSON blobs against TypedDict definitions would be useful.
  • Distinguishing between Optional items (that can be None) and items that can be omitted altogether feels valuable to me. I don't like the idea of conflating by default.

I don't have a lot of first-hand experience with JSON blobs that have omittable keys, so I don't have especially strong opinions on the semantics for working with them.

Okay, here's another proposal:

  • By default, keys must exist. Optional[...] as a value type is not special.
  • get can be used with arbitrary keys, even those that are declared to always exist. This is for convenience and it doesn't affect safety. The return type for the single-argument version of get is always an optional type, even if the key is supposed to always exist.
  • Use OptionalItem[t] for keys that may be missing. This is unrelated to Optional[...].
  • It's possible to use OptionalItem[Optional[t]] for an optional item that may be None.
  • OptionalItem[...] is only valid if it wraps a typed dict value type. It's an error to use it in any other context.

I chose the name OptionalItem as the name for a few reasons:

  • It seems better to reuse the term 'optional' instead of inventing a synonym. This is just a different flavor of optional.
  • This declares an optional item -- both the key and the corresponding value may be missing.

More rationale:

  • Being an optional item is conceptually orthogonal to accepting a None value, even though the dict.get method somewhat conflates these through the default None value for missing keys.
  • Having two independent concepts gives the most flexibility.
  • Having two similar concepts is going to be a little confusing. However, OptionalItem is a very searchable name and we can make sure all official documentation about it mentions how it's different from Optional.

Additional notes:

  • The join of typed dicts {'x': int} and {'x': int, 'y': str} probably should be {'x': int, 'y': OptionalItem[str]} for convenience. This is not safe, but I doubt that it matters in practice. I can create a separate issue for this with some rationale if we decide to move forward with the rest of this proposal.

Hm, that's pretty verbose, and many people probably don't even know whether their code cares about explicit None values vs. missing keys. How about intentionally conflating the two? The rules would be:

  • Items marked up without Optional are mandatory; d[k] is okay and has the stated type; d.get(k) is okay and has an Optional type
  • Items marked up with Optional may be None or missing; d[k] is not okay for these; d.get(k) must be used and has anOptional` type
  • If you really want a mandatory key with an optional type you can write it as Union[None, T]
  • For two-argument d.get(k, default) the return type is a union of None, the type of default, and the nominal type of d[k]

The one use case that this doesn't cover is when you want to allow the key to be missing, but when present you don't want the value to be None. Is that an important use case? If it is, I will withdraw this and then I am okay with the OptionalItem proposal.

My gut feeling is that a missing key with the value never being None is not rare. For example, some code I looked at today had code like d.get('x', {}).get('y', 1) which would fail if the return type of get would always include None.

Schema evolution or versioning is a typical case where it seems natural to have missing keys with non-Optional values. If we add a new item to a JSON dictionary, but we need to be prepared to accept older objects with a missing key (see above for a discussion about potential use cases), the most likely result seems to be a missing key with a non-optional type.

OK, I withdraw my proposal. IIRC @rowillia also posted some example code that used d.get(k1, {}).get(k2) so apparently this is a common assumption that we should be able to express. And for that case we would just write OptionalItem[...] so that's quite fine. +1!

I like the OptionalItem proposal, mainly on the rationale that explicit is better than implicit.

@gvanrossum I think that this would be nice to have before we make TypedDict official. I looked at Dropbox internal codebases and calls like d.get('foo', ...) were very common (over 10k hits), and I'd expect that they often imply an optional TypedDict item.

@JukkaL
Sorry for an off-topic question, but there is a very similar question for protocols, we decided to omit this for now, but I am just curious how frequent is something like if hasattr(...) in Dropbox internal codebases?

@ilevkivskyi We have around 1k instances of if ... hasattr, but some of them probably are unrelated to protocols.

@JukkaL
Thanks! This is already much less than for TypedDicts, so it looks like the decision to postpone this for protocols is justified.

We had an offline discussion about the syntax and @ddfisher wasn't happy with Optionaltem[...] because it's so close to Optional[...]. I came up with another alternative, Checked[...]. The idea behind the name is that the user must check for the existence of the item before accessing (or do it indirectly through get). It would look like this:

A = TypedDict('A', {'x': Checked[int]})

class A(TypedDict):
    a: Checked[int]

Pros:

  • Arguably communicates that you have to check for the existence of the item instead of just using it.
  • Less likely to be confused with Optional[...]. It's easy to talk about checked vs. optional dictionary items as separate things in docs, error messages and such.
  • As an adjective it's consistent with Optional[...] (unlike OptionalItem[...] which is a noun phrase).

Cons:

  • Can potentially be confused with the dictionary object doing the checking, instead of the user of the dictionary. However, if the user understands that typed dicts are just normal dictionaries at runtime, this is probably not too likely.
  • Checked is a less commonly used term than Optional (though there are precedents, such as checked exceptions in Java). However, the use case is also less common/typical than union-with-None.

Hm, Checked[int] looks weird to me. I wonder if we could just have a flag
that makes all of a given TypedDict's items optional? The app should be
allowed to write td[key] if they're certain that the key exists, or
td.get(key [, default]) if they're not.

Having a per-TypedDict flag might be a reasonable compromise. (Note that in my current implementation td.get('key'[, default]) is always valid, but there's no way to require it to be always used.)

Here are possible semantics in more detail:

  • If the flag is false (this is the default), both td[key] and td.get(key[, default]) are always accepted.

  • If the flag is true, then td['key'] would always be rejected without an explicit 'key' in td check. td.get(key[, default]) is always valid.

Possible syntax:

T = TypedDict('T', {'x': int}, partial=True)

def f(t: T) -> None:
    t['x']  # Invalid
    t.get('x')  # Ok
    assert 'x' in t
    t['x']  # Ok

f({})  # Ok
f({'x': 2})  # Ok

Other ideas for the name of the flag:

  • allow_partial
  • allow_missing
  • allow_missing_keys
  • allow_missing_items
  • missing_keys
  • missing_items

Not sure what's the best way to support the class-based syntax. We could perhaps have __partial__ = True in the class body. Alternatively, we could use a different base class such as PartialTypedDict, or another base class such as Partial. Finally, we could use a class decorator such as @partial. My current favorite is @partial:

@partial
class T(TypedDict):
    x: int

Since the class based syntax works only in Python 3.6 anyway, I could propose another option (more similar to the functional syntax):

class T(TypedDict, partial=True):
    x: int

I'm not sure I like "partial" (possible confusion with functools.partial)
but I like the class decorator. It even lets users specify a series of
required and a series of optional fields, using subclassing (only one would
use the class decorator).

What about 'incomplete' instead of 'partial'? Or total=False or complete=False instead of partial=True? Another option is 'checked', but it doesn't sound quite right in this context.

I kind of like the idea of using the class T(..., keyword=value) syntax -- I had forgotten about it.

I like total=False, using the class keyword.

Ok I'll go with total=False as a class keyword for now. This is something we can still iterate on if we ultimately aren't happy with it.

This was implemented in #3558.

Was this page helpful?
0 / 5 - 0 ratings