Python 3.7 will come with a new feature called Data Classes (see PEP 557). It is available for Python 3.6 on PyPI for demonstration purposes.
It looks like this:
import datetime
from dataclasses import dataclass
from typing import List

@dataclass
class Artist:
    name: str

@dataclass
class Album:
    title: str
    release_date: datetime.date
    artist: List[Artist]
You can immediately notice the resemblance to marshmallow:
from marshmallow import Schema, fields

class ArtistSchema(Schema):
    name = fields.Str()

class AlbumSchema(Schema):
    title = fields.Str()
    release_date = fields.Date()
    artist = fields.Nested(ArtistSchema())
I think for most cases there's enough information in the dataclass for marshmallow to figure out the schemas by itself. I admit that the schema might change while the model will stay the same but for most cases, it won't be a problem.
My idea is to leverage the dataclass to build a Schema automatically, for DRY purposes when it makes sense, while still getting the features of marshmallow under the hood. What's your opinion on that? Do you have an idea for an implementation that is both DRY and extensible?
I may be able to work on a PR for this but I want to go in the right direction.
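To make the "enough information in the dataclass" point concrete, here is a minimal sketch that uses only the standard library to list the field names and types a schema generator would have to work from. The `describe` helper is a hypothetical name for illustration, not part of marshmallow or of any library mentioned in this thread.

```python
import datetime
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Artist:
    name: str


@dataclass
class Album:
    title: str
    release_date: datetime.date
    artist: List[Artist]


def describe(datacls):
    """Hypothetical helper: map each dataclass field name to its annotated type."""
    # dataclasses.fields() returns Field objects in declaration order;
    # Field.type holds the annotation exactly as written on the class.
    return {f.name: f.type for f in fields(datacls)}


album_fields = describe(Album)
for name, tp in album_fields.items():
    print(name, tp)
```

A generator would still need a mapping from these types to marshmallow field classes, which is where per-field configuration such as `data_key` has no obvious home.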
You don't actually need @dataclass to use type annotations on classes. PEP 526 added support for type annotations on class attributes in Python 3.6.
I personally don't use type annotations for any of my projects yet, but I like the idea of having the option to use native Python syntax for schema declarations. Fields are types, so it would be nice to treat them as such. The main limitation I see is that the builtin types don't provide a way to express field configuration.
Can you provide an example of how you would like to see marshmallow interacting with a data class?
Here is what I was imagining:
class ArtistSchema(Schema):
    name: fields.Str()

class AlbumSchema(Schema):
    title: fields.Str()
    release_date: fields.Date()
    artist: fields.Nested(ArtistSchema())
This would actually eliminate the need for a nested Meta class, because type annotations are stored separately from actual attributes.
class Foo(Schema):
    name: fields.Str()
    only: fields.Int()
    only = ['name']

Foo.__annotations__
# {'name': <fields.String(...)>, 'only': <fields.Integer(...)>}
Foo.only
# ['name']
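The separation between annotations and attributes can be checked without marshmallow. The sketch below swaps the `fields.*` instances for builtin types and uses a plain class (both are simplifications for illustration), but the storage behaviour it shows is standard Python:

```python
class Foo:
    name: str        # annotation only: recorded in __annotations__, no attribute created
    only: int        # annotation only
    only = ["name"]  # plain assignment: stored as a regular class attribute


print(Foo.__annotations__)   # the annotated names and their types
print(Foo.only)              # the assigned value, unaffected by the annotation
print(hasattr(Foo, "name"))  # an annotation alone does not define an attribute
```

Because the two namespaces do not collide, a field named `only` could coexist with an option named `only`, which is the reason given above for dropping the nested Meta class.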
/cc @sloria
@deckar01 https://github.com/justanr/marshmallow-annotations
I did that as kind of a joke a while ago, but if it's actually something useful that people want, I'd be open to either expanding it or PRing it into marshmallow if there are plans to drop py2.
It probably isn't of much use to the core due to the 3.6+ requirement and it would be weird to allow both styles simultaneously. I might look into making marshmallow-annotations move class attributes to meta attributes.
My idea was more something like this:
@dataclass
class Artist:
    name: str

@dataclass
class Album:
    title: str
    release_date: datetime.date
    artist: List[Artist]

@schema_for(Artist)
class ArtistSchema:
    pass

@schema_for(Album, required=True)  # global options for auto-generated fields
class AlbumSchema:
    title: fields.Str(data_key='album_title')  # override
This is just a helper, so backward compatibility isn't an issue; it's all opt-in. It would also take care of adding the @post_load method so that objects are loaded / dumped as class instances rather than plain dicts.
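One way the merge of auto-generated fields and explicit overrides might work is sketched below. This is not an existing API: `schema_for`, the `declared` attribute, and the `model` attribute are hypothetical, and plain dicts stand in for marshmallow field instances so the example runs with the standard library alone.

```python
import datetime
from dataclasses import dataclass, fields as dataclass_fields
from typing import List


def schema_for(datacls, **defaults):
    """Hypothetical decorator: derive field specs from a dataclass.

    Each spec is a plain dict standing in for a marshmallow field.
    """
    def decorator(schema_cls):
        # Auto-generate one spec per dataclass field, applying the global options.
        specs = {
            f.name: {"type": f.type, **defaults}
            for f in dataclass_fields(datacls)
        }
        # Annotations declared on the decorated class replace generated specs.
        specs.update(getattr(schema_cls, "__annotations__", {}))
        schema_cls.declared = specs
        schema_cls.model = datacls  # a real version could use this in @post_load
        return schema_cls
    return decorator


@dataclass
class Artist:
    name: str


@dataclass
class Album:
    title: str
    release_date: datetime.date
    artist: List[Artist]


@schema_for(Album, required=True)  # global options for auto-generated fields
class AlbumSchema:
    title: {"type": str, "data_key": "album_title"}  # override


print(AlbumSchema.declared)
```

In this sketch an override replaces the generated spec wholesale, so `title` does not inherit `required=True`; whether overrides should merge with the global options instead is a design question the proposal leaves open.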
Or something like this even more DRY that does both with more magic:
@schema_dataclass
class Artist:
    name: str

@schema_dataclass(required=True)  # global options for auto-generated fields
class Album:
    title: str
    release_date: datetime.date
    artist: List[Artist]

    class Schema:  # override
        title: fields.Str(data_key='album_title')
This time the dataclass would act as a regular class but enhanced with load / dump methods with marshmallow under the hood.
I like this approach less as it is trying to mix both the class and the schema in a single object.
That looks similar to marshmallow-sqlalchemy. https://github.com/marshmallow-code/marshmallow-sqlalchemy
Instead of Meta.model you could use a dataclass property:
from marshmallow_annotations import Schema

@dataclass
class Artist:
    name: str

@dataclass
class Album:
    title: str
    release_date: datetime.date
    artist: List[Artist]

class ArtistSchema(Schema):
    dataclass = Artist

class AlbumSchema(Schema):
    title: fields.Str(data_key='album_title')  # override
    dataclass = Album
    required = True  # global options for auto-generated fields
I would recommend opening an issue on https://github.com/justanr/marshmallow-annotations to continue this conversation unless anyone feels strongly about this being part of the core.
I think the point you made about the syntax being 3.6 only is a good reason not to mainline it into marshmallow. Doubly so for dataclasses being 3.7 only (and that's not even been properly released yet).
Agreed; this doesn't belong in marshmallow core. Closing this for now.
I published a library that does exactly that: generating schemas from dataclasses.
from datetime import datetime
from typing import List

from marshmallow_dataclass import dataclass  # Importing from marshmallow_dataclass instead of dataclasses

@dataclass
class Artist:
    name: str

@dataclass
class Album:
    title: str
    release_date: datetime
    artist: List[Artist]

Album.Schema  # This is a valid marshmallow Schema class that you can use
Consuming objects from the typing module has been a pretty unpleasant experience for me. Some of the most fundamental operations necessary for working with generics have no public interface. The private properties I found in the code to hack something together have consistently changed in breaking ways between minor python releases.
Until they drop the provisional classification, the typing module imposes a maximum python version constraint on any library that depends on its API.
@deckar01 : I haven't experienced issues with that yet. The code is tested and works on all minor versions of python 3.7 and 3.8.
3.6 -> 3.7 releases had breaking changes. 3.8 is still in alpha and could break compatibility at any time before the final release several months from now. As it is now, the typing module will still be provisional in 3.8.
https://www.python.org/dev/peps/pep-0569/#schedule
https://docs.python.org/3.8/library/typing.html
__origin__ and __args__ are undocumented, yet are the only way to inspect generics.
https://docs.python.org/3.7/library/typing.html
The issue for stabilizing this API seems to have stalled waiting for 3rd party packages to become PEP candidates. typing_inspect is experimental and its primary purpose isn't really to maintain cross-version compatibility, but it is still probably a safer option than using the private interface of the typing module.
https://bugs.python.org/issue29262
https://github.com/ilevkivskyi/typing_inspect
I'm not suggesting that the typing module shouldn't be used, but supporting it in a library will come with more maintenance overhead than normal. Without a mechanism to enforce which python versions can be used, it's probably a good idea to document any library that depends on generic typing as experimental.
Thank you for the pointer to typing_inspect, I am going to use it.
Python 3.8 is adding public methods that normalize args and origin access.
https://docs.python.org/3.8/library/typing.html#typing.get_origin
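As a rough illustration of the difference, the sketch below uses `typing.get_origin` and `typing.get_args` where they exist (Python 3.8+) and otherwise falls back to the undocumented `__origin__` and `__args__` attributes discussed above. The fallback is an assumption about how those private attributes behave on 3.7; it is not a supported interface, and `element_type` is a hypothetical helper.

```python
import sys
from typing import Dict, List

if sys.version_info >= (3, 8):
    from typing import get_args, get_origin
else:
    # Fallback relying on private attributes; may break between releases.
    def get_origin(tp):
        return getattr(tp, "__origin__", None)

    def get_args(tp):
        return getattr(tp, "__args__", ())


def element_type(tp):
    """Hypothetical helper: return X for List[X], or None for anything else."""
    if get_origin(tp) is list:
        (inner,) = get_args(tp)
        return inner
    return None


print(get_origin(List[int]), get_args(List[int]))
print(get_origin(Dict[str, int]), get_args(Dict[str, int]))
print(element_type(List[str]), element_type(int))
```

A schema generator needs exactly this kind of inspection to turn `List[Artist]` into a nested, many-valued field.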
@deckar01 Great! Is there a backport to 3.7 and 3.6?