Marshmallow: Propose add required to schema constructor

Created on 3 Jul 2019  路  30Comments  路  Source: marshmallow-code/marshmallow

For a put request model require of id field, but for post request id not needed. For disabling in a post request I use dump_only=["id"], but I have not found such a way to do id required. For this reason, I suggest adding required to the schema constructor.

question

Most helpful comment

In my APIs, I use the same schema but the ID for the PUT is in the URL, so it is injected in the view func (by webargs) as a path parameter.

All 30 comments

Your use case can be achieved by using a separate schema for POSTs vs PUTs.

In my APIs, I use the same schema but the ID for the PUT is in the URL, so it is injected in the view func (by webargs) as a path parameter.

Your use case can be achieved by using a separate schema for POSTs vs PUTs.

I use marshmallow sqlalchemy. It generates schema by models. How to observe the dry-principles? What is the problem to add?

class WidgetSchema(ModelSchema):
    class Meta:
        model = Widget

class CreateWidgetSchema(WidgetSchema):
    class Meta(WidgetSchema.Meta):
        exclude = ('id',)

Adding another parameter is not ideal. We try to keep marshmallow's API surface as small as is practical. An added parameter adds combinatorial complexity to the API because it needs to be tested against the existing parameters. There's additional consequences to consider, like whether the parameter needs to also be included in load/dump, Meta options, whether to propagate to nested schemas or not, etc. Then there's the difficulty in naming and documenting to meet user expectations (what does it mean for a schema to be "required"?). This is a maintenance burden we are not willing to take.

You have required in the Field class. It means that the Raise a ValidationError if the field value is not supplied during deserialization. I propose to add the ability for specifying it in the constructor of the schema. And, I think it need be added to load/dump and Meta too. If it is possible to specify at the field level, then why should one forget about the possibility of specification elsewhere? If something I am ready to do it.

"required" makes intuitive sense at the field level but not at the schema level. It'd be too easy to misundertand the semantics of MySchema(required=True). Furthermore, it's not clear on first glance what this should do:

class MySchema(Schema):
    foo = fields.Str(required=False)
    bar = fields.Str()

MySchema(required=True).load({})

Like I said, we're not willing to take the maintenance cost of adding this, especially when there is valid workaround using simple inheritance.

50+ schemes become 100+. It's not cool. required can be renamed to required_fields, it's not a problem. And how much is a big expense?

@heckad Some of the discussion at https://github.com/marshmallow-code/webargs/issues/112 may be interesting and / or useful.

50+ schemes become 100+. It's not cool.

This use case will always require double the amount of schema instances, so if it is an issue of too much class declaration boilerplate, it could probably be written as a on-liner utility.

def update_model(schema):
    class UpdateSchema(schema):
        ...
    return UpdateSchema

UpdateAuthorSchema = update_model(AuthorSchema)
UpdateBookSchema = update_model(BookSchema)

I suspect that, in practice, the models are never this uniform and there are lots of other model specific fields that need to be excluded/required based on the semantics of the operation, which requires explicitly declaring the alternate schemas. SQLAlchemy models don't capture this information, but maybe marshmallow-sqlalchemy could offer a way to declare it once and generate separate schemas for common CRUD operations.

There's also my marshmallow-annotations extension that will generate a schema for you based on annotations used in the target class and by default things are marked as required unless it's an optional field.

If you're in the habit of declaring DTOs for each operation _and_ you're using type annotations that's a potential option as well. Using multiple DTOs was actually the target problem solve for this extension rather than needing to handcraft multiple schemas and keep them in sync with changes to the underlying model.

And in what ways can you specify the required fields, without making new classes?

You've got three options:

  1. Use schema level validation

  2. Potentially unpopular, you're trying to represent two contracts with the same class, so make a second class

  3. Find or implement something like FluentValidation and use marshmallow strictly for model binding

I tend to favor them in that order, I've not had to find a secondary validation library though so being honest I'm not sure what's out there for it.

class WidgetSchema(ModelSchema):
    class Meta:
        model = Widget

class CreateWidgetSchema(WidgetSchema):
    class Meta(WidgetSchema.Meta):
        exclude = ('id',)

Adding another parameter is not ideal. We try to keep marshmallow's API surface as small as is practical. An added parameter adds combinatorial complexity to the API because it needs to be tested against the existing parameters. There's additional consequences to consider, like whether the parameter needs to also be included in load/dump, Meta options, whether to propagate to nested schemas or not, etc. Then there's the difficulty in naming and documenting to meet user expectations (what does it mean for a schema to be "required"?). This is a maintenance burden we are not willing to take.

This prohibits send the id to the method for creating objects, but I need to pass the id to the update method. Now I can send data to update without id and marshmallow-sqlalchemy will not raise ValidationError.

@sloria what other ways can this be done with marshmallow( marshmallow-sqlalchemy )

I think this is marshmallow-sqlalchemy specific.

You have a schema generated from a model. You want one schema that requires an id, and one schema that does not allow an id. You can force the main schema to require the id using the info argument on the SQLAlchemy column. See the docs on that topic. You can create a second schema instance that omits the id field using the exclude argument in the schema constructor.

# Setup

import marshmallow_sqlalchemy as ma
import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker

engine = sa.create_engine("sqlite:///:memory:")
session = scoped_session(sessionmaker(bind=engine))
Base = declarative_base()

class Author(Base):
    __tablename__ = "authors"
    id = sa.Column(sa.Integer, primary_key=True, info=dict(marshmallow=dict(required=True)))
    name = sa.Column(sa.String)

Base.metadata.create_all(engine)
# Schema definition

class AuthorSchema(ma. ModelSchema):
    class Meta:
        model = Author
        sqla_session = session

update_author_schema = AuthorSchema()
create_author_schema = AuthorSchema(exclude=['id'])
# Usage

update_author_schema.load({'id': 1, 'name': 'test'})
# <__main__.Author object at 0x10819c6a0>
create_author_schema.load({'name': 'new'})
# <__main__.Author object at 0x10819c0f0>

update_author_schema.load({'name': 'test'})
# marshmallow.exceptions.ValidationError: {'id': ['Missing data for required field.']}
create_author_schema.load({'id': 1, 'name': 'new'})
# marshmallow.exceptions.ValidationError: {'id': ['Unknown field.']}

If I want to change sqlalchmy on Django models or maybe use any NoSql databases? What to do? Marshmallow allows you to specify only which fields should be used (only), which ones should not be used (exclude), which should be ignored (dump_only), but not which must be required, and which at will or not will(not required). Maybe this is the mise feature.

The proposed feature is logical, but its use case is probably not common enough to justify its added complexity. Convenience and preferred syntax are generally not strong justifications for functionality that is already possible. If you think there is some functionality that is currently impossible to achieve, a full code sample would help us understand it better.

For example session schema with filds: id(integer), table_id(integer), user_id(integer), comment(Text), mark(integer). Let go fields: table_id, user_id are required for post request, and id field should be excluded. In put request the only id required. What now I should do with pure marshmallow? I would like to create one scheme and in the constructor specify this constraint. For example

create_schema = schema(many=True, exclude=["id"], required_fields=["table_id", "user_id"])
update_schema = schema(many=True, required_fields=["id"])

This allows not to duplicate the schema code twice also makes a more explicit behaviour with auto mappers. (marshmallow-sqlalchemy and django-rest-marshmallow).

I would like to create one scheme and in the constructor specify this constraint.

Convenience and preferred syntax are generally not strong justifications for functionality that is already possible.

I just expressed a wish, generally waiting for your options. Also, I looked at the code and find in schema.py file on line 629 (id dev 666) check if raw_value is missing:. Need to add in this if the check on if field_name in required_filds: raise ValidationError. It's all.
Full code example:

  if raw_value is missing:
      if field_name in required_filds: 
          field_obj.fail("required")
      # Ignore missing field if we're allowed to.
      if (
           partial is True or
           (partial_is_collection and attr_name in partial)
       ):
           continue

Convenience and preferred syntax are generally not strong justifications for functionality that is already possible.

It is possible only through crutches which in the large code base can be a reason for bugs. Offer your flexible and convenient way to do it in the example with sessions.

https://github.com/marshmallow-code/marshmallow/issues/1280#issuecomment-509327019

Does this example work for your use case? It is only one line to make the second schema.

No, because fields "table_id", "user_id" become dependent for put request. It required only for post

I propose to rename to load_necessary

@deckar01 What to do? I explained why your decision does not correct.

In this use case, required needs to be True on one schema and False in another, so exclude doesn't work. The field needs to be overridden. I don't see a good way to override fields when they are generated by marshmallow-sqlalchemy. If these schema classes were manually defined it would be possible to write two schemas and deduplicate the common fields as needed.

This problem isn't specific to required. Most field attributes will have this limitation. Maybe marshmallow-sqlalchemy needs a better way to specify field attributes like required so that multiple schemas can be generated from the same model with different attributes.

I propose to add the check on the missing field in comment. This removes the need to modify fields. It's very simple and adds 7 lines of code without tests and maybe specific documentation.

@deckar01 @sloria what do you say about implementation?

@heckad Without wishing to deviate from the main discussion, what do you say about solving it in marshmallow-sqlalchemy as suggested by @deckar01 ? I and possibly others have similar issues (as I posted above).

It is a bad solution because:
1) It is only suitable for marshmallow-sqlalchemy. If I change binding from sqlalchemy to django-orm then you need to remember to rewrite the constraint.
2) It spreads the code responsible for validation across the code of the entire project.
3) It can not be applied if models cannot be changed(not your area of responsibility)

Was this page helpful?
0 / 5 - 0 ratings

Related issues

DenisKuplyakov picture DenisKuplyakov  路  4Comments

pd-Shah picture pd-Shah  路  4Comments

jayennis22 picture jayennis22  路  4Comments

nickretallack picture nickretallack  路  4Comments

j4k0bk picture j4k0bk  路  3Comments