Marshmallow: Why is post_load called after schema.validate()?

Created on 8 Apr 2019  路  11Comments  路  Source: marshmallow-code/marshmallow

Hello,

I noticed that @post_load is called when calling schema_object.validate().

Then I went to the library code and saw this (everything seemed OK, because I was getting a feeling postprocessing will not take a place):

image

It was obvious I will need some debugging, so I went through the calling stacktrace and got surprised when I saw the marshalling module calls the deserialize method which calls schema_object.load(). I don't think it is expected to have post_load called when you call schema_object.validate(), because in my case, we are performing some intensive operations, e.g. b64 decoding some string data and it takes some time. In this way, we have post_load called twice - the first time when a controller validates the input and the other when we actually load the data into a dict.

Of course, this is not a bug, but an open question.

bug

All 11 comments

What marshmallow version are you using?

Can you please provide a test case to reproduce the issue (e.g. adding a post_load method with assert False and calling validate)?

I don't think it is expected to have post_load called when you call schema_object.validate()

from marshmallow import Schema, fields, post_load

class Test(Schema):
    foo = fields.Str()

    @post_load
    def test_post_load(self, data):
        assert False

Test().validate({'foo': 'test'})
# {}

It doesn't seem to be called in marshmallow 2 or 3.

Of course, this is not a bug, but an open question.

If post_load was being called during validate it would be a bug. I can't reproduce that behavior.

Closing since the issue can't be reproduced. Will reopen if someone discovers this again.

@sloria I just hit this issue myself. The following fields.Dict with a nested schema should cause it in version 3.0.5.

from marshmallow import Schema, fields, post_load

class Nest(Schema):
    bar = fields.Str()

    @post_load
    def test_nest_post_load(self, data, **kwargs):
        assert False, 'Nest schema loaded'

class Test(Schema):
    foo = fields.Str()
    dict_with_nested = fields.Dict(keys=fields.String(), values=fields.Nested(Nest))

    @post_load
    def test_post_load(self, data, **kwargs):
        assert False, 'Test schema loaded'

Test().validate({
    'foo': 'test',
    'dict_with_nested': {
        'a': {
            'bar': 'test'
        }
    }
})

AssertionError: Nest schema loaded

I took a quick look to see how difficult this would be to fix, but it feels a bit hacky. I have a branch off dev here.

diff --git a/src/marshmallow/fields.py b/src/marshmallow/fields.py
index 003c33a..f89275e 100644
--- a/src/marshmallow/fields.py
+++ b/src/marshmallow/fields.py
@@ -567,11 +567,11 @@ class Nested(Field):
         if many and not utils.is_collection(value):
             raise self.make_error("type", input=value, type=value.__class__.__name__)

-    def _load(self, value, data, partial=None, many=False):
+    def _load(self, value, data, partial=None, many=False, postprocess=True):
         many = self.schema.many or self.many or many
         try:
-            valid_data = self.schema.load(
-                value, unknown=self.unknown, partial=partial, many=many
+            valid_data = self.schema._do_load(
+                value, unknown=self.unknown, partial=partial, many=many, postprocess=postprocess
             )
         except ValidationError as error:
             raise ValidationError(
@@ -579,7 +579,7 @@ class Nested(Field):
             ) from error
         return valid_data

-    def _deserialize(self, value, attr, data, partial=None, many=False, **kwargs):
+    def _deserialize(self, value, attr, data, partial=None, many=False, postprocess=True, **kwargs):
         """Same as :meth:`Field._deserialize` with additional ``partial`` argument.

         :param bool|tuple partial: For nested schemas, the ``partial``
@@ -589,7 +589,7 @@ class Nested(Field):
             Add ``partial`` parameter.
         """
         self._test_collection(value, many=many)
-        return self._load(value, data, partial=partial, many=many)
+        return self._load(value, data, partial=partial, many=many, postprocess=postprocess)


 class Pluck(Nested):
diff --git a/src/marshmallow/schema.py b/src/marshmallow/schema.py
index bee0653..adfc74f 100644
--- a/src/marshmallow/schema.py
+++ b/src/marshmallow/schema.py
@@ -591,7 +591,8 @@ class Schema(base.SchemaABC, metaclass=SchemaMeta):
         many: bool = False,
         partial=False,
         unknown=RAISE,
-        index=None
+        index=None,
+        postprocess=True
     ) -> typing.Union[_T, typing.List[_T]]:
         """Deserialize ``data``.

@@ -625,6 +626,7 @@ class Schema(base.SchemaABC, metaclass=SchemaMeta):
                             partial=partial,
                             unknown=unknown,
                             index=idx,
+                            postprocess=postprocess
                         ),
                     )
                     for idx, d in enumerate(data)
@@ -659,7 +661,7 @@ class Schema(base.SchemaABC, metaclass=SchemaMeta):
                 else:
                     d_kwargs["partial"] = partial
                 getter = lambda val: field_obj.deserialize(
-                    val, field_name, data, **d_kwargs
+                    val, field_name, data, postprocess=postprocess, **d_kwargs
                 )
                 value = self._call_and_store(
                     getter_func=getter,
@@ -853,6 +855,7 @@ class Schema(base.SchemaABC, metaclass=SchemaMeta):
                 many=many,
                 partial=partial,
                 unknown=unknown,
+                postprocess=postprocess,
             )
             # Run field-level validation
             self._invoke_field_validators(

Thanks @jordanhamill for posting a repro case and looking into this. I don't have time to look into this right now, but would review a PR if you're working on it.

Yup, I had nested schemas in my case, just didn't have time to write a test and later forgot about this. Thanks @jordanhamill

I was able to repro on 3.2.0. Here is a minimal repro case:

from marshmallow import Schema, fields, post_load

class Foo(Schema):
    bar = fields.Str()

    @post_load
    def test_post_load(self, data, **kwargs):
        assert False

class Test(Schema):
    foo = fields.Nested(Foo)

Test().validate({'foo': {'bar': 'test'}})
# AssertionError

This is an artifact of validation calling _do_load under the hood. Propagating postprocess through the field load calls would work, but this might be a good opportunity to consider separating validation from loading. There has been recurring interest in validating objects that have already been deserialized.

@deckar01 would a less invasive bug fix (like the above patch) with a validation refactor after suit? I don't have the background knowledge to attempt that refactor.

The validation refactor is quite an important task.

I think the above patch is an acceptable short-term solution. (@jordanhamill you forgot however to fix Pluck._deserialize).

Should it be considered a breaking change? It could break code that overrides _do_load in Nested or one of its child classes. Is this public API?

If the change is deemed breaking, then we could hold it and try to tackle the refactor first. Otherwise, we could go ahead with the fix.

(I assume the refactor would be a breaking change and can't happen before MA 4.)

I took a quick dive into the validate refactor. The way that fields are self validating during deserialization and skipping validation under certain conditions makes this more complicated that I expected. It should be possible to do this in a backwards compatible way, but I wouldn't block a bug fix on it.

I also tried to tackle the refactor this afternoon. Indeed, there is validation everywhere. ValidationErrors can be trigger anytime in deserialization.

And validation can be nested in sub-schemas, so we'll have to go through the whole nesting twice. Once at deserialization and once at validation.

The first issue (validation during deserialization) is not really a problem if the point is to be able to validate deserialized data, as errors happening on deserialization should only be related to the deserialization itself (wrong type, data can't be cast to proper type).

Was this page helpful?
0 / 5 - 0 ratings