I think it would be great to be able to use async functions the same way we use fields.Method and fields.Function.
What do you think about adding two new fields: AsyncMethod and AsyncFunction?
To support these, we will also need to implement some async methods in Schema, such as async_dump, async_dumps, async_load, and async_loads. These changes will also affect Marshaller and Unmarshaller.
You can kind of hack around it using fields.Function and event_loop.run_until_complete():
from marshmallow import Schema, fields
import asyncio
# Source: https://nrempel.com/posts/making-pythons-asyncio-sync/
def run(coroutine):
    event_loop = None
    try:
        event_loop = asyncio.get_event_loop()
    except RuntimeError:
        event_loop = asyncio.new_event_loop()
        asyncio.set_event_loop(event_loop)
    return event_loop.run_until_complete(coroutine)
class Test(Schema):
    foo = fields.Function(serialize=lambda v, o: run(Test.bar(v, o)))
    @staticmethod
    async def bar(value, obj):
        return 'baz'
Test().dump({'foo': 'test'})
Since asyncio is part of the Python standard library, I could see the proposed fields being part of the marshmallow core.
If we used the strategy I showed above in the new field implementations it wouldn't need to affect Marshaller and Unmarshaller. Do you see any drawback to that strategy?
That won't work if you're already inside a running event loop.
Personally, I question the value of supporting asyncio inside marshmallow since that implies IO is being done inside the serialize or deserialize calls which feels like a bad time waiting to happen.
I thought asyncio.get_event_loop() used the existing event loop.
> get_event_loop()
> Get the event loop for the current context. Returns an event loop object implementing the AbstractEventLoop interface. In case called from coroutine, it returns the currently running event loop.

> IO is being done inside the serialize or deserialize calls which feels like a bad time waiting to happen.
I don't think it would be bad to have a model with an async method (like an SQL query) that only needs to be executed if the field is actually dumped. I think SQLAlchemy does things like this and hides them behind property accessors, although all the asyncio integrations I have seen for it are separate modules.
In this case we will get a RuntimeError, because run_until_complete() is called on a loop that is already running:
from marshmallow import Schema, fields
import asyncio
def run(coroutine):
    event_loop = None
    try:
        event_loop = asyncio.get_event_loop()
    except RuntimeError:
        event_loop = asyncio.new_event_loop()
        asyncio.set_event_loop(event_loop)
    return event_loop.run_until_complete(coroutine)
class Test(Schema):
    foo = fields.Function(serialize=lambda v, o: run(Test.bar(v, o)))
    @staticmethod
    async def bar(value, obj):
        return 'baz'
async def main():
    # Some business logic and async call here
    Test().dump({'foo': 'test'})
asyncio.get_event_loop().run_until_complete(main())
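For illustration, here is one possible way around the already-running-loop failure, using only the standard library. It is a sketch and not something proposed in this thread: run_blocking and serialize_foo are hypothetical names, and serialize_foo stands in for the synchronous callable that would be passed to fields.Function. When a loop is already running, the coroutine is executed on a fresh loop in a worker thread while the caller blocks.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_blocking(coroutine):
    """Run a coroutine to completion from synchronous code (hypothetical helper)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() is safe.
        return asyncio.run(coroutine)
    # A loop is already running in this thread: execute the coroutine on a
    # fresh loop in a worker thread and block until it finishes.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coroutine).result()


async def bar(value, obj):
    return 'baz'


def serialize_foo(value, obj):
    # Stand-in for the synchronous callable passed to fields.Function.
    return run_blocking(bar(value, obj))


async def main():
    # Called from inside a running loop, like Test().dump() in the example above.
    return serialize_foo('test', {'foo': 'test'})


result_inside_loop = asyncio.run(main())
result_outside_loop = serialize_foo('test', {'foo': 'test'})
```

The trade-off is that the calling event loop is blocked while the worker thread runs, and the coroutine cannot use resources bound to the outer loop (for example, a connection created there), so this only suits self-contained coroutines.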
In my opinion, to support async calls we should also implement an async interface for Marshaller and Unmarshaller. I think this will affect methods such as serialize and call_and_store.
I'm not sure how feasible it is to integrate asyncio into the core while maintaining support for python 2.7. Supporting async methods and encapsulating the asyncio logic in special fields would be a nice start.
Hmm, please correct me if I'm wrong. Does marshmallow do any I/O during serialization or deserialization? How can asyncio help? As far as I know, all asyncio tasks run in the same thread, so asyncio is useful for I/O operations or subprocess calls, where one routine can run while another is waiting.
@SVilgelm the OP is about letting fields.Method and fields.Function use async functions. It is a corner case.
> I'm not sure how feasible it is to integrate asyncio into the core while maintaining support for python 2.7. Supporting async methods and encapsulating the asyncio logic in special fields would be a nice start.
I suggest creating aio.py and implementing AsyncMarshaller, AsyncUnmarshaller, etc. there.
Then we can use the following strategy:
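import sys
# sys.version_info is a tuple-like attribute, not a function
PY35 = sys.version_info >= (3, 5)
__all__ = [
    'Marshaller',
    'Unmarshaller',
]
if PY35:
    from marshmallow.aio import AsyncMarshaller, AsyncUnmarshaller
    # __all__ is a list, so extend it rather than calling update()
    __all__.extend([
        'AsyncMarshaller',
        'AsyncUnmarshaller',
    ])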
> Personally, I question the value of supporting asyncio inside marshmallow since that implies IO is being done inside the serialize or deserialize calls which feels like a bad time waiting to happen.
I am also trying to find ways to incorporate marshmallow in my aiohttp project. There are cases where you need to make async calls during validation. For instance, I need to know whether another record exists in the database while validating the schema. Another case might be validating a file uploaded to Amazon S3 by making an HTTP call.
Is there any possibility of supporting async calls in the validation methods?
In my personal opinion, and this reflects how I build systems, I use tools like marshmallow to ensure my data is the correct shape. Business logic validations - does another record exist, is this on a banned words list, etc - go into separate services that act on a different level than marshmallow, aiohttp, flask, click, etc.
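A minimal sketch of that separation, using only the standard library: the shape check stays synchronous, and the async business rule runs afterwards in its own step. All names here are hypothetical; validate_shape stands in for a call such as Schema().load(payload), and email_is_taken stands in for a real async database lookup.

```python
import asyncio


def validate_shape(payload):
    """Synchronous shape check; stand-in for Schema().load(payload)."""
    errors = {}
    if not isinstance(payload.get("email"), str):
        errors["email"] = ["Not a valid string."]
    return errors


async def email_is_taken(email):
    """Hypothetical async lookup; a real one would query a database."""
    await asyncio.sleep(0)  # placeholder for real I/O
    return email in {"taken@example.com"}


async def handle_request(payload):
    # Step 1: shape validation, no I/O involved.
    errors = validate_shape(payload)
    if errors:
        return errors
    # Step 2: business rule that needs I/O, awaited outside the schema.
    if await email_is_taken(payload["email"]):
        return {"email": ["Already registered."]}
    return {}
```

Because both steps return errors in the same dictionary shape, the caller (for example, an aiohttp handler) can report them to the client in the same way.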
Thanks @justanr. I come from a Django Rest Framework background where validation on the database model is discouraged and validation is expected to be put on the serializer and not model. This is to support complex validation and nested serialization.
But yes, I guess validation can also be put outside of serialization process.
The main use case I see for supporting an async method field is a model that has async methods for computed/IO data that need to be serialized. It would be nice if we had a field that could block on async methods.
I agree that trying to detect collisions with DB constraints at serialization time is a red flag. You don't have to put the logic in your model, but I wouldn't recommend putting it in the middle of the deserialization logic. In my experience, the DB schema is the single source of truth for uniqueness constraints and the business logic for querying is prepared to handle the errors that the DB returns when the constraints are violated. All that usually happens after I am done using marshmallow to clean my data.
> In my personal opinion, and this reflects how I build systems, I use tools like marshmallow to ensure my data is the correct shape. Business logic validations - does another record exist, is this on a banned words list, etc - go into separate services that act on a different level than marshmallow, aiohttp, flask, click, etc.
So do I. The database driver returns errors that the API framework catches and formats like validation errors, so this is transparent to the client.
Also, validating another entity's existence before actually writing in the DB means you need some DB lock around your resource to prevent race conditions. I prefer the try/catch approach.
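As a rough illustration of the try/catch approach, the sketch below lets the database enforce uniqueness and converts the driver's error into a validation-style error dictionary. It uses the standard library's sqlite3 purely as an example driver; the users table, the create_user helper, and the error message are assumptions made for this sketch.

```python
import sqlite3


def create_user(conn, email):
    """Insert a user; report a constraint violation as a field error."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    except sqlite3.IntegrityError:
        # In this table the only constraint a non-null email can violate
        # is UNIQUE, so the error is mapped to a duplicate message.
        return {"email": ["A user with this email already exists."]}
    return {}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT NOT NULL UNIQUE)")

first_errors = create_user(conn, "a@example.com")
second_errors = create_user(conn, "a@example.com")
```

No existence check or lock is needed before the insert: the constraint itself decides, and the second call simply comes back with an error for the email field.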
Closing, as I don't think we will move forward with this. I agree with @justanr's suggestion above (keep I/O operations outside of marshmallow as a rule of thumb).
If anyone ends up implementing AsyncSchema/AsyncMethod/AsyncFunction functionality, it should live outside of marshmallow core (but feel free to link to it here =)).
You can use async_cached_property and AwaitLoader as long as you await the object before it is serialized.
https://github.com/ryananguiano/async_property
import asyncio
from async_property import async_cached_property, AwaitLoader
from marshmallow import Schema, fields
class Foo(AwaitLoader):
    @async_cached_property
    async def value(self):
        return await self.get_remote_value()
    async def get_remote_value(self):
        # Call network
        return 'abc'
class FooSerializer(Schema):
    value = fields.String()
async def main():
    return FooSerializer().dump(await Foo())
asyncio.run(main())