Hi,
there seems to be some discussion about what's the best way to use dataloader objects (see https://github.com/facebook/dataloader/issues/62#issue-193854091). The general question is whether dataloader objects should be used as application level caches or rather at request level.
My current implementation is based on https://docs.graphene-python.org/en/latest/execution/dataloader/ where dataloaders seem to be used as application level caches. The nice thing about this is that requests can benefit from what has already been cached by previous requests. However, I'm struggling with how to invalidate my dataloader in case the data in the repository changes. It occurred to me that such issues could be prevented by moving the dataloader to the request level as suggested (sure, cached data would not be shared between requests anymore). Unfortunately, it is not clear to me how to do this based on the example in the documentation because the request itself is not explicitly represented.
Can someone provide a small example that uses graphene + flask-graphql?
Cheers,
Sebastian
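To make the invalidation concern above concrete, here is a minimal sketch using only the standard library. ToyLoader, USERS and handle_request are hypothetical stand-ins invented for illustration; they are not aiodataloader's or graphene's API. The sketch only shows why a loader shared across requests can return stale data while a loader created per request cannot.

```python
import asyncio

# Hypothetical stand-in for a repository; not part of any library.
USERS = {1: "alice@example.com"}


class ToyLoader:
    """Minimal caching loader (illustration only, not aiodataloader's API)."""

    def __init__(self):
        self._cache = {}

    async def load(self, key):
        # Hit the "repository" only on a cache miss.
        if key not in self._cache:
            self._cache[key] = USERS.get(key)
        return self._cache[key]


async def handle_request(loader, user_id):
    """Pretend request handler that resolves one user through a loader."""
    return await loader.load(user_id)


async def demo():
    shared = ToyLoader()  # application-level: lives across requests
    first = await handle_request(shared, 1)

    USERS[1] = "alice@new.example.com"  # data changes between requests

    stale = await handle_request(shared, 1)       # served from the old cache
    fresh = await handle_request(ToyLoader(), 1)  # request-level: new loader
    return first, stale, fresh


first, stale, fresh = asyncio.run(demo())
```

The shared loader keeps answering with the old address until something clears its cache, whereas the per-request loader starts empty and therefore sees the update.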
I'd love to see an example too!
Just an idea: you could put the Dataloaders on info.context (which is actually the current request).
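A rough sketch of that idea, assuming the context is a plain dict that holds the loaders. FakeUserLoader and make_info are hypothetical helpers, and SimpleNamespace merely stands in for the info object graphene passes to resolvers. The point is only the shape of the resolver: it reads the loader from info.context instead of from a module-level global.

```python
import asyncio
from types import SimpleNamespace


class FakeUserLoader:
    """Hypothetical stand-in for a DataLoader; only mimics .load()."""

    async def load(self, key):
        return {"id": key}


async def resolve_user(root, info, id):
    # The resolver never touches a global; it reads the loader from the context.
    return await info.context["user_loader"].load(id)


def make_info():
    # Built once per request, so each request gets its own loader instance.
    return SimpleNamespace(context={"user_loader": FakeUserLoader()})


info_a, info_b = make_info(), make_info()
user = asyncio.run(resolve_user(None, info_a, "42"))
```

Because make_info builds a new loader every time, two requests never share cached entries.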
Not sure what framework you're using, but with Flask I think the approach that we're going to take is to instantiate and attach our dataloaders to the g object on before_request and then delete it on teardown_request. Basically combining some notes from the DataLoader docs here https://github.com/syrusakbary/aiodataloader#creating-a-new-dataloader-per-request and the Flask docs here http://flask.pocoo.org/docs/1.0/patterns/deferredcallbacks/#deferred-request-callbacks.
Avery has the correct approach; the Flask request context also jibes with the Sanic example in aiodataloader. Closing, please comment if you feel this needs reopening.
I too am using Graphene + Flask-GraphQL + DataLoader (specifically AIODataLoader, with the AsyncioExecutor), and I am now trying to bind my DataLoaders to the request's lifecycle. My unfamiliarity with Flask/AsyncIO is leaving me with a piece of this puzzle still missing.
Could @averypmc, @sebastianthelen, or others provide a more complete example?
Piecing together the links here and in other Flask/Graphene/DataLoader docs, I have something like...
# project/loaders.py
from project.db import models
from aiodataloader import DataLoader


class UserLoader(DataLoader):
    async def batch_load_fn(self, ids):
        items = models.db_session.query(models.User).filter(models.User.id.in_(ids))
        item_dict = {}
        for x in items:
            item_dict[x.id] = x
        # Reorder items to match incoming id order
        return [item_dict.get(id) for id in ids]
and
# project/api.py
from project.schema import schema
from project.loaders import UserLoader
from flask import Flask, g
from flask_graphql import GraphQLView
from graphql.execution.executors.asyncio import AsyncioExecutor

app = Flask(__name__)


@app.before_request
def construct_dataloaders():
    # Fresh loader instances for every request
    g.dataloaders = {'user_loader': UserLoader()}


app.add_url_rule(
    '/graphql',
    view_func=GraphQLView.as_view(
        'graphql', schema=schema, graphiql=True, context={}, executor=AsyncioExecutor()
    ),
)


@app.teardown_appcontext
def teardown_loaders(exception=None):
    # Flask passes the exception (or None) to teardown callbacks
    g.pop('dataloaders', None)


if __name__ == '__main__':
    app.run()
and
# project/schema.py
import graphene


class User(graphene.ObjectType):
    id = graphene.ID(required=True)
    email = graphene.String()
    first_name = graphene.String()
    last_name = graphene.String()


class Query(graphene.ObjectType):
    user = graphene.Field(User, id=graphene.ID(required=True))

    async def resolve_user(self, info, id):
        return await <SOMETHING>['user_loader'].load(id)


schema = graphene.Schema(query=Query)
Previously I was allowing the DataLoader instances to be bound to the application lifecycle, and everything seemed to work fine.
When I attempt to run this and make a request, I get "RuntimeError: There is no current event loop in thread 'Thread-2'." at the point when it hits the line g.dataloaders = {'user_loader': UserLoader()} in the @app.before_request function in api.py. I'm guessing I need to set up an asyncio event loop for the flask process itself and pass that to the AsyncioExecutor via its loop param or something, but how exactly do I do that?
What goes in the <SOMETHING> in my resolver in schema.py? Is it g.dataloaders (assuming I import g from flask there)?
Thanks in advance for any help :smiley:
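The RuntimeError quoted above can be reproduced without Flask, which suggests it comes from asyncio itself: a thread other than the main one has no current event loop until one is set. The following standard-library sketch shows the failure and one way around it; without_loop, with_loop and results are hypothetical names, and whether the loader was constructed on such a worker thread in the setup above is an assumption.

```python
import asyncio
import threading

results = {}


def without_loop():
    # A worker thread has no event loop until one is set explicitly.
    try:
        asyncio.get_event_loop()
        results["without"] = "ok"
    except RuntimeError as exc:
        results["without"] = str(exc)


def with_loop():
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)  # bind the loop to this thread
    try:
        results["with"] = asyncio.get_event_loop() is loop
    finally:
        asyncio.set_event_loop(None)
        loop.close()


for target in (without_loop, with_loop):
    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
```

After both threads finish, results["without"] holds the error message and results["with"] records that the explicitly bound loop was the one returned.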
I created a new event loop and passed it to both the AsyncioExecutor and my DataLoader (the loop param to aiodataloader's DataLoader constructor isn't in the readme anywhere, but it's set here).
# project/api.py
from project.schema import schema
from project.loaders import UserLoader
from flask import Flask, g
from flask_graphql import GraphQLView
from graphql.execution.executors.asyncio import AsyncioExecutor
import asyncio

app = Flask(__name__)


@app.before_request
def construct_dataloaders():
    global loop
    # Fresh loader instances for every request, all bound to the shared loop
    g.dataloaders = {'user_loader': UserLoader(loop=loop)}


# One event loop, created at import time and shared with the executor
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
app.add_url_rule(
    '/graphql',
    view_func=GraphQLView.as_view(
        'graphql', schema=schema, graphiql=True, context={}, executor=AsyncioExecutor(loop=loop)
    ),
)


@app.teardown_appcontext
def teardown_loaders(exception=None):
    # Flask passes the exception (or None) to teardown callbacks
    g.pop('dataloaders', None)


if __name__ == '__main__':
    app.run()
This seems to be "working", but can anyone confirm if this is valid usage? From logging output, it appears my DataLoader's __init__ is indeed being called again with each new request. My concern is around sharing the same event loop between what appears to be different threads, which I don't think I'm supposed to be doing since aiodataloader uses call_soon() instead of call_soon_threadsafe().
Is it possible to get this issue re-opened?
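On the thread-safety concern: asyncio does provide thread-safe entry points for handing work to a loop that runs in another thread, namely loop.call_soon_threadsafe() and asyncio.run_coroutine_threadsafe(). The sketch below is a standard-library illustration of that pattern only; load_user and handle_request are hypothetical, and nothing here is confirmed to match how AsyncioExecutor or aiodataloader drive the loop.

```python
import asyncio
import threading

# One long-lived loop running in its own thread (hypothetical setup).
loop = asyncio.new_event_loop()
loop_thread = threading.Thread(target=loop.run_forever, daemon=True)
loop_thread.start()


async def load_user(user_id):
    # Runs on the loop's thread, where plain call_soon() is safe to use.
    await asyncio.sleep(0)
    return {"id": user_id}


def handle_request(user_id):
    # Called from a request thread: hand the coroutine over thread-safely
    # and block until the loop thread has produced the result.
    future = asyncio.run_coroutine_threadsafe(load_user(user_id), loop)
    return future.result(timeout=5)


results = []
workers = [
    threading.Thread(target=lambda i=i: results.append(handle_request(i)))
    for i in range(3)
]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Stop the loop from outside its thread, again via the thread-safe call.
loop.call_soon_threadsafe(loop.stop)
loop_thread.join()
loop.close()
```

The request threads never call into the loop directly; everything that touches it from outside goes through the two thread-safe functions.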