The SQLAlchemy library comes with a scoped_session object, which effectively allows for thread-safe database sessions. Is there a reason the docs recommend using the dependency method > middleware method > just a scoped_session global?
I'm not sure about the reasoning behind the doc's reccomendations, but the reason why I'd personally advise against making scoped_session global is because it then forces you to also have a global engine to bind it to, and global variables in ASGI don't have a well-defined lifetime (see #617). There's also how using dependencies rather than globals mean that all the ressource needed by your route are passed as function parameters, which tends to make testing easier since you can just pass a fake DB session instead of the one you would normally use.
SQLAlchemy's own doc says scoped_session are just one of many ways of handling sessions. From my understanding of how scoped_session and sessions work, thread-local sessions might not work that well with FastAPI since Python's coroutines don't have a 1:1 mapping with threads, so I suspect that this could cause some unexpected behavior (though I haven't tried this, I might be wrong).
I personally prefer handling my connections this way:
from typing import Optional, AsyncIterable
from fastapi import Depends, FastAPI
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
from sqlalchemy.engine import Engine as Database
app = FastAPI()
_db_conn: Optional[Database]
@app.on_event("startup")
def open_database_connection_pools():
global _db_conn
_db_conn = create_engine(...)
@app.on_event("shutdown")
def close_database_connection_pools():
global _db_conn
if _db_conn: _db_conn.dispose()
async def get_db_conn() -> Database:
assert _db_conn is not None
return _db_conn
# This is the part that replaces sessionmaker
async def get_db_sess(db_conn = Depends(get_db_conn)) -> AsyncIterable[Session]:
sess = Session(bind=db_conn)
try:
yield sess
finally:
sess.close()
@app.get("/")
def foo(db_sess: Session = Depends(get_db_sess)):
pass
The benefits of this approach are:
uvicorn --reloadThanks for the in depth for response @sm-Fifteen! Those do seem like some good benefits over using scoped_session.
One clarification though: you initially mentioned how it's probably better not to have a global database connection, but then your solution is just passing around the connection as a global, no? You add a layer of indirection with the database connection dependency, but I'm not sure I see the benefit of that approach.
One clarification though: you initially mentioned how it's probably better not to have a global database connection, but then your solution is just passing around the connection as a global, no? You add a layer of indirection with the database connection dependency, but I'm not sure I see the benefit of that approach.
Yeah, I should probably have explained that one a bit more. The important distinction is that while _db_conn lives in the global scope (mainly because there is nowhere else it could live), it's not created in the global scope, nor is it ever accessed directly from the global scope aside from the 3 functions that control it.
To go back to the example I was giving:
_db_conn: Optional[Database]
def open_database_connection_pools():
global _db_conn
_db_conn = create_engine(...)
def close_database_connection_pools():
global _db_conn
if _db_conn: _db_conn.dispose()
async def get_db_conn() -> Database:
assert _db_conn is not None
return _db_conn
...these three functions should be the only places where _db_conn is ever used directly. If that seems like a convoluted way of doing the same thing as _db_conn = create_engine(...) in the global space and calling it a day, the key thing to understand here is that SQLAlchemy's automatic cleanup system works in a WSGI setting (such as when using Flask) because it it set to cleanup whenever the Python interpreter shuts down. Since ASGI is both event-based and async, though, doing it that way in FastAPI could cause you to run into a number of subtle issues (like the --reload problem I briefly mentionned where reloading your code would create a new connection engine without closing the first one, meaning any connection the old engine kept open could leak and stay open). This goes double for any sort of global dependency that isn't SQLAlchemy, because while SQLAlchemy's engines and connection wrappers do extremely smart things to ensure that they get cleaned up automatically without the user having to do anything, most objects will not.
This weird pattern with the startup and shutdown events combined with a global variable is an attempt at combining them into some kind of "context manager" to create dependencies with a setup and teardown phase (something like get_db_sess), but meant to last for the application's entire lifetime. Since that's not something FastAPI currently supports (see issue #617), for the time being, I need to use variables declared (not defined) in the global scope to store whatever is initialized by the startup event and finalized with the shutdown event.
I should specify that in SQLAlchemy's case specifically, this is probably not needed (unless you want to make use of --reload), but leaving dependencies open in the global scope had me run into cleanup issues before (especially with async DB connections, which can't asynchoneously close with the Python interpreter because by then the asyncio event loop is most likely gone), so I figure it's a good pattern to use in general.
How about this kind of pattern https://github.com/mfreeborn/fastapi-sqlalchemy similar to the way it works in flask?
How about this kind of pattern https://github.com/mfreeborn/fastapi-sqlalchemy similar to the way it works in flask?
It's fine, I guess, but it limits you to a single database connection (I need to connect to at least 3 on one of the projects I'm developping) and it doesn't take advantage of dependency injection so you can't really make your routes or functions use a different DB object without modifying their code or the initialization sequence of your middleware.
@acnebs: You might want to close this issue if your question has been answered.
Note that there is now also a contextmanager-style approach to getting a session that may be preferable to using a middleware if not all endpoints rely on the database. This is documented here.
Note that there is now also a contextmanager-style approach to getting a session that may be preferable to using a middleware if not all endpoints rely on the database. This is documented here.
I don't know if this is a good idea with database connection pools, the way I see it, it's probably preferable to open them once when starting the application and pulling connections from those as dependencies. There are valid use cases to context manager dependencies, but I'm not sure if this is one of those.
@sm-Fifteen
The contextmanager approach is almost exactly equivalent to the middleware approach, so I'm not sure what you mean with regard to database connection pools. In particular, the connection pool is still only opened once when starting the application (as long as DBSession is a sessionmaker tied to a specific engine). The only things that happens in the contextmanager are 1) creating the session using the long-lived sessionmaker, then 2) cleaning up the session after you've handled the request.
The only differences between the contextmanager approach and the middleware approach, as far as I'm aware, are that 1) the contextmanager approach doesn't clean up the session until after sending the response (so you need to handle commits and any associated errors inside your endpoint, which I would argue you should do anyway unless you want to be returning 500s to your clients), and 2) in the contextmanager approach, a session isn't automatically created for every request -- it is only created for endpoint calls that actually make use of the session dependency.
If your point was just that you don't like the middleware approach either, that's fine (although I haven't seen any argument against the use of a global sessionmaker; as far as I can tell that behaves basically equivalently to your proposed approach, but I could be wrong?), it's just your response made it seem like you took issue with something different than what you took issue with with the middleware approach.
The contextmanager approach is almost exactly equivalent to the middleware approach, so I'm not sure what you mean with regard to database connection pools. In particular, the connection pool is still only opened once when starting the application (as long as
DBSessionis asessionmakertied to a specificengine). The only things that happens in the contextmanager are 1) creating the session using the long-lived sessionmaker, then 2) cleaning up the session after you've handled the request.
Oh, I agree that sessions should be handled with context manager dependencies, I meant that more for the underlying database connection pool, which should be instanciated once for the application and then reused (I realize I'm probably getting annoying with this talk of lifetime-scoped dependencies, but I think it's really really important, you know?), which neither the middleware approach nor the ctxmgr dependencies would be well suited for.
@acnebs about your question, SQLAlchemy scoped_session is based on thread locals. As @sm-Fifteen points out, those are not mapped 1:1 to requests when using async stuff. In summary, a single thread can handle more than one request, and at the same time, a single request could be handled by more than one thread (so it's actually not 1:1 but many:many). The alternative would be ContextVars, those are made to solve these use cases. But they don't behave exactly the same as thread locals and there are some caveats.
For example, Peewee doesn't have a way to _not_ use global state for connections, so it requires several tricks with ContextVars (you can read more about it here in the docs: https://fastapi.tiangolo.com/advanced/sql-databases-peewee/).
@tiangolo @sm-Fifteen Some very good points. Just wanted to make sure the reasoning behind things were clear. This is very helpful for my own (and hopefully others) understanding!
@tiangolo could you perchance comment on @sm-Fifteen's discussion of not using a globally-instantiated engine? The docs _do_ globally instantiate engine = create_engine(..) and I'm wondering if it's (a) just fine that way unless --reload or other; (b) sm-Fifteen is right and it should look like his
Most helpful comment
I'm not sure about the reasoning behind the doc's reccomendations, but the reason why I'd personally advise against making
scoped_sessionglobal is because it then forces you to also have a globalengineto bind it to, and global variables in ASGI don't have a well-defined lifetime (see #617). There's also how using dependencies rather than globals mean that all the ressource needed by your route are passed as function parameters, which tends to make testing easier since you can just pass a fake DB session instead of the one you would normally use.SQLAlchemy's own doc says
scoped_sessionare just one of many ways of handling sessions. From my understanding of howscoped_sessionand sessions work, thread-local sessions might not work that well with FastAPI since Python's coroutines don't have a 1:1 mapping with threads, so I suspect that this could cause some unexpected behavior (though I haven't tried this, I might be wrong).I personally prefer handling my connections this way:
The benefits of this approach are:
uvicorn --reload