Fastapi: [QUESTION] SQL related tables and corresponding nested pydantic models in async

Created on 18 Jan 2020  路  19Comments  路  Source: tiangolo/fastapi

Really impressed with FastAPI so far... I have search docs github, tickets and googled the issue described below.

Description

How best to work with related tables and corresponding nested pydantic models whilst persisting data in a relational database in an async application?

Additional context

I have been attempting to extend the example in the docs
https://fastapi.tiangolo.com/advanced/async-sql-databases/
which relies on https://github.com/encode/databases

Using three test pydantic models as an example:

class UserModel(BaseModel):
    id: int
    title: str = Field(..., min_length=2, max_length=50)
    firstname: str = Field(..., min_length=1, max_length=50)
    firstname: str = Field(..., min_length=1, max_length=50)
    username: str = Field(..., min_length=3, max_length=50)
    email: str = Field(..., min_length=3, max_length=50)
    favourite_book: int = Field(...)

class FavouriteBook(BaseModel):
    id: int
    title: str = Field(...)
    author: str = Field(...)


class ExtendedUser(BaseModel):
    id: int
    title: str = Field(..., min_length=2, max_length=50)
    firstname: str = Field(..., min_length=1, max_length=50)
    firstname: str = Field(..., min_length=1, max_length=50)
    username: str = Field(..., min_length=3, max_length=50)
    email: str = Field(..., min_length=3, max_length=50)
    favourite_book: FavouriteBook

the route would ideally be along the lines of...

@router.get("/extended", response_model=List[ExtendedUser])
async def list():
    query =  **sqlAlchemy/databases call that works**
    return database.fetch_all(query=query)

How can a user create a route that returns the nested ExtendedUser from the database without resorting to performing two queries?
An SQL join is a standard way to do this with a single query. However, this does not work with SQLAlchemy core as the two tables contain 'id' and 'title' columns.
It is possible to work with SQLAlchemy orm - but not in an async way as far as I know. (async is my reason for using FastAPI ). I could rename the columns to something unique ( but to rename 'id' column seems like poor database design to me).

answered question

Most helpful comment

Yep, I know. We still don't have a solution for that. I have been thinking about creating a small layer on top of SQLAlchemy mixing SQLAlchemy and Pydantic... but we'll see :sweat_smile:

All 19 comments

there are multiple options.

  1. use a working sqlalchemy core query with databases.

It would be helpful to see what your tables look like, I don't think having similar columns names in different tables prevents you from joining them, you typically have to use something like

.join(table1.c.id == table2.c.id)

  1. use a raw sql query with databases, or without using directly asyncpg

  2. use aiosql (I'm currently switching to this, you get versioning of your sql, slight overhead in perf but nothing like an orm, no sql to core / orm translations headaches like you currently seem to have, direct output of the queries results into pydantic objects, etc...),

So you'd define your an extended_user.sql file like this

-- name: get-extended-user
-- record_class: ExtendedUser
-- Get extended user
select t.id, h.id
from yourtable
         join table1 t on yourtable.id_fk1 = t.id
         join table2 h on yourtable.id_fk2 = h.id
where yourtable.id = :id

then inside your endpoint you can do

queries = aiosql.from_path("extended_user.sql", "asyncpg",  record_classes={"ExtendedUser": ExtendedUser)
conn = await asyncpg.connect(
        "youdsn")
extendeduser: ExtendedUser = await queries.get_extended_user(conn, id='bfd1fb61-b168-4324-9d4a-814acd9f8815')
await conn.close()

thanks for your input @euri10 .

focussing on your option 1.

Maybe I misstated in the question but performing the join is not the problem. The issue appears when converting the result of the joined query into the pydantic models (or nested dictionary datastructure that mimicks my nested pydantic models) when the result of the join contians duplicate named columns. In this case the duplcated columns are 'id' and 'title'.

my SQLAchemy tables are defined like this :

users = Table(
    "users",
     metadata,
    [Column("id", Integer, primary_key=True),
     Column("title", String(50)),
     Column("firstname", String(50)),
     Column("lastname", String(50)),
     Column("username", String(50)),
     Column("email", String(50)),
     Column("favourite_book", Integer, ForeignKey("books.id"), nullable=True),
     Column("created_date", DateTime, default=func.now(), nullable=False)]
)

books = Table(
    "books",
    metadata,
    [Column("id", Integer, primary_key=True),
     Column("title", String(50)),
     Column("author", String(50)),
     Column("created_date", DateTime, default=func.now(), nullable=False)]
)

@euri10 btw haven't heard of aiosql -- will take a look at it...(direct output of the query results into pydantic objects sounds good to me.)

ok sorry I grasp that your issue was a nested issue generation.

In case of aiosql, I'm not sure that's possible now because the marshaling is made with something like:

            results = await connection.fetch(sql, *parameters)
            if record_class is not None:
                results = [record_class(**dict(rec)) for rec in results]

where results is an asyncpg.Record

in databases case, the issue is basically the same, and I'm not sure there is a simple way currently to achieve a nested marshaling out of the box since the result you get from the db query is flat by design.

maybe others will have ideas !

@leonh This isn't super well-documented, but pydantic does make it possible to change how the from_orm method works by using a custom value of Config.getter_dict (briefly mentioned here).

To understand how this works, you'll want to look at the source of pydantic.main.BaseModel.from_orm, pydantic.main.BaseModel._decompose_class, and pydantic.utils.GetterDict.

In particular, you'd want to subclass GetterDict (let's call it ExtendedUserGetterDict) and override ExtendedUserGetterDict.get so that, if key == "favourite_book", you return a dict containing the appropriate values for pydantic to parse into the FavouriteBook (otherwise you delegate to the super call). You'd then set ExtendedUser.Config.getter_dict = ExtendedUserGetterDict.

If you decide to take this approach, a docs PR to pydantic showing it worked out would be much appreciated!

Thanks for your input @dmontagu

I have managed to create a workaround for this issue by adding a table_source property on the Config class of my nested pydantic model. I haven't used a GetterDict property on the model config class yet.

The table_source property specifies the foreign keyed tables that need to be joined in order to fetch.

With much persistence and time to understand how SQLAlchemy returns results (I'm new to SQLAlchemy. It is possible to transform flat list response from the database into a nested data structure suitable for conversion into nested pydantic models (In a reasonably generic way).

My pydantic model now looks like this.


class ExtendedUser(BaseModel):
    id: int
    title: str = Field(..., min_length=2, max_length=50)
    firstname: str = Field(..., min_length=1, max_length=50)
    firstname: str = Field(..., min_length=1, max_length=50)
    username: str = Field(..., min_length=3, max_length=50)
    email: str = Field(..., min_length=3, max_length=50)
    favourite_book: FavouriteBook
    class Config:
        table_source = {
            '_default': lambda name='users': metadata.tables.get(name),
            '_f_keys': {
                'favourite_book': {
                    'table': lambda name='books': metadata.tables.get(name),
                    'model': FavouriteBook
                }
            }

Thanks for the help @euri10 and @dmontagu ! :tada: :taco:

@leonh also have in mind that if you are using SQLAlchemy you shouldn't use async def functions. That would reduce performance, as SQLAlchemy performs blocking operations. If you use SQLAlchemy, use standard def functions for your path operations.

If you really want to use async functions for some reason, then use an async alternative. For example Tortoise: https://tortoise-orm.readthedocs.io/en/latest/ or GINO: https://python-gino.org/

Thanks for the advice @tiangolo.

My code was using https://github.com/encode/databases which advertises itself as "Async database support for Python" I hadn't checked if it was indeed SQLAlchemy.core fully async wrapped or not? as it was Tom Christie I took the author's word for it!

However I have found SQLAlchemy.core rather difficult to work with an decided to go another route and have been having more luck over the last few days with Pony ORM: https://github.com/ponyorm/pony/ and like its "pythonic" syntax so far.

I have been looking for a very simple generic way to achieve async CRUD operations with the ability to do something like Django's select_related for foreign keys and many-to-manys within FastAPI.

btw. The documentation you guys have put together for FastAPI is really good. Awesome work.

@leonh Yeah, it's totally safe to use database tables declared using the ORM model API with sqlalchemy-core-created queries via encode/databases in async def endpoints.

Faced with a similar issue to what you've described, I wrote my own reusable/generic query-building logic for common CRUD operations, which was okay, but it would definitely be nice to have more of an async-friendly ORM.

Hopefully one day SQLAlchemy itself becomes a little more async friendly (unclear what that would look like though; it would likely require the use of a relatively restricted subset of current functionality).

@dmontagu > I wrote my own reusable/generic query-building logic for common CRUD operations

Is this up on github ?-- I'd be interested to see it!

ponyorm has worked well for me so far, my plan is to share the work once its a little more presentable and tested.

@leonh It's not currently on github, and given its state of maturity I'm not sure I'd recommend others make use of it, but I could probably put it up. I'll check and ping back here if I can push it.

About PonyORM, have in mind that you might incur into threading issues as it uses threading.local, check more here: https://github.com/ponyorm/pony/issues/494

You could end up having to monkeypatch it as described in https://fastapi.tiangolo.com/advanced/sql-databases-peewee/

Nevertheless, it could be fixed at the PonyORM side to use contextvars instead of thread locals and then everything would be fine. Or maybe there's a way to pass an explicit session/connection. I'm not really sure.


On the other side, Tortoise-ORM recently got first-class Pydantic support, mainly for these use cases :tada:

Check more in their docs: https://tortoise-orm.readthedocs.io/en/latest/contrib/pydantic.html

And more context in this issue: https://github.com/tiangolo/fastapi/issues/803

Thanks for the advice and continued work on FastAPI. I like the performance and syntax of PonyORM but it seems to fallen into a period of sporadic maintenance which is a cause for concern. My understanding was that as it was built in a way that would allow it to possibly be async friendly in the future even if it wasn't there right now.
I hope the use of contextvars for asyc'ing PonyOrm works.
I think the developers did a really good job on it when it was being built a few years ago, but maybe their priorities have moved in a different direction since then.

Planning to give Tortoise ORM a test run at some point, seeing as its been created to work well with FastAPI.

Side thought: The developer experience of so many similar but different typing systems in my code. Dataclasses, ORM models, Pydantic validation, OpenApi schema, JS Form libraries, or typescript, state management models for React. It gets quite repetitive from end to end if no type system is designated as the canonical form within a web stack.

Yep, I know. We still don't have a solution for that. I have been thinking about creating a small layer on top of SQLAlchemy mixing SQLAlchemy and Pydantic... but we'll see :sweat_smile:

Assuming the original issue was solved, it will be automatically closed now. But feel free to add more comments or create new issues.

Really impressed with FastAPI so far... I have search docs github, tickets and googled the issue described below.

Description

How best to work with related tables and corresponding nested pydantic models whilst persisting data in a relational database in an async application?

Additional context

I have been attempting to extend the example in the docs
https://fastapi.tiangolo.com/advanced/async-sql-databases/
which relies on https://github.com/encode/databases

Using three test pydantic models as an example:

class UserModel(BaseModel):
    id: int
    title: str = Field(..., min_length=2, max_length=50)
    firstname: str = Field(..., min_length=1, max_length=50)
    firstname: str = Field(..., min_length=1, max_length=50)
    username: str = Field(..., min_length=3, max_length=50)
    email: str = Field(..., min_length=3, max_length=50)
    favourite_book: int = Field(...)

class FavouriteBook(BaseModel):
    id: int
    title: str = Field(...)
    author: str = Field(...)


class ExtendedUser(BaseModel):
    id: int
    title: str = Field(..., min_length=2, max_length=50)
    firstname: str = Field(..., min_length=1, max_length=50)
    firstname: str = Field(..., min_length=1, max_length=50)
    username: str = Field(..., min_length=3, max_length=50)
    email: str = Field(..., min_length=3, max_length=50)
    favourite_book: FavouriteBook

the route would ideally be along the lines of...

@router.get("/extended", response_model=List[ExtendedUser])
async def list():
    query =  **sqlAlchemy/databases call that works**
    return database.fetch_all(query=query)

How can a user create a route that returns the nested ExtendedUser from the database without resorting to performing two queries?
An SQL join is a standard way to do this with a single query. However, this does not work with SQLAlchemy core as the two tables contain 'id' and 'title' columns.
It is possible to work with SQLAlchemy orm - but not in an async way as far as I know. (async is my reason for using FastAPI ). I could rename the columns to something unique ( but to rename 'id' column seems like poor database design to me).

One nice workaround besides the one you used(if you are using sqlalchemy) would be to set use_labels to true when selecting, it would require slightly modifying your names in your pydantci model but ultimately it would save you a lot of headache when working with join queries

@leonh I have the same problem, could you explain with a little bit more detail about how you resolve it? :)

Regarding SQLAlchemy tables and async mapping to pydantic models. I supplied table config info onto the pydantic model (see table_source above). Then used that to generate a list of sqlalchemy.sql.expression.join these were added to select(selected_tables).select_from(child_joins) expression. When returning results rows to see which table the data had come from and used this to generate the nested pydantic models.

I didn't get past the proof-of-concept stage for this and eventually switched to using mongodb (which is a really great natural fit for fastapi and pydantic).

This was a few months ago .... SQLAlchemy now has an async feature in pre-release maybe investigate that as a starting point?

thank you for your answer :), I actually found a way to solve my problem just about 5 min ago.
I had a many to many relation in my database and I needed to return user and his/her roles in response,
I came across postgres array_agg aggregate function and now my pydantic model can parse the query result.
query:

SELECT username, last_login, ip_addresses, user_id,  password_exp_time, array_agg(r) as roles
FROM user_login
LEFT OUTER JOIN user_role ur on user_login.id = ur.user_login_id
LEFT OUTER JOIN role r on ur.role_id = r.id
GROUP BY username, last_login, ip_addresses, user_id, password_exp_time
Was this page helpful?
0 / 5 - 0 ratings