In this issue I'd like to gather all the information about using MongoDB, FastAPI and Pydantic together. At this point this is a "rather complete" solution, but I'd like to collect feedback and comments from the community to see how it can be improved.
The biggest pain point that started this and several other threads when trying to use FastAPI with mongo is the _id field. There are several issues here:
- the `_id` field is an `ObjectId`, which is not very JSON-friendly
- the `_id` field's name is not very Python-friendly (written as-is in a Pydantic model it would become a private field, and many IDEs will flag that)

Below I'll try to describe the solutions I've found in different places and see which cases they cover and what's left unsolved.
Let's say we have Joe, a regular developer. Joe just discovered FastAPI and is familiar with Mongo (to the extent that he can create and fetch documents from the DB). Joe wants to build a clean and fast API that would:
1️⃣ Be able to define mongo-compatible documents as regular Pydantic models (with all the proper validations in place):
```python
class User(BaseModel):
    id: ObjectId = Field(description="User id")
    name: str = Field()
```
2️⃣ Write routes that would use native Pydantic models as usual:
```python
@app.post('/me', response_model=User)
def save_me(body: User):
    ...
```
3️⃣ Have the API return JSON like `{"id": "5ed8b7eaccda20c1d4e95bb0", "name": "Joe"}` (in the "outer world" it's quite expected to have an `id` field on the document rather than `_id`. And it just looks nicer.)
4️⃣ Have Swagger and ReDoc documentation to display fields `id` (str), `name` (str)
5️⃣ Be able to save Pydantic documents into Mongo with proper `id` field substitution:
```python
user = User(id=ObjectId(), name='Joe')
inserted = db.user.insert_one(user)  # This should insert the document as `{"_id": user.id, "name": "Joe"}`
assert inserted.inserted_id == user.id
```
6️⃣ Be able to fetch documents from Mongo with proper `id` matching:
```python
user_id = ObjectId()
found = db.user.find_one({"_id": user_id})  # `find_one` returns a single dict (`find` would return a cursor)
user = User(**found)
assert user.id == user_id
```
As proposed in #452, one can define a custom field type for `ObjectId` and apply validations to it. One can also create a base model that encodes `ObjectId` into strings:
```python
from datetime import datetime

from bson import ObjectId
from bson.errors import InvalidId
from pydantic import BaseConfig, BaseModel, Field


class OID(str):
    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, v):
        try:
            return ObjectId(str(v))
        except InvalidId:
            raise ValueError("Not a valid ObjectId")


class MongoModel(BaseModel):
    class Config(BaseConfig):
        json_encoders = {
            datetime: lambda dt: dt.isoformat(),
            ObjectId: lambda oid: str(oid),
        }


class User(MongoModel):
    id: OID = Field()
    name: str = Field()


@app.post('/me', response_model=User)
def save_me(body: User):
    assert isinstance(body.id, ObjectId)
    return body
```
Now we have:
| 1️⃣ | 2️⃣ | 3️⃣ | 4️⃣ | 5️⃣ | 6️⃣ |
|----|----|----|----|----|----|
| ✅ | ✅ | ✅ | ✅ | ☑️ | ☑️ |
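One way to picture why 5️⃣ and 6️⃣ only get a partial mark: the model dumps an `id` key, while Mongo stores and returns `_id`, so the two shapes never line up without extra work. A minimal plain-dict sketch (no Pydantic or Mongo needed; the hex string is just a stand-in for a generated `ObjectId`):

```python
def naive_dump(model_fields: dict) -> dict:
    """What a plain `User.dict()` would produce: the key is `id`, not `_id`."""
    return dict(model_fields)

def mongo_store(doc: dict) -> dict:
    """Mongo keeps `_id`; if the inserted doc has none, it generates one."""
    stored = dict(doc)
    stored.setdefault('_id', '5ed8b7eaccda20c1d4e95bb0')  # stand-in ObjectId
    return stored

dumped = naive_dump({'id': '5ed8b7eaccda20c1d4e95bb0', 'name': 'Joe'})
stored = mongo_store(dumped)

# The stored document ends up with BOTH keys: our `id` plus Mongo's `_id`.
assert set(stored) == {'id', '_id', 'name'}
```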
Another suggested option is to use `alias="_id"` on the Pydantic model:
```python
class MongoModel(BaseModel):
    class Config(BaseConfig):
        allow_population_by_field_name = True  # << Added
        json_encoders = {
            datetime: lambda dt: dt.isoformat(),
            ObjectId: lambda oid: str(oid),
        }


class User(MongoModel):
    id: OID = Field(alias="_id")  # << Notice the alias
    name: str = Field()


@app.post('/me', response_model=User)
def save_me(body: User):
    assert isinstance(body.id, ObjectId)
    res = db.insert_one(body.dict(by_alias=True))  # << Inserting as a dict with aliased fields
    assert res.inserted_id == body.id
    return body
```
Now we are able to save to the DB using the `User.id` field as `_id`, which solves 5️⃣.
However, now Swagger and ReDoc show the `id` field as `_id`, and the returned JSON looks like this: `{"_id": "5ed803afba6455fd78659988", "name": "Joe"}`. This is a regression for 3️⃣ and 4️⃣.
Now we have:
| 1️⃣ | 2️⃣ | 3️⃣ | 4️⃣ | 5️⃣ | 6️⃣ |
|----|----|----|----|----|----|
| ✅ | ✅ | ☑️ | ☑️ | ✅️ | ☑️ |
We can do some extra coding to keep the `id` field and still insert properly into the DB. Effectively, we shuffle the `id` and `_id` fields in `MongoModel` when dumping/loading.
```python
class MongoModel(BaseModel):
    class Config(BaseConfig):
        allow_population_by_field_name = True
        json_encoders = {
            datetime: lambda dt: dt.isoformat(),
            ObjectId: lambda oid: str(oid),
        }

    @classmethod
    def from_mongo(cls, data: dict):
        """We must convert _id into "id"."""
        if not data:
            return data
        id = data.pop('_id', None)
        return cls(**dict(data, id=id))

    def mongo(self, **kwargs):
        exclude_unset = kwargs.pop('exclude_unset', True)
        by_alias = kwargs.pop('by_alias', True)
        parsed = self.dict(
            exclude_unset=exclude_unset,
            by_alias=by_alias,
            **kwargs,
        )
        # Mongo uses `_id` as the default key. We should stick to that as well.
        if '_id' not in parsed and 'id' in parsed:
            parsed['_id'] = parsed.pop('id')
        return parsed
```
```python
@app.post('/me', response_model=User)
def save_me(body: User):
    assert isinstance(body.id, ObjectId)
    res = db.insert_one(body.mongo())  # << Notice that we should use `User.mongo()` now.
    assert res.inserted_id == body.id
    return body
```
This brings back documentation and proper output and solves the insertion:
| 1️⃣ | 2️⃣ | 3️⃣ | 4️⃣ | 5️⃣ | 6️⃣ |
|----|----|----|----|----|----|
| ✅ | ✅ | ✅️ | ✅️ | ✅️ | ☑️ |
Looks like we're getting closer...
Now, let's try to fetch doc from DB and return it:
```python
@app.post('/me', response_model=User)
def save_me(body: User):
    assert isinstance(body.id, ObjectId)
    res = db.insert_one(body.mongo())
    assert res.inserted_id == body.id
    found = col.find_one({'_id': res.inserted_id})
    return found

"""
pydantic.error_wrappers.ValidationError: 1 validation error for User
response -> id
  field required (type=value_error.missing)
"""
```
The workaround for this is to use User.from_mongo:
```python
@app.post('/me', response_model=User)
def save_me(body: User):
    assert isinstance(body.id, ObjectId)
    res = db.insert_one(body.mongo())
    assert res.inserted_id == body.id
    found = col.find_one({'_id': res.inserted_id})
    return User.from_mongo(found)  # << Notice that we should use `User.from_mongo()` now.
```
This seems to cover fetching from the DB. Now we have:
| 1️⃣ | 2️⃣ | 3️⃣ | 4️⃣ | 5️⃣ | 6️⃣ |
|----|----|----|----|----|----|
| ✅ | ✅ | ✅️ | ✅️ | ✅️ | ✅️ |
Under the spoiler one can find the final code to make FastAPI work with Mongo in the most "native" way:
Full code
```python
from datetime import datetime

from bson import ObjectId
from bson.errors import InvalidId
from pydantic import BaseConfig, BaseModel, Field


class OID(str):
    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, v):
        try:
            return ObjectId(str(v))
        except InvalidId:
            raise ValueError("Not a valid ObjectId")


class MongoModel(BaseModel):
    class Config(BaseConfig):
        allow_population_by_field_name = True
        json_encoders = {
            datetime: lambda dt: dt.isoformat(),
            ObjectId: lambda oid: str(oid),
        }

    @classmethod
    def from_mongo(cls, data: dict):
        """We must convert _id into "id"."""
        if not data:
            return data
        id = data.pop('_id', None)
        return cls(**dict(data, id=id))

    def mongo(self, **kwargs):
        exclude_unset = kwargs.pop('exclude_unset', True)
        by_alias = kwargs.pop('by_alias', True)
        parsed = self.dict(
            exclude_unset=exclude_unset,
            by_alias=by_alias,
            **kwargs,
        )
        # Mongo uses `_id` as the default key. We should stick to that as well.
        if '_id' not in parsed and 'id' in parsed:
            parsed['_id'] = parsed.pop('id')
        return parsed


class User(MongoModel):
    id: OID = Field()
    name: str = Field()


@app.post('/me', response_model=User)
def save_me(body: User):
    assert isinstance(body.id, ObjectId)
    res = db.insert_one(body.mongo())
    assert res.inserted_id == body.id
    found = col.find_one({'_id': res.inserted_id})
    return User.from_mongo(found)
```
And the list of things that are sub-optimal with the given code:

- `response_model` validation: we have to use `User.from_mongo` with every return. This is somewhat of a code duplication; it would be nice to get rid of it somehow.
- The model exposes an `id` field, while all Mongo queries are built using `_id`. I'm afraid there is no way to get rid of this, though... (I'm aware that MongoEngine and other ODM engines cover this, but I specifically decided to stay out of that subject and focus on "native" code.)

Firstly, nice work! As you said, this is a fully working solution to using MongoDB with FastAPI that I'm sure will benefit people going forward.
I would highly recommend that if this is to become the "recommended" way of working with MongoDB that we recommend an ODM (object-document-mapper) and show any potential issues with using those with Pydantic/FastAPI. The main reasons are:
The existing ODMs are not great. I don't think any of the major ones include type annotations or bulk write support. But they are fairly lightweight and get us most of the way there, and allow you to reach down into raw Mongo queries when you need to. I think if we're going to put some development effort into making Mongo easier to use with Pydantic/FastAPI, it would be best spent writing docs that are as accessible as possible and maybe contributing to existing ODMs to clear up any sticking points.
Obviously ODMs can be a contentious topic, but so can ORMs, and FastAPI does not shy away from showing them as the easier way to get started. I think in an ideal world, we'd include the more straightforward "here's an ODM, point and click" approach first and the more advanced "DIY" approach after, for people who want to wander into the deep end.
Correct me if I'm wrong, but isn't the real missing piece in all of this the serializers/deserializers/validators for all the Mongo/BSON datatypes in Pydantic? If Pydantic added support for all the extra datatypes, then you could just return a MongoEngine instance directly, no?
Interested to know how people are handling the creation of indexes in MongoDB. Does anyone know of a suitable way to define an index on a Pydantic model?
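There's no built-in way to declare indexes on a Pydantic model, but one common pattern is to keep the index specs as plain data next to the models and apply them once at startup. A hedged sketch: `ensure_indexes` assumes a pymongo/motor-style `db[collection].create_index(keys, **options)` call, and the collection and field names here are purely illustrative.

```python
# (field, direction) pairs: 1 = ascending, -1 = descending.
# Kept as plain data so the specs live next to the model definitions.
INDEXES = {
    "user": [
        ([("name", 1)], {"unique": False}),
        ([("created_at", -1)], {}),
    ],
}

def ensure_indexes(db):
    """Apply every declared index; create_index is a no-op if it already exists."""
    for collection, specs in INDEXES.items():
        for keys, options in specs:
            db[collection].create_index(keys, **options)
```

You would call `ensure_indexes(db)` once in a startup event handler.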
Here Comes!
I gave up on FastAPI's json_encoder and developed a handier one, specialized for MongoDB.
Keep in mind that if the document already has an `_id` field, MongoDB won't generate an ObjectId for it.
So it's better that we always generate our own `_id`.
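The "always generate our own `_id`" advice can be sketched without any Mongo dependency: a tiny helper that fills in `_id` client-side before insert. Here `uuid4().hex` is just a dependency-free stand-in for a real `bson.ObjectId()`.

```python
from uuid import uuid4

def with_client_id(doc: dict, make_id=lambda: uuid4().hex) -> dict:
    """Return a copy of `doc` with `_id` set client-side if it's missing.

    In real code `make_id` would be `bson.ObjectId`; uuid4().hex is only a
    stand-in for this sketch.
    """
    out = dict(doc)
    out.setdefault('_id', make_id())
    return out

doc = with_client_id({'name': 'Joe'})
assert '_id' in doc

# An explicitly provided _id is left untouched.
kept = with_client_id({'_id': 'abc', 'name': 'Joe'})
assert kept['_id'] == 'abc'
```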
```python
# -*- coding: utf-8 -*-
# -----------------------------------
# @CreateTime : 2020/7/25 0:27
# @Author     : Mark Shawn
# @Email      : [email protected]
# ------------------------------------
import json
from datetime import datetime, date
from typing import Union
from uuid import UUID

from bson import ObjectId
from pydantic import BaseModel


def mongo_json_encoder(record: Union[dict, list, BaseModel]):
    """
    This is a json_encoder designed specially for dumping MongoDB records.

    It can deal with both record_item and record_list types queried from MongoDB.
    You can extend the encoder's abilities in the recursive function `convert_type`.
    I just covered the following datatypes: datetime, date, UUID, ObjectId.
    Contact me if any further support is needed.

    Attention: it will change the raw record, so copy it before calling this
    function if necessary.

    Parameters
    ----------
    record: a dict or a list, like the queried documents from MongoDB.

    Returns
    -------
    JSON-formatted data.
    """
    def convert_type(data):
        if isinstance(data, (datetime, date)):
            # ISO format: data.isoformat()
            return str(data)
        elif isinstance(data, (UUID, ObjectId)):
            return str(data)
        elif isinstance(data, list):
            return list(map(convert_type, data))
        elif isinstance(data, dict):
            return mongo_json_encoder(data)
        try:
            json.dumps(data)
            return data
        except TypeError:
            raise TypeError({
                "error_msg": "Serialization of this type is not supported yet",
                "value": data,
                "type": type(data),
            })

    # add support for BaseModel
    if isinstance(record, BaseModel):
        return mongo_json_encoder(record.dict(by_alias=True))
    elif isinstance(record, dict):
        for key, value in record.items():
            record[key] = convert_type(value)
        return record
    else:
        return list(map(mongo_json_encoder, record))


def mongo_json_encoder_decorator(func):
    """A decorator that converts the documents a function returns from MongoDB."""
    def wrapper(*args, **kwargs):
        res = func(*args, **kwargs)
        return mongo_json_encoder(res)
    return wrapper
```
and the test script passes as follows:
```python
# -*- coding: utf-8 -*-
# -----------------------------------
# @CreateTime : 2020/7/25 0:47
# @Author     : Mark Shawn
# @Email      : [email protected]
# ------------------------------------
import uuid
from typing import List, Union
from uuid import UUID

from bson import ObjectId
from pydantic import BaseModel, Field

from utils.json import mongo_json_encoder


class FriendBase(BaseModel):
    class Config:
        arbitrary_types_allowed = True
        allow_population_by_field_name = True

    id: Union[str, UUID, ObjectId] = Field(alias='_id')
    name: str


class Friend(FriendBase):
    friends: List[FriendBase] = []


f_1 = Friend(id='test', name='test')
f_2 = Friend(id=uuid.uuid1(), name='test', friends=[f_1])
f_3 = Friend(id=ObjectId(), name='test', friends=[f_1, f_2])

i_1 = f_1.dict(by_alias=True)
i_2 = f_2.dict(by_alias=True)
i_3 = f_3.dict(by_alias=True)

j_1 = mongo_json_encoder(i_1.copy())
j_2 = mongo_json_encoder(i_2.copy())
j_3 = mongo_json_encoder(i_3.copy())
j_all = [f_1, f_2, f_3]

assert i_1 == j_1
assert i_2 == j_2, "this should not pass"
assert i_3 == j_3, "this should not pass"
```
It just runs well!
I hope @tiangolo will adapt FastAPI to have less boilerplate code when using MongoDB. That would be fantastic.
I recently wrote ODMantic to ease the integration of FastAPI/Pydantic with MongoDB.
Basically, it completely bundles all the boilerplate code required to work with Mongo, and you can still perform raw Mongo queries on top of the ones brought by the ODM engine.
There is a FastAPI example in the documentation if you wanna have a look :smiley:
@art049 Hey, this looks very promising as an all-in-one solution to the problems discussed in this thread. It would be great to get some buy-in from the major players in the Python world and see the project grow more mature. I'm always hesitant about pulling relatively new libraries (it looks like your project is ~6-7 months old) into production code until they are proven to be relatively mature and well maintained. Either way, this does look like it addresses pretty much all of the issues people have brought up. Looking forward to seeing how this progresses.
Dumb question
Doesn't this problem go away if you just allow your Mongo engine to auto-place `_id` on your models, and then use that instead of `id`?
This was my solution: just define a new 'out' schema with 'id' on it, then set it to '_id' from the object that comes out of the database on a query. It allowed me to use a standard response model.
```python
class UserBase(BaseModel):
    id: Optional[PyObjectId] = Field(alias='_id')
    username: str


class UserOut(UserBase):
    id: Optional[PyObjectId]


@core.get('/user', response_model=users.UserOut)
async def userfake() -> users.UserFull:
    user = UserBase()
    result = await mdb.users.insert_one(user.dict())
    in_db = await mdb.users.find_one({'_id': result.inserted_id})
    out_db = users.UserOut(**in_db)
    out_db.id = in_db['_id']
    return out_db
```
I actually improved on that slightly so it kinda 'just works'.
I've created a MongoBase and a MongoOut schema, which you can subclass for all other outward data. This way, our alias allows us to write to `_id` on the way in, and our MongoOut schema reworks the data on the way back out. The MongoOut class should always be the class you inherit from first; it won't work the other way. It eliminates the need for those messy lines above.
```python
class MongoBase(BaseModel):
    id: Optional[PyObjectId] = Field(alias='_id')

    class Config(BaseConfig):
        orm_mode = True
        allow_population_by_field_name = True
        json_encoders = {
            datetime: lambda dt: dt.isoformat(),
            ObjectId: lambda oid: str(oid),
        }


class MongoOut(MongoBase):
    id: Optional[PyObjectId]

    def __init__(self, **pydict):
        super().__init__(**pydict)
        self.id = pydict.pop('_id')


class UserOut(MongoOut, UserBase):
    pass


@core.get('/user', response_model=users.UserOut)
async def userfake():
    user = fake_user()
    result = await mdb.users.insert_one(user.dict())
    in_db = await mdb.users.find_one({'_id': result.inserted_id})
    return in_db
```
Sorry to add more; hopefully this is useful. The hackiest but simplest solution I've found is below: you don't actually need the alias when using the Motor engine. Motor automatically adds an ObjectId to every object if it's not there, so you can drop MongoOut and have one simple MongoBase which populates `id` with `_id` at initialisation:
```python
class MongoBase(BaseModel):
    id: Optional[PyObjectId]

    class Config(BaseConfig):
        orm_mode = True
        allow_population_by_field_name = True
        json_encoders = {
            datetime: datetime.isoformat,
            ObjectId: str,
        }

    def __init__(self, **pydict):
        super().__init__(**pydict)
        self.id = pydict.get('_id')


class UserBase(MongoBase):
    username: str
    email: str = None
    first_name: str = None
    last_name: str = None


@core.get('/user', response_model=users.UserBase)
async def userfake():
    user = fake_user()
    result = await mdb.users.insert_one(user.dict())
    in_db = await mdb.users.find_one({'_id': result.inserted_id})
    return in_db
```
The downside (is it a downside?) is that in the DB there's a redundant `id` field which isn't being used. Below is what `in_db` looks like before it's put back into `UserBase(MongoBase)`. However, `in_db['_id']` is equal to the `out_db.id` object, and the Swagger docs are all correct...

```python
{'_id': ObjectId('5fb9f4c00d1263cc1555d197'), 'id': None, 'username': 'Denise Garcia'}
```
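If the redundant `id: None` in the stored document bothers you, one option (assuming Pydantic v1 here) is to dump with `user.dict(exclude_none=True)` at insert time; the effect is the same as this plain-dict filter:

```python
def drop_none_fields(doc: dict) -> dict:
    """Mimics `model.dict(exclude_none=True)`: strip keys whose value is None."""
    return {k: v for k, v in doc.items() if v is not None}

doc = {'_id': 'abc123', 'id': None, 'username': 'Denise Garcia'}
cleaned = drop_none_fields(doc)
assert cleaned == {'_id': 'abc123', 'username': 'Denise Garcia'}
```

Note that `exclude_none=True` also drops any other optional fields that happen to be None, which may or may not be what you want.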
@NomeChomsky I use a mixin that works in a similar way.
```python
class DBModelMixin(BaseModel):
    id: Optional[ObjectIdStr] = Field(..., alias="_id")

    class Config:
        json_loads = json_util.loads
        json_dumps = json_util.dumps
        allow_population_by_field_name = True
        json_encoders = {ObjectId: lambda x: str(x)}
```
with pydantic classes like this...
```python
class Item(BaseModel):
    name: str = Field(..., max_length=250)


class ItemDB(Item, DBModelMixin):
    pass
```