Pydantic: from_orm() should detect cycles when loading SQLAlchemy bi-directional relationships rather than blow stack

Created on 14 Jul 2019  ·  19 Comments  ·  Source: samuelcolvin/pydantic


Bug

For bugs/questions:

  • OS: Linux
  • Python version import sys; print(sys.version): 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) [GCC 7.3.0]
  • Pydantic version import pydantic; print(pydantic.VERSION): 0.30

Where possible please include a self contained code snippet describing your
bug, question, or where applicable feature request:

import sys
from sqlalchemy import (
    Column,
    ForeignKey,
    Integer,
    create_engine
    )
from sqlalchemy.orm import relationship, sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from pydantic import BaseModel
from typing import List

Base = declarative_base()


# SQLA Model
class TableOne(Base):
    __tablename__ = 'table_one'
    # primary key
    one_id = Column(Integer, primary_key=True)
    # list of MANY
    table_many = relationship(
        'TableMany',
        back_populates='table_one')


class TableMany(Base):
    __tablename__ = 'table_many'
    # primary key
    many_id = Column(Integer, primary_key=True)
    # foreign key
    one_id = Column(Integer, ForeignKey("table_one.one_id"))
    # reference to ONE
    table_one = relationship(
        'TableOne',
        back_populates='table_many'
    )


# pydantic schema reflecting SQLA Model
class t1(BaseModel):
    one_id: int
    # list of children
    table_many: List['tm']

    class Config:
        orm_mode = True


class tm(BaseModel):
    many_id: int
    one_id: int
    # reference to parent
    table_one: t1

    class Config:
        orm_mode = True


def main(argv=None):
    """

    from_orm() goes into infinite recursion
    loading bi-directional SQLA relationships

 File "pydantic/main.py", line 642, in pydantic.main.validate_model
 File "pydantic/fields.py", line 296, in pydantic.fields.Field.validate
 File "pydantic/fields.py", line 429, in pydantic.fields.Field._validate_singleton
 File "pydantic/fields.py", line 436, in pydantic.fields.Field._apply_validators
 File "pydantic/class_validators.py", line 202, in pydantic.class_validators._generic_validator_basic.lambda11
 File "pydantic/main.py", line 452, in pydantic.main.BaseModel.validate
 File "pydantic/main.py", line 373, in pydantic.main.BaseModel.from_orm
 File "pydantic/main.py", line 642, in pydantic.main.validate_model
 File "pydantic/fields.py", line 296, in pydantic.fields.Field.validate
 File "pydantic/fields.py", line 429, in pydantic.fields.Field._validate_singleton
 File "pydantic/fields.py", line 436, in pydantic.fields.Field._apply_validators
 File "pydantic/class_validators.py", line 202, in pydantic.class_validators._generic_validator_basic.lambda11
 RecursionError: maximum recursion depth exceeded while calling a Python object

    """
    if argv is None:
        argv = sys.argv

    db_url = 'put-a-db-here'
    engine = create_engine(db_url, echo=False)
    dbsession = sessionmaker(bind=engine)
    session = dbsession()

    t1.update_forward_refs()
    tm.update_forward_refs()

    qry = session.query(TableOne).first()
    res = None
    if qry:
        res = t1.from_orm(qry)
    print(res)


if __name__ == '__main__':
    sys.exit(main())

feature request help wanted


All 19 comments

This would be expected. You are declaring two SQLAlchemy models that reference each other. And then you're declaring two Pydantic models that reference each other.

They will just try to generate an infinite tree of models.

You would have to remove one of the circular references.

Another way to think about it is to ask yourself, what would you expect to happen? How would you expect the models to be created?

Some consider blowing the stack to exemplify poor library craftsmanship.

I agree that using all memory does inform the library consumer that his database model must be rewritten as a Directed Acyclic Graph.

Alternatively, the from_orm() author could detect cycles when loading attributes which reference pydantic instances, and thus from_orm() could handle ANY working db model.

ex: Detecting cycles in graph traversals
https://www.geeksforgeeks.org/graph-data-structure-and-algorithms/

Then, instead of exploding on take-off, the API model would have

  1. a "table one" instance with a list of "table many" row instances
  2. and each "table many" instance would have a reference back to the "table one" instance
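The kind of cycle detection referenced above can be sketched with a visited set threaded through the traversal (plain Python with illustrative names, not pydantic internals):

```python
def traverse(obj, visit, _seen=None):
    """Depth-first walk over nested dicts/lists that stops at objects
    it has already seen, instead of recursing forever on a cycle."""
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:
        return  # cycle detected: this object is already on the path
    _seen.add(id(obj))
    visit(obj)
    if isinstance(obj, dict):
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        children = ()
    for child in children:
        if isinstance(child, (dict, list)):
            traverse(child, visit, _seen)

# a two-node cycle, analogous to TableOne <-> TableMany
one = {"one_id": 1, "table_many": []}
many = {"many_id": 2, "one_id": 1, "table_one": one}
one["table_many"].append(many)

visited = []
traverse(one, lambda node: visited.append(id(node)))
# terminates after visiting `one`, the list, and `many` exactly once
```

The same visited-set idea is what the comment is asking from_orm() to do: remember object identities already on the current path and stop instead of recursing.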

SQLAlchemy recommends bidirectional relationships, and indeed loads them into "cyclic" member references.

https://docs.sqlalchemy.org/en/13/orm/basic_relationships.html

To establish a bidirectional relationship in one-to-many, where the “reverse” side is a many to one, specify an additional relationship() and connect the two using the relationship.back_populates parameter:

https://docs.sqlalchemy.org/en/13/orm/backref.html

Another way to think about it is to ask yourself, what would you expect to happen? How would you expect the models to be created?

You really don't know...? Perhaps FastAPI works only with toy applications.

Marshmallow handles the problem correctly out of the box.

https://marshmallow-sqlalchemy.readthedocs.io/en/latest/

Replace each Pydantic model with a ModelSchema whose Meta points at the SQLAlchemy model via model =:

from marshmallow_sqlalchemy import ModelSchema


class t1_mm_schema(ModelSchema):
    class Meta:
        model = TableOne


class tm_mm_schema(ModelSchema):
    class Meta:
        model = TableMany






    class _models(object):
        def __init__(self):
            self.Course = Course
            self.School = School
            self.Student = Student
            self.Teacher = Teacher
            self.SubstituteTeacher = SubstituteTeacher
            self.Paper = Paper
            self.GradedPaper = GradedPaper
            self.Seminar = Seminar
            self.Lecture = Lecture
            self.Keyword = Keyword

Some consider blowing the stack to exemplify poor library craftsmanship.

Others consider it a demonstration of a limited understanding of the proper use of a library and its capabilities.


Alternatively, the from_orm() author could detect cycles when loading attributes which reference pydantic instances, and thus from_orm() could handle ANY working db model.

There are a number of problems with this:

1) I'm not aware of any standard way to serialize a cyclic object reference in the context of json objects and/or a json schema. This is a strong indication to me that what you are trying to accomplish is not a task for which pydantic is best suited.

In particular, given pydantic's emphasis on parsing, I think it is probably not a good idea to try to use pydantic to represent an object graph that is not a DAG.

2) In order to detect cyclic references, you'd need to maintain some sort of object graph.

If you look at the current from_orm implementation, it essentially just uses getattr to treat the object like a dict for parsing. Pydantic currently does nothing to track an object's identity while parsing.

Implementing relationship-resolution would be a substantial feature to develop, and would likely require tweaks for each additional ORM backend (note that your marshmallow example is specific to sqlalchemy). I would view this as outside the scope of pydantic, though @samuelcolvin may feel differently.
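For reference, the getattr-as-dict behavior described above can be illustrated with a minimal sketch (a hypothetical class, not pydantic's actual GetterDict):

```python
class GetterDictSketch:
    """Minimal dict-like view over an arbitrary object, in the spirit
    of what from_orm does: every field is fetched with getattr, and
    nothing about the object's identity is recorded anywhere."""

    def __init__(self, obj):
        self._obj = obj

    def __getitem__(self, name):
        try:
            return getattr(self._obj, name)
        except AttributeError:
            raise KeyError(name)

    def get(self, name, default=None):
        return getattr(self._obj, name, default)


class Row:  # stand-in for an ORM instance
    def __init__(self, one_id):
        self.one_id = one_id


gd = GetterDictSketch(Row(42))
assert gd["one_id"] == 42         # attribute access looks like key access
assert gd.get("missing") is None  # no identity tracking, no cycle checks
```

Because each nested object is wrapped and read independently, there is no shared state in which a cycle could even be noticed.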

3) In addition to the maintenance burden of this relatively complex feature, it would likely have negative performance implications for the from_orm method due to needing to track references when parsing a deeply nested model. The capability to track cyclic references would be a much lower priority for me than performance implications.

4) In general, if you are not parsing and/or serializing data, then pydantic may be the wrong tool. If you are, then how are your cyclic references represented? It seems like you are misinterpreting the scope of pydantic -- from_orm is merely intended as a convenience method where appropriate.


You really don't know...? Perhaps FastAPI works only with toy applications.

What would cyclic references in pydantic help you accomplish with fastapi? It sounds like you don't really understand how pydantic is intended to be used with fastapi. In particular, how would you send an object with a cyclic reference to a fastapi endpoint, or have fastapi serialize it as a response?

If you need additional capabilities, you should probably be using another library to manage your objects away from the API interface. If you are using sqlalchemy, and want cyclic references, why not just keep the objects as sqlalchemy objects?

Can FastAPI serve their unit test db model without catching fire?

What do you mean by "serve"? I don't see anywhere in the file you linked that a model with a cyclic reference is being serialized. Again, it's not even clear how such a model should be serialized.

@dand-oss

You are probably requesting a cycle breaking feature.

I can imagine a case where the correct behavior wouldn't be obvious:

A person with a list of items, each item with a list of owners. If the model is serializing the person, and then it serializes the items, for each item, when serializing the owners, should it exclude the original owner from the list but include all the other owners? And make it look for that individual item as if the owner was not an owner? Or should it not include any owner at all, making it look as if the other owners didn't exist?


Pydantic and FastAPI are open source projects, most of us give lots of our free time to support these tools and ideas, for free, just to help others. And you can benefit and even get profit from them, for free.

And we stay around here trying to help others the same way.

You are using a very aggressive and derogatory language. It is not appreciated. And it moves your issue/request to the bottom of my priority list.

Completely agree with @tiangolo.

I'll be ignoring this until someone comes back and provides a constructive approach for solving it.

I wish this guy wasn't a dick, that was very dumb. OSS people literally make my life livable.
Thank you for this library, it is very cool, and I almost can't believe it's a thing in python. You guys are some slick python devs.

Even though this guy was a dipshit, what he's asking for would be a huge help in the sqlalchemy world.

I have a shitty, shitty, shitty, old version of the Model.dict for serializing sqlalchemy models.
That code has a ton of hacks to get around cyclical issues. And it is so depressing when someone at work using my code runs into a RecursionError: maximum recursion depth exceeded and I have to explain I'm too dumb to figure out how to make it never happen again. Especially because, on the surface, serialization looks so simple to people.

I just learned about pydantic and because of how fuckin amazing it is all around, I went to look how you guys were doing this hard problem.
Sadly (for me) it looks like this isn't a problem you guys have ever needed to solve or I'm sure you would have already. Then I could once again, benefit from the OSS community.

Anyway, I only wrote this because I can add a little info on how to do it if it ever becomes a priority in pydantic.

The native python deepcopy can handle this by memoizing objects it has already seen. I haven't figured out exactly how, but that would probably help.
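For what it's worth, deepcopy terminates on cyclic structures because it threads a memo dict keyed on object id through the recursion; a quick stdlib demonstration:

```python
import copy

a = {"name": "a"}
b = {"name": "b", "parent": a}
a["child"] = b  # a and b now reference each other

dup = copy.deepcopy(a)  # terminates: the memo dict breaks the cycle
assert dup["child"]["parent"] is dup  # the copy even preserves the cycle
assert dup is not a
```

The same memo-dict trick is the standard way to make any recursive object walk cycle-safe.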

@j-walker23 thanks for the positive feedback!


I think it should be possible to write a model for which from_orm won't cause recursion errors, you just have to manually override the from_orm method.

This may not be able to handle all cases (e.g., if you have variable length cycles), but I think the vast majority of cases where this would be helpful involve a cycle of length 2 (a basic relationship), and that could be handled inside from_orm.

(In v1.0 there is even some added functionality to make it easier to override the from_orm method -- namely, the ability to use a custom GetterDict -- but I'm not sure whether it would be useful for this.)

However, even if loading the model didn't cause recursion errors, you'd still run into recursion errors when you tried to convert it to a dict (or json). That could also be addressed through method overriding though, just with different methods involved (probably .dict would be enough). You could also prevent recursion issues during serialization by carefully choosing fields to exclude.
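The manual cycle-breaking described above might look like this in plain Python (a sketch with made-up names, not a pydantic method):

```python
def to_dict(obj, _seen=None):
    """Export an object's attributes to a dict, replacing any object
    already on the current path with None to break the cycle."""
    _seen = _seen or set()
    if id(obj) in _seen:
        return None  # back-reference: omit instead of recursing
    _seen = _seen | {id(obj)}
    out = {}
    for name, value in vars(obj).items():
        if hasattr(value, "__dict__"):
            out[name] = to_dict(value, _seen)
        elif isinstance(value, list):
            out[name] = [to_dict(v, _seen) if hasattr(v, "__dict__") else v
                         for v in value]
        else:
            out[name] = value
    return out


class One: pass
class Many: pass

one, many = One(), Many()
one.one_id, one.table_many = 1, [many]
many.many_id, many.table_one = 2, one

print(to_dict(one))
# {'one_id': 1, 'table_many': [{'many_id': 2, 'table_one': None}]}
```

Note the design choice embedded here: the back-reference simply becomes None, which is exactly the kind of ambiguity discussed elsewhere in this thread (should it be omitted, nulled, or replaced by an id?).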


At the end of the day, the logic and apis necessary to handle cycles during loading (i.e., from_orm) and serialization would likely require adding substantial overhead (not to mention complexity). Given 1) how infrequent a request this is, and 2) the fact that this use case can be handled now through method overrides, I don't think this currently merits additional feature work.

That said, if anyone really cares about this and wants to push for better handling of cyclic references (especially if you can describe a desired implementation), I think it might be best to just create a new issue.

Thanks for the response.

That totally makes sense. I haven't noticed ever needing this outside of
ORM's, and I would use pydantic for a million other things outside of the
ORM and it will work perfectly.

It's not super important, but I didn't understand the part about cycles.
Was that your way of saying the memoization techniques of deepcopy
wouldn't work, or were you just referencing what was talked about earlier
in this issue?

If I ever do figure this out for myself, I'll definitely post something
about it.

Thanks again.


Could this be solved by allowing from_orm to accept an exclude in the same format as with the BaseModel.json() and BaseModel.dict() methods?

Example, using sqlalchemy orm:


class Foo(BaseModel):
    bar: 'Bar'
    id: int


class FooORM(StorageModel):
    bar_id = Column(ForeignKey('BarORM.id'))
    bar = relationship('BarORM')


class Bar(BaseModel):
    foo: Foo
    id: int


class BarORM(StorageModel):
    foo_id = Column(ForeignKey('FooORM.id'))
    foo = relationship('FooORM')

# . . .

foo = Foo.from_orm(foo_orm, exclude={'bar': {'foo'}})

foo.dict()
# > {'id': 1, 'bar': {'id': 2}}

In all honesty I just expected this to work out of the box after doing some testing with recursion on the json and dict methods.
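For illustration, the proposed nested-exclude semantics could be approximated over plain dicts like this (a sketch of the idea, not an existing pydantic API):

```python
def apply_exclude(data, exclude):
    """Return a copy of `data` without the excluded keys. `exclude`
    mirrors the .dict()/.json() format: a set of key names, or a dict
    mapping a key to a nested exclude for that sub-object."""
    if isinstance(exclude, set):
        exclude = {key: ... for key in exclude}  # ... means "drop entirely"
    out = {}
    for key, value in data.items():
        sub = exclude.get(key)
        if sub is ...:
            continue  # key excluded outright
        if sub is not None and isinstance(value, dict):
            out[key] = apply_exclude(value, sub)
        else:
            out[key] = value
    return out


foo = {"id": 1, "bar": {"id": 2, "foo": {"id": 1}}}
print(apply_exclude(foo, {"bar": {"foo"}}))
# {'id': 1, 'bar': {'id': 2}}
```

Applied at load time, such a filter would cut the back-reference before validation ever recurses into it, which is what the proposal amounts to.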

exclude on from_orm would be fine by me, but I think it's more of a work around than an actual fix.

The more I think about it, the less I like my own proposal, and agree with @dmontagu that the feature really seems out of scope for pydantic.

My intent on using something like this was to have all the context for a particular record, including all of its relationships' context, and . . . well, at some point i'm just returning the entire database under some particular scope.

That seems . . . unwise.

I just ran into this problem. Here's my current workaround:

from __future__ import annotations

from pydantic import BaseModel

class ModelAChild(BaseModel):
    id: int

    class Config:
        orm_mode = True

class ModelA(ModelAChild):
    model_b: ModelBChild

class ModelBChild(BaseModel):
    id: int

    class Config:
        orm_mode = True

class ModelB(ModelBChild):
    model_a: ModelAChild

ModelA.update_forward_refs()

@richard-to Thanks for your demo code. And here's my workaround for a many-to-many relationship, based on yours:

from __future__ import annotations
from typing import List
from pydantic import BaseModel


class UserAlone(BaseModel):
    id: int = None
    name: str

    class Config:
        orm_mode = True

class User(UserAlone):
    roles: List['RoleAlone'] = []

class RoleAlone(BaseModel):
    id: int = None
    name: str

    class Config:
        orm_mode = True

User.update_forward_refs()

class Role(RoleAlone):
    users: List[UserAlone]

@tiangolo If the above workaround looks good to you, maybe it can be included somewhere in FastAPI's excellent documentation.

I have a (tried and tested) solution that might or might not be applicable here. It did solve the cyclic json problem for me, but it requires some identifier to be stored in the json. It looks (simplified) like this:

import json
from datetime import datetime, date

class Jsonifiable(object):

    # some standard conversions
    json_encoders = {
        datetime: str,
        date: str,
        # ....
    }


    @classmethod
    def get_type(cls, attr_name):
        return "the type of the attr, from pydantic model, descriptor, dataclass"

    @classmethod
    def from_json_dict(cls, json_dict, _reg=None, _get_id=lambda js: js['id']):
        """ _reg keeps al the id's already seen """
        _reg = _reg or {}
        id_ = _get_id(json_dict)
        if id_ in _reg:  # already seen: stop recursion
            return _reg[id_]  # return already seen and parsed

        kwargs = {}
        for name, js_value in json_dict.items():
            attr_type = cls.get_type(name)
            try:  # could also check the type, do hasattr(...)
                kwargs[name] = attr_type.from_json_dict(js_value, _reg, _get_id)  # pass _reg
            except AttributeError:
                try:
                    kwargs[name] = attr_type(js_value)
                except (TypeError, ValueError):
                    kwargs[name] = js_value

        # kwargs can still be validated in constructor
        obj = cls(**kwargs)
        _reg[id_] = obj
        return obj

    @classmethod
    def from_json(cls, js):
        return cls.from_json_dict(json.loads(js))

    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)

    def to_json_dict(self, _reg=None, _get_id=id):
        _reg = _reg or set()
        id_ = _get_id(self)
        if id_ in _reg:  # already encoded
            return {'id': id_}  # return a json_dict with only id; stops recursion
        _reg.add(_get_id(self))

        json_dict = {'id': id_}  # make json recognizable for parsing!
        for name, value in self.__dict__.items():
            try:
                json_dict[name] = value.to_json_dict(_reg, _get_id)
            except AttributeError:
                # fall back to the raw value for types without an encoder
                json_dict[name] = self.json_encoders.get(type(value), lambda v: v)(value)

        return json_dict

    def to_json(self):
        return json.dumps(self.to_json_dict())

The main difference is that instead of building a graph, you just recursively pass _reg: a set/dict keeping track of the json/instances already seen. This should be simpler and a lot faster.

I think storing the identifier in the json is inevitable, if you want to be able to restore any cyclic data structure.

Hope it helps.

Not to +1 this issue, but I'm running into this also. @richard-to's workaround works fine, but it's pretty ugly. To @dand-oss's point, this should absolutely be fixed in the ORM instead of blowing the stack when a self-reference happens.

Edit: I would recommend a recursion limit, or implementing a Marshmallow-like solution (TableMany or TableOne) in the Config of each model

+1 on not blowing the stack when a self-reference happens. A Marshmallow-like solution would be great.

I am late to this debate and my understanding of the internals of from_orm is low, so please bear with me.

My understanding of the problem is that ORMs (and data modelling in general) easily accommodate bi-directional relationships, whereas pydantic cannot cope with them.
This is unfortunate because bi-directional relationships are very convenient during development.

Mathematics tells us there is no solution to that problem, as it is impossible to map a cyclic graph to a tree (or a DAG).
So, if we want to come up with something, we have to accept making some trade-offs.

The current state is not very clean (as @dand-oss correctly pointed out (but, indeed, not kindly)): letting Python manage the error at low level through infinite loop detection is not very kind to developers who love the pydantic library as I do.

So the very first trade-off we could make would be to send a more civilised error message, related to the discovery of the cycle in the model.

But then, the second level trade-off would not be very far away: upon discovering the cycle, a simple behaviour would just be to silently return with no error.

I admit this is where my low understanding of from_orm hurts, as I have no idea why pydantic requires traversing the model, thereby blowing up the stack.

Please let me know why pydantic insists on fully traversing the model: I may then contribute to a solution.

Thank you.
