Pydantic: Model's dict method that contains set of models.

Created on 9 Dec 2019  路  8Comments  路  Source: samuelcolvin/pydantic

Bug

Please complete:

  • OS: 19.10
  • Python version import sys; print(sys.version): 3.6.8 [GCC 9.2.1 20191008]
  • Pydantic version import pydantic; print(pydantic.VERSION): 1.2

I want to create model Foo, that contains collection of Bars:

from typing import Set, Tuple, List
from pydantic import BaseModel


class Bar(BaseModel):
    bar: str = 'bar_value'

    def __hash__(self):
        return self.bar.__hash__()


class FooTuple(BaseModel):
    bars: Tuple[Bar]


class FooSet(BaseModel):
    bars: Set[Bar]


class FooList(BaseModel):
    bars: List[Bar]


if __name__ == '__main__':
    FooList(bars=[Bar()]).dict()
    FooTuple(bars=(Bar(),)).dict()
    FooSet(bars={Bar()}).dict()

...

Dict method recursively converts models to dictionaries and tries to put them in the same containers they were in. It falls down for Set because of dict type is unhashable.

feature request

Most helpful comment

We just ran into this same issue, and wanted to share our thoughts (for the core devs and any future finders of this issue).

  1. On a possible general solution:
    The issue stems from this line:
    https://github.com/samuelcolvin/pydantic/blob/5510a13f6ce2b3909aced8e1ae6a23fea4484862/pydantic/main.py#L598
    in
class BaseModel(metaclass=ModelMetaclass):
    ...
    def _get_value(
        ...
        elif isinstance(v, (list, set, tuple)):
            return type(v)(
            ...

specifically when v is a set and that set contains base model(s) which are then exported into a dict and thus the unhashable in a set issue arrises.
Our solution to this would be to, in the case in which v is an instance of set, instead of using type(v) instead use list, i.e. instead of exporting a set simply export a list. This avoids the need to have hashable items.
It would create bigger information loss in the export process, and could/would in some cases be a "gotcha" to users in the case that they explicitly needed a set to be returned. But importing that export still works correctly as pydantic will cast that list back into a set.
At least in our use cases this would have avoided a lot of overhead and "just worked", but of course that's just us specific.

Note also that there is an extra issue when using a frozenset. In this case (with frozenset not being an instance of set) any values of the frozenset (including base models) are passed "as is", without any further export being triggered on them, see; https://github.com/samuelcolvin/pydantic/blob/5510a13f6ce2b3909aced8e1ae6a23fea4484862/pydantic/main.py#L616
Our suggestion would be to remove set from the L598 case, and create a new elif case which checks for set & frozenset, which then coerces the export to a list.

If this seems reasonable to you I can attempt a PR implementing it. (But as you said, this would strictly speaking introduce an inconsistency in the export process - sets being cast to lists).

  1. On a workaround:
    We did a) add a __hash__ function to our base model to enable it to be an item of a set/frozenset in the first instance, is just uses hash on a serialised version of the model (think dict then json.dumps or something similar).
    Then b) we extended .dict of our base model to look for the specific attributes that are frozenset/set, and if found then kick off the recursive dict sequence on its items;
    def dict(self, *args, **kwargs):
        """Converts instance to dict representation of it."""
        d_dict = super().dict(
            *args,
            **kwargs
        )
        if "frozenset_attribute" in d_dict:
            d_dict["frozenset_attribute"] = [x.dict(*args, **kwargs) for x in d_dict["frozenset_attribute"]]

        return d_dict

We're not super confident about it yet, but using it for the time being.

All 8 comments

Might be possible to fix this but it would take a fundamental change to how models are validated.

I'd be happy to review a PR, but it would need to pass a fairly high bar (no pun intended) of not effecting performance or any backwards incompatibilities.

Yeah, it's not clear to me what the behavior should be instead; I can't think of a way to change this behavior that wouldn't introduce an inconsistency in the dict-making recursion.

I'd probably recommend overriding .dict() on the model that is causing trouble.

If someone has an idea of how this could behave differently (or be easier to override) in a backwards compatible way without just adding another argument to dict, I agree that would be interesting.

Okay, I had misread this. The problem is the .dict(), not the FooSet(bars={Bar()}).

Since I think it's correct that .dict() recursively converts sub-models to dictionaries, and it's a limitation of python that 1) items in a set must be hashable, and 2) dicts are not hashable; I don't think this can really be fixed.

The solution is to use dict(FooSet(bars={Bar()})) and work from there.

We just ran into this same issue, and wanted to share our thoughts (for the core devs and any future finders of this issue).

  1. On a possible general solution:
    The issue stems from this line:
    https://github.com/samuelcolvin/pydantic/blob/5510a13f6ce2b3909aced8e1ae6a23fea4484862/pydantic/main.py#L598
    in
class BaseModel(metaclass=ModelMetaclass):
    ...
    def _get_value(
        ...
        elif isinstance(v, (list, set, tuple)):
            return type(v)(
            ...

specifically when v is a set and that set contains base model(s) which are then exported into a dict and thus the unhashable in a set issue arrises.
Our solution to this would be to, in the case in which v is an instance of set, instead of using type(v) instead use list, i.e. instead of exporting a set simply export a list. This avoids the need to have hashable items.
It would create bigger information loss in the export process, and could/would in some cases be a "gotcha" to users in the case that they explicitly needed a set to be returned. But importing that export still works correctly as pydantic will cast that list back into a set.
At least in our use cases this would have avoided a lot of overhead and "just worked", but of course that's just us specific.

Note also that there is an extra issue when using a frozenset. In this case (with frozenset not being an instance of set) any values of the frozenset (including base models) are passed "as is", without any further export being triggered on them, see; https://github.com/samuelcolvin/pydantic/blob/5510a13f6ce2b3909aced8e1ae6a23fea4484862/pydantic/main.py#L616
Our suggestion would be to remove set from the L598 case, and create a new elif case which checks for set & frozenset, which then coerces the export to a list.

If this seems reasonable to you I can attempt a PR implementing it. (But as you said, this would strictly speaking introduce an inconsistency in the export process - sets being cast to lists).

  1. On a workaround:
    We did a) add a __hash__ function to our base model to enable it to be an item of a set/frozenset in the first instance, is just uses hash on a serialised version of the model (think dict then json.dumps or something similar).
    Then b) we extended .dict of our base model to look for the specific attributes that are frozenset/set, and if found then kick off the recursive dict sequence on its items;
    def dict(self, *args, **kwargs):
        """Converts instance to dict representation of it."""
        d_dict = super().dict(
            *args,
            **kwargs
        )
        if "frozenset_attribute" in d_dict:
            d_dict["frozenset_attribute"] = [x.dict(*args, **kwargs) for x in d_dict["frozenset_attribute"]]

        return d_dict

We're not super confident about it yet, but using it for the time being.

@DBCerigo Thanks for sharing your thoughts!

I see the argument for coercing all sequences to list, but this would be a substantial breaking change to the current API, so would probably have to wait for v2. Moreover, if you view dict as converting things into more of a python-friendly unstructured representation, it feels a little more awkward to convert set to list, so I can see the argument against. But from a practical perspective I think it would be better if this didn't cause errors, so would be fine with creating lists; I'm interested in @samuelcolvin 's opinion here.

@DBCerigo Is there any non-performance reason you prefer the approach here over json.loads(model.json())? Is performance enough of an issue for your use case to merit the added complexity?

I could also be convinced to add a keyword argument to .dict() to coerce sequences to list under all circumstances, assuming it had a minimal impact on performance (which I think should be possible). I'm normally averse to adding flags like that, but I think being unable to call .dict() on models with a Set field is worth handling somehow.

I'm definitely opposed to changing the behaviour to use list instead of set/frozenset. I could perhaps be persuaded of another argument to .dict() but I'm not convinced it's necessary.

The fact that frozenset is omitted from (list, set, tuple) is just a bug and should be fixed.

I think the proper solution here is implement your own logic which iterates through dict(model) and builds the object as you need it, perhaps as another method on your own custom base model.

Good stuff :) @samuelcolvin appreciate the response and the sentiments - similarly think it is enough of a edge-case and such those people needing that functionality will be able to implement what they need themselves.

Glad it helped raise a mini bug fix in any case 馃憤

I just ran into this problem, too. For me, @DBCerigo 's workaround 2 didn't work. So I came up with this ugly thing that simply converts set attributes to lists first.

    def dict(self, *args, **kwargs):
        """Converts instance to dict representation of it."""
        self.set_attribute = list(self.set_attribute)
        return super().dict(
            *args,
            **kwargs
        )

EDIT: That's actually a bad approach since that instance loses its set behaviour. Didn't affect me because I don't use the instance after serialization but it's not a good general solution.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

sm-Fifteen picture sm-Fifteen  路  45Comments

MrMrRobat picture MrMrRobat  路  22Comments

demospace picture demospace  路  26Comments

maxrothman picture maxrothman  路  26Comments

jasonkuhrt picture jasonkuhrt  路  21Comments