Pydantic: Does `pydantic.dataclasses.dataclass` support classic mapping in SQLAlchemy?

Created on 9 Dec 2019  ·  11 comments  ·  Source: samuelcolvin/pydantic

Question

  • OS: Fedora 31
  • Python version: 3.7.5 (default, Oct 25 2019, 15:51:11)
  • Pydantic version: 1.2

First of all, thank you so much for your awesome work! Pydantic is a very good library and I really like its combination with FastAPI. :rocket:

So my question is: does pydantic.dataclasses.dataclass support classic mapping in SQLAlchemy? I am working on a project that I hope to build with a clean architecture, and I would therefore like to use pydantic's dataclass for my domain model and have SQLAlchemy depend on it.

I have a proof-of-concept code snippet below, which works if I use the dataclass from Python's standard library instead.

from dataclasses import field
from pydantic.dataclasses import dataclass
from pydantic import constr
from typing import List
from datetime import datetime
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, ARRAY, TIMESTAMP
from sqlalchemy.orm import sessionmaker, mapper

metadata = MetaData()
person_table = \
    Table('people', metadata,
          Column('id', Integer, primary_key=True, autoincrement=True),
          Column('name', String),
          Column('age', Integer),
          Column('hobbies', ARRAY(String)),
          Column('birthday', TIMESTAMP)
          )


@dataclass
class Person:
    name: str = constr(max_length=50)
    age: int = field(default_factory=int)
    hobbies: List[str] = field(default_factory=list)
    birthday: datetime = field(default_factory=datetime.now)


mapper(Person, person_table)

engine = create_engine(
    'postgresql://postgres:[email protected]:5432/test_db', echo=True)
metadata.create_all(engine)

session = sessionmaker(bind=engine)()
person = Person(hobbies=['golf', 'hiking'],
                birthday=datetime(1985, 7, 25))
session.add(person)
session.commit()

Error Output:

> python main.py         
2019-12-08 22:11:07,663 INFO sqlalchemy.engine.base.Engine select version()
2019-12-08 22:11:07,663 INFO sqlalchemy.engine.base.Engine {}
2019-12-08 22:11:07,664 INFO sqlalchemy.engine.base.Engine select current_schema()
2019-12-08 22:11:07,664 INFO sqlalchemy.engine.base.Engine {}
2019-12-08 22:11:07,665 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2019-12-08 22:11:07,665 INFO sqlalchemy.engine.base.Engine {}
2019-12-08 22:11:07,665 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2019-12-08 22:11:07,665 INFO sqlalchemy.engine.base.Engine {}
2019-12-08 22:11:07,665 INFO sqlalchemy.engine.base.Engine show standard_conforming_strings
2019-12-08 22:11:07,665 INFO sqlalchemy.engine.base.Engine {}
2019-12-08 22:11:07,666 INFO sqlalchemy.engine.base.Engine select relname from pg_class c join pg_namespace n on n.oid=c.relnamespace where pg_catalog.pg_table_is_visible(c.oid) and relname=%(name)s
2019-12-08 22:11:07,666 INFO sqlalchemy.engine.base.Engine {'name': 'people'}
2019-12-08 22:11:07,667 INFO sqlalchemy.engine.base.Engine 
CREATE TABLE people (
        id SERIAL NOT NULL, 
        name VARCHAR, 
        age INTEGER, 
        hobbies VARCHAR[], 
        birthday TIMESTAMP WITHOUT TIME ZONE, 
        PRIMARY KEY (id)
)


2019-12-08 22:11:07,667 INFO sqlalchemy.engine.base.Engine {}
2019-12-08 22:11:07,683 INFO sqlalchemy.engine.base.Engine COMMIT
Traceback (most recent call last):
  File "/home/xzhan/Development/Projects/sqlalchemy_mapping/.venv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1955, in add
    state = attributes.instance_state(instance)
AttributeError: 'Person' object has no attribute '_sa_instance_state'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 37, in <module>
    session.add(person)
  File "/home/xzhan/Development/Projects/sqlalchemy_mapping/.venv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1957, in add
    raise exc.UnmappedInstanceError(instance)
sqlalchemy.orm.exc.UnmappedInstanceError: Class '__main__.Person' is mapped, but this instance lacks instrumentation.  This occurs when the instance is created before sqlalchemy.orm.mapper(__main__.Person) was called.

Thanks in advance!


All 11 comments

Best to use ORM mode.

I have no personal interest in ORMs or supporting them; long term they're a mistake, and you'd do much better to just write SQL.

However, if there's some easy way to support mapper that requires a relatively small change to pydantic, I'd be happy to review a PR.

Thank you for the quick response!
Wish I had the SQL skill level to say that, lol. :man_facepalming: I have worked on a Django project where most database operations were written in raw SQL, and it was not a pleasant experience. I found it to be a great maintenance burden, not only for me but for newer team members.
I will take a look and see what I can do when I am more settled with my current project. I shall experiment with BaseModel and ORM Mode as well. Thanks again!

Django's ORM is kind of a special case, since:

  1. it's integrated with the entire framework
  2. it doesn't try to be a shallow layer on top of SQL, but rather starts from the bottom up with its own data approach, see https://github.com/samuelcolvin/pydantic/issues/491#issuecomment-489316787

Good luck with from_orm; that's the approach most people take.
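For readers landing here, the suggested from_orm approach looks roughly like this. This is a minimal sketch against the pydantic v1 API; PersonSchema and PersonRow are made-up names, and PersonRow merely stands in for a SQLAlchemy-mapped instance, since orm_mode reads attributes via getattr():

```python
from pydantic import BaseModel


class PersonSchema(BaseModel):
    name: str
    age: int

    class Config:
        orm_mode = True         # pydantic v1 name for this switch
        from_attributes = True  # the same switch under its v2 name


# Stand-in for a SQLAlchemy-mapped instance: any object
# with matching attributes works under orm_mode.
class PersonRow:
    def __init__(self, name, age):
        self.name = name
        self.age = age


person = PersonSchema.from_orm(PersonRow("Ada", 36))
print(person.name, person.age)
```

The validated schema object is a separate copy of the data, so the domain/persistence objects stay decoupled from the validation layer.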

From my personal experience, the major difference between SQLAlchemy and Django ORM is really the difference between Active Record and Data Mapper. (Or maybe I am just not experienced enough to spot/appreciate the others lol. :man_facepalming:)

I really disliked SQLAlchemy when I first started Python web dev in Flask & Flask-SQLAlchemy and came to like Django ORM quite a lot when my new job uses Django. But using Django ORM means a tight data coupling. With some background in .NET Core and EF Core and recently learning more about software architecture, I slowly come to appreciate many design choices SQLAlchemy makes. The syntax is not the best for sure, but many things just make more sense now.

Edit: SQLAlchemy Core also has some very useful tools for building queries, including helpers for using textual SQL.
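The textual-SQL helpers mentioned in that edit can be sketched as follows; the table and values are made up, and an in-memory SQLite database keeps the example self-contained:

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")

with engine.connect() as conn:
    # text() wraps raw SQL while still supporting bound parameters,
    # so values are escaped by the driver rather than interpolated.
    conn.execute(text("CREATE TABLE people (name VARCHAR, age INTEGER)"))
    conn.execute(
        text("INSERT INTO people (name, age) VALUES (:name, :age)"),
        {"name": "Ada", "age": 36},
    )
    age = conn.execute(
        text("SELECT age FROM people WHERE name = :name"),
        {"name": "Ada"},
    ).scalar()
    print(age)
```

The :name style placeholders are translated to the DB driver's own parameter style, which is the main convenience over hand-written SQL strings.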

I have closed my other issue (https://github.com/samuelcolvin/pydantic/issues/1403) and would like to add my request here.
It would be great if there were some way to support SQLAlchemy's mapper mode. That would let us avoid defining models twice, once for SQLAlchemy and once for pydantic, and would be a huge help when using FastAPI.

A good compromise would be if pydantic could take a standard Python dataclass and convert it into a new pydantic dataclass.
Right now, I'm doing this:

from dataclasses import dataclass

from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine
from sqlalchemy.orm import mapper, sessionmaker

from pydantic.dataclasses import dataclass as py_dataclass

metadata = MetaData()

audit_mixin = Table('audit_mixin', metadata,
                    Column('id', Integer, primary_key=True),
                    Column('performed_by', Integer, nullable=True))

user = Table('user', metadata,
             Column('user_id', Integer, primary_key=True),
             Column('name', String(50)),
             Column('email', String(100)),
             Column('fullname', String(50)),
             Column('nickname', String(12)))


@dataclass
class AuditMixin:
    id: int
    performed_by: int


@py_dataclass
class AuditMixinPy(AuditMixin):
    pass


@dataclass
class User(AuditMixin):
    user_id: int
    identity: str
    identity_type: str
    row_status: str
    comments: str


@py_dataclass
class UserPy(User):
    pass


engine = create_engine("sqlite:///", echo=False)
metadata.create_all(engine)

session = sessionmaker(engine)()

mapper(AuditMixin, audit_mixin)
mapper(User, user)

u = User(100, 123, 1, "a", "dfd", "dfdf", "dfdd")
u.id
u.performed_by
session.add(u)
session.commit()

# The stdlib dataclass accepts anything, while the pydantic copy validates:
u = User("sss", 123, 1, "a", "dfd", "dfdf", 1)    # no error
u = UserPy("sss", 123, 1, "a", "dfd", "dfdf", 1)  # raises a ValidationError


Hi!

This is a really interesting feature, and I would like to give it some thought. I was playing around with the SQLAlchemy mapper and pydantic, and the main problem was:

@no_type_check
def __setattr__(self, name, value):
    ...
    if not hasattr(self, '__fields_set__'):
        object.__setattr__(self, '__fields_set__', set())
This will cause some problems with __eq__, because __eq__ checks all fields, but SQLAlchemy loads some private attributes. @samuelcolvin, what do you think about this approach? As far as I can see, no tests failed.

Yes, SQLAlchemy users would have to override __eq__ on their pydantic BaseModels, but I think it's much easier for developers.

That's an interesting idea.
Could you illustrate with a complete example?


Some dirty hacks, but I hope you understand the idea.

pydantic/main.py

class BaseModel(...):
    ...

    @no_type_check
    def __setattr__(self, name, value):
        if self.__config__.extra is not Extra.allow and name not in self.__fields__:
            raise ValueError(f'"{self.__class__.__name__}" object has no field "{name}"')
        elif not self.__config__.allow_mutation:
            raise TypeError(f'"{self.__class__.__name__}" is immutable and does not support item assignment')
        elif self.__config__.validate_assignment:
            known_field = self.__fields__.get(name, None)
            if known_field:
                value, error_ = known_field.validate(value, self.dict(exclude={name}), loc=name, cls=self.__class__)
                if error_:
                    raise ValidationError([error_], self.__class__)
        self.__dict__[name] = value
        # these lines were added
        if not hasattr(self, '__fields_set__'):
            object.__setattr__(self, '__fields_set__', set())
        # end
        self.__fields_set__.add(name)

    ...

And the mapping and model:

metadata = MetaData()

agents = Table(
    'agents', metadata,
    Column('someid', Integer, primary_key=True, index=True),
    Column('raw_json', JSON)
)


def start_mappers():
    mapper(agent_models.Agent, agents)


class Agent(BaseModel):
    someid: int = Field(...)
    raw_json: t.Optional[t.Dict[str, t.Any]] = Field(None)

    def __eq__(self, other):
        if isinstance(other, Agent):
            return self.someid == other.someid
        else:
            return False

    class Config:
        extra = Extra.allow
        allow_mutation = True
        orm_mode = True


def test_agent_mapper_can_load_lines(session):
    some_dict = {
        "key": "value"
    }

    raw_json = json.dumps(some_dict)

    session.execute(
        'INSERT INTO agents (someid, raw_json) VALUES '
        '(1, NULL ),'
        f'(2, :raw_json)', dict(raw_json=raw_json, )
    )
    session.commit()

    expected = [
        models.Agent(someid=1, raw_json={}),
        models.Agent(someid=2, raw_json=some_dict)
    ]

    assert session.query(models.Agent).all() == expected

It is indeed quite an interesting use case to combine @dataclass with classical mapping.
I'm using pydantic for request validation (the dataclass instance is passed around the application), but I also wanted to persist the values held in the pydantic instance. So here's my solution/hack to implement this transparently.

Model definitions:

from dataclasses import dataclass

from pydantic import confloat
from pydantic.dataclasses import dataclass as pydantic_dataclass

@pydantic_dataclass
class PydanticRequest:
    test: confloat(ge=10, le=100)


@dataclass  # This then takes the annotations and builds a new dataclass ;)
@duplicates_pydantic_dataclass(PydanticRequest)  # This prepares __annotations__ dynamically
class Request:
    pass

And the decorator that does the job:

def duplicates_pydantic_dataclass(pydantic_cls):
    def wrapper(cls):
        cls.__annotations__ = {}
        for field_key, field_value in pydantic_cls.__dataclass_fields__.items():
            cls.__annotations__[field_key] = field_value.type
        return cls
    return wrapper

This way, I transparently use pydantic for validation:

pr = PydanticRequest(test=11.0)

The SQLAlchemy mappers are defined for Request!

For CRUD operations, I have the following:

from dataclasses import asdict
def instantiate_with_data(cls, instance):
    return cls(**asdict(instance))

>>> pr = PydanticRequest(test=11.0)
>>> r = instantiate_with_data(Request, pr)
>>> session.add(r)

That said, it only works well for _simple_ objects and mappings.

Just to add since it isn't mentioned here:

Since v1.7 pydantic has supported automatically creating pydantic dataclasses for stdlib dataclasses, see #1817
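A minimal sketch of that behavior: a plain stdlib dataclass used as a field type on a BaseModel is automatically wrapped as a pydantic dataclass and validated. Point and Model are made-up names, and the exact error format may differ between pydantic versions:

```python
from dataclasses import dataclass

from pydantic import BaseModel, ValidationError


@dataclass
class Point:  # a plain stdlib dataclass, no pydantic imports needed here
    x: int
    y: int


class Model(BaseModel):
    p: Point  # pydantic wraps the stdlib dataclass automatically


m = Model(p={"x": "1", "y": 2})  # the string "1" is coerced to the int 1
print(m.p.x)

try:
    Model(p={"x": "not a number", "y": 2})
except ValidationError as exc:
    print("rejected:", exc.errors()[0]["loc"])
```

Note that Point itself stays a plain dataclass, so it can still be used directly (e.g. for SQLAlchemy classical mapping) without pydantic in the way.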
