I have a use case where I'd like to add an attribute when initialising the instance which is not part of the model, and thus should not be validated. Is that possible?
Here's a practical example:
```py
from datetime import datetime

from pydantic import BaseModel

class Test(BaseModel):
    a: int

class TestExtra(BaseModel):
    a: int

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._processed_at = datetime.utcnow()

test = {"a": 1}
Test(**test)
TestExtra(**test)  # ValueError: "TestExtra" object has no field "_processed_at"
```
Yes, use underscore or ClassVar.
This works, but it's not what I want, as every instance will have the same timestamp:
```py
class TestExtra(BaseModel):
    a: int
    _processed_at: datetime = datetime.utcnow()  # default evaluated once, at class definition
```
This is what I want, but it fails with the same error:
```py
class TestExtra(BaseModel):
    a: int
    _processed_at: datetime

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._processed_at = datetime.utcnow()

TestExtra(a=1)
```
Or am I missing something?
@samuelcolvin can this be reopened and solved please?
This should work:
```py
from datetime import datetime

from pydantic import BaseModel

class Test(BaseModel):
    a: int

class TestExtra(BaseModel):
    a: int

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        object.__setattr__(self, '_processed_at', datetime.utcnow())

test = {"a": 1}
t1 = Test(**test)
debug(t1)

t2 = TestExtra(**test)
debug(t2)
debug(t2._processed_at)
```
_processed_at gets added to the model, so it will be included in .dict() etc. I'm not sure whether that's what you want?
_(updated with a better solution)_
If not, you can do something like:
```py
from datetime import datetime

from pydantic import BaseModel

class Test(BaseModel):
    a: int

class TestExtra(BaseModel):
    __slots__ = ('processed_at',)
    a: int

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        object.__setattr__(self, 'processed_at', datetime.utcnow())

t2 = TestExtra(a=123)
debug(t2.dict())
debug(t2.processed_at)
```
outputs:
```
➤ python test.py
test.py:20 <module>
    t2.dict(): {'a': 123} (dict) len=1
test.py:21 <module>
    t2.processed_at: datetime.datetime(2020, 1, 2, 19, 23, 33, 36215) (datetime)
```
Thanks for the fast answer,
Indeed, the private `processed_at` should not be included in `.dict()`, so the second solution you shared works fine. However, it is painful (and hacky) to use `__slots__` and `object.__setattr__`. Is there a limitation in the current implementation that cannot be overcome to get the following, natural behavior?
```py
class TestExtra(BaseModel):
    a: int
    # Starts with an underscore: private property,
    # won't be shown in .dict()
    _processed_at: datetime = None

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Some computation here
        self._processed_at = datetime.utcnow()  # Shouldn't throw ValueError
```
EDIT:
This comes from the fact that private properties are likely to be used for internal data-related operations. Consider the following use case:
I have a `Video(BaseModel)` class representing a video, which can be instantiated with a `path` property. I want this class to handle read operations on the video (to later extract frames from it). I don't want to expose intermediate properties in the model, but I do want to store in the instance a reference to the in-memory video object (using `cv2.VideoCapture`).
In that case, I would like something like that working:
```py
class Video(BaseModel):
    path: Path
    _videoCapture: Any = None

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._videoCapture = cv2.VideoCapture(str(self.path))  # VideoCapture expects a filename string
```
is there a limitation that cannot be overcome in the current implementation to have the natural behavior?
Yes, the way we use __dict__.
Using slots is the only solution I know of, but the code is open source, maybe you can find another solution?
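The mechanics behind this can be seen without pydantic at all: field values live in the instance `__dict__` (which is also where serialisation looks), while a slot attribute is stored in a descriptor and never appears in `__dict__`. A minimal plain-Python sketch of that distinction (class names are illustrative, not pydantic code):

```python
class Base:
    pass  # no __slots__ here, so subclass instances still get a __dict__

class Model(Base):
    __slots__ = ('_processed_at',)  # stored outside __dict__

    def __init__(self, a):
        self.a = a                    # regular attribute -> instance __dict__
        self._processed_at = 'later'  # slot attribute -> slot descriptor

m = Model(1)
print(m.__dict__)       # {'a': 1} -- the slot value is invisible here
print(m._processed_at)  # 'later'  -- but still accessible as an attribute
```

Anything built from `__dict__` (as pydantic's `.dict()` is) therefore skips slot attributes for free.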
So after digging a bit into both code and doc, I tried with extra=ignore:
```py
class TestExtra(BaseModel):
    a: int
    _private_key: bool = True

    class Config:
        extra = "ignore"  # Default value

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Some computation here
        self._processed_at = datetime.utcnow()  # Won't throw ValueError
```
Then I bypassed the check in the code using:
```py
@no_type_check
def __setattr__(self, name, value):
    if self.__config__.extra is Extra.forbid and name not in self.__fields__:
```

instead of

```py
@no_type_check
def __setattr__(self, name, value):
    if self.__config__.extra is not Extra.allow and name not in self.__fields__:
```
Output:
```
{'a': 123, '_processed_at': datetime.datetime(2020, 1, 3, 12, 31, 20, 317079)}
```
Note that the output is still not what I want: _processed_at should be private and not included.
The doc is not very clear about this extra parameter behavior:
extra :
whether to ignore, allow, or forbid extra attributes during model initialization. Accepts the string values of 'ignore', 'allow', or 'forbid', or values of the Extra enum (default: Extra.ignore)
What is the scope of model initialization here?
I am now looking for an `exclude_private: bool = True` option in the `.dict()` method.
What is the scope of model initialization here?
`extra` will apply both when creating models and when setting attributes; you might also be able to use `validate_assignment=False` to avoid the assignment checks.
`exclude_private: bool = True` doesn't exist, but you can use `exclude={'_processed_at'}` and even override `dict()` on a custom base model to set it by default.
`exclude_private: bool = True` doesn't exist, but you can use `exclude={'_processed_at'}` and even override `dict()` on a custom base model to set it by default.
Please check the PR above, where I implemented this parameter. WDYT?
I don't like the idea of having anything other than fields in __dict__: it will force additional checks and uncertainty of what you have as parsed data inside your model and what you don't.
If we were to consider any underscore attrs as non-fields, what should we do in this case?
```py
from pydantic import BaseModel, Extra

class Model(BaseModel):
    class Config:
        extra = Extra.allow

print(Model(**{'_a': 'b'}))  # _a='b'
```
I think that the best way to keep instance attributes away from fields is to have them in __slots__, as proposed in https://github.com/samuelcolvin/pydantic/issues/655#issuecomment-570312649
Not only will it be impossible to mix these parameters, but you will also get ~30% faster attribute access and have declared attributes in one place.
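The exact speedup varies by interpreter and workload (the 30% figure is from the comment above), but attribute access is easy to measure yourself with `timeit`; a rough, illustrative micro-benchmark:

```python
import timeit

class WithDict:
    def __init__(self):
        self.x = 1  # stored in the instance __dict__

class WithSlots:
    __slots__ = ('x',)
    def __init__(self):
        self.x = 1  # stored in a slot descriptor

d, s = WithDict(), WithSlots()
t_dict = timeit.timeit('d.x', globals={'d': d}, number=1_000_000)
t_slot = timeit.timeit('s.x', globals={'s': s}, number=1_000_000)
print(f'dict access: {t_dict:.3f}s  slot access: {t_slot:.3f}s')
```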
I mostly agree with @MrMrRobat. I only grudgingly accepted the solution proposed in #1139 because many people asked for something like this.
Perhaps we could have another workaround that doesn't require the `__slots__` and `object.__setattr__(self, ...)` solution I provided above. We could have a (perhaps optional) attribute of a model which would allow setting and getting values without getting in the way of `__dict__`.
Something like:
```py
from datetime import datetime

from pydantic import BaseModel

class InternalNoOp:
    pass

class TestExtra(BaseModel):
    __slots__ = ('internal',)
    a: int

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        object.__setattr__(self, 'internal', InternalNoOp())
        self.internal.processed_at = datetime.now()

t2 = TestExtra(a=123)
debug(t2.dict())
debug(t2.internal.processed_at)
```
Perhaps we could find a way to hide the setup of internal.
Is there some way perhaps to use a type decorator of some fashion to annotate internal fields? i.e.
```py
class Foo(BaseModel):
    a: Internal[Optional[int]]
```
Idea being that such internal fields will be ignored on import.
When extra fields are disallowed, this field being present actually raises an error.
When serializing, this field is always omitted.
Why the type annotation then? Because 100% of my work is under mypy strict, and internal fields still need type.
I also didn't stipulate that it begins with an underscore: It's fully valid for actual JSON models to use those, so I don't think it's appropriate to enforce namespacing like that.
Of course, if the field is "private" by Python convention, it should be available to use, too.
@samuelcolvin I'm willing to try to prototype some kind of solution, can you point me to what the limitation with __dict__ is and why the __slots__ workaround is needed?
(Filenames and line numbers appreciated if you can provide them.)
I'm not convinced by Internal, I want to avoid (mis)using types for things which don't directly relate to the type of the object (although I know we already do this a little). I understand it looks sensible to us, but if you're new to pydantic it could be very confusing.
can you point me to what the limitation with `__dict__` is and why the `__slots__` workaround is needed?
`__dict__` isn't a pydantic thing, it's part of Python; see #712 and related issues for background.
Having thought about this a bit, I think the best solution might be #660, which would prevent the field from being included in `.dict()` or `.json()`.
That would allow a property of a model which is a field but is never included in serialised models.
For setting the field, I'm still thinking about computed fields (#935), but they are definitely needed and will fix this.
This would completely avoid the need for attributes of a model which are not fields. Until then the best workaround is `object.__setattr__(self, '_processed_at', datetime.utcnow())` as described above.
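The reason `object.__setattr__` works as a workaround is plain Python: it calls the base implementation directly, skipping whatever `__setattr__` guard a class defines, which is essentially what pydantic's field check is. A stripped-down sketch of the pattern (the `Strict` class is illustrative, not pydantic's actual code):

```python
class Strict:
    _fields = {'a'}  # the only "declared" attributes

    def __setattr__(self, name, value):
        # refuse anything that isn't a declared field, like pydantic does
        if name not in self._fields:
            raise ValueError(f'"{type(self).__name__}" object has no field "{name}"')
        object.__setattr__(self, name, value)

s = Strict()
s.a = 1                                        # allowed: declared field
object.__setattr__(s, '_processed_at', 'now')  # bypasses the guard entirely
print(s._processed_at)  # 'now'
```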
I'm not sure that semantic type information is directly an abuse of types, but avoiding their use to limit complexity is still legitimate.
I'm not convinced it would be /less/ confusing to users than `= Field(..., include=False)`, because the annotation is quite tidy. I will admit that as soon as you want two or more semantic annotations, all claims of tidiness are lost; especially if you use them in different orders, in a nested way, etc.
Whatever the solution, the subject could use a treatment in a heading/section of the docs. A decorator like Internal[] could be explained sufficiently well there with little harm or confusion to more basic uses of the framework.
(Anecdote: I am extremely new to pydantic: I've been trying to prototype with it for about a week. My personal gut reaction is that type annotations in general make good sense, and presumed magic in the base class can be guessed at or reasoned about in a fairly intuitive way. However, I understand little about how type annotations work in conjunction with things like Field(...) in a way that doesn't confuse mypy, and consider that sufficiently more magic than custom type annotations. Maybe that's just me.)
Well, whatever happens, I'll read up on the issues you referenced and see if I can't find a meaningful way to contribute to them, as I'd be very keen in having unlisted/private/internal/excluded/whatever-you-want-to-call-it fields as a first-class feature.
(If anyone else wants to dig in, please don't wait for me, but do feel free to tag me on any RFCs if you want testing or comments.)
Thanks for this library!
--js
Maybe a good solution would be not to use `__dict__` directly but to add a pydantic-private dict, e.g. `__fields__` for the fields. This way, pydantic-controlled attributes and other attributes would be clearly separated.
Maybe a good solution would be not to use `__dict__` directly but to add a pydantic-private dict, e.g. `__fields__` for the fields. This way, pydantic-controlled attributes and other attributes would be clearly separated.
I guess this will require having `__getattr__` on BaseModel, and while we have an option like using `__slots__` for all service attributes, why should this slooooow method be used?
For instance, the removal of `__getattr__` in https://github.com/samuelcolvin/pydantic/issues/711 significantly sped up attr access and fixed issues with PyCharm introspection.
@MaxNoe please let me know if https://github.com/samuelcolvin/pydantic/issues/655#issuecomment-597777384 would solve your problem. If not, please let us know why.
So we are currently evaluating pydantic for configuration. But in general, our classes also have attributes not related to configuration. Having to use `object.__setattr__` everywhere we use these attributes would be a pretty big showstopper, I think.
Or am I mistaken about what you imply the solution is in that comment?
@MaxNoe, you can have your own model __setattr__ to allow m.a = b syntax if that's what you're looking for:
```py
class MyModel(BaseModel):
    def __setattr__(self, attr, value):
        if attr in self.__slots__:
            object.__setattr__(self, attr, value)
        else:
            super().__setattr__(attr, value)
```
Jumping back after a few experiments, here are some thoughts on a higher level:
- Allowing/Having/Supporting private attributes actually allows the temptation of adding more logic on pydantic objects than necessary, probably (please correct me if I am wrong) moving away from the original scope of pydantic towards the God object anti-pattern.
- The kind of problem described in this issue is often a consequence of having too many private attributes in the BaseModel class.
- Users should rather use a separate manager class to manipulate these pydantic objects (thus removing the private attributes and avoiding the current issue). This is a good step towards a proper MVC pattern in your projects.

```py
class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: datetime = None
    friends: List[int] = []
    _is_registered: bool = False  # private
    ...
```
Here the private attribute `_is_registered` introduces the possibility of adding the logic for verifying whether the user is registered (potentially a call to the database, etc.) inside the `User` object. Rather, a `UserManager` class should be introduced:
```py
class UserManager:
    @classmethod
    def register_user(cls, user: User):
        # Handle here different cases, whether user can be/is already registered, etc.
        ...

    # other functions to manipulate User objects
```
@H4dr1en we are evaluating pydantic for a completely different use case.
From looking at what it does, we thought it would be great to validate and recursively handle configuration.
And in this case, we want to be able to support configurable attributes and non-configurable attributes.
We are also not expecting a lot of objects, but reading configuration files and command line options, validating them with pydantic and then doing our thing.
If this is not in the scope of pydantic, we'll have to look for something else, but so far it has seemed to work really nicely as long as no non-configurable attributes are around.
@MrMrRobat Ok, so additionally I added a metaclass to inherit the __slots__ and this seems to work so far:
```py
import logging

from pydantic import BaseModel
from pydantic.main import ModelMetaclass

logging.basicConfig(level=logging.INFO)

class InheritSlots(ModelMetaclass):
    def __new__(cls, name, bases, namespace):
        slots = set(namespace.pop('__slots__', tuple()))
        for base in bases:
            if hasattr(base, '__slots__'):
                slots.update(base.__slots__)

        if '__dict__' in slots:
            slots.remove('__dict__')

        namespace['__slots__'] = tuple(slots)
        return ModelMetaclass.__new__(cls, name, bases, namespace)

class Configurable(BaseModel, metaclass=InheritSlots):
    __slots__ = ('log',)

    def __setattr__(self, attr, value):
        if attr in self.__slots__:
            object.__setattr__(self, attr, value)
        else:
            super().__setattr__(attr, value)

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.log = logging.getLogger(
            self.__class__.__module__ + '.' + self.__class__.__qualname__
        )

class Foo(Configurable):
    __slots__ = ('test_foo',)
    a: int

class Bar(Configurable):
    __slots__ = ('test_bar',)
    foo: Foo

foo = Foo(a=10)
foo.log.info('Foo')
foo.test_foo = 'test'

bar = Bar(foo={'a': 10})
bar.foo.test_foo = 'test2'
bar.test_bar = 'test3'
```
Jumping back after a few experiments, here are some thoughts on a higher level:
- Allowing/Having/Supporting private attributes actually allows the temptation of adding more logic on pydantic objects than necessary, probably (please correct me if I am wrong) moving away from the original scope of pydantic to the God object anti-pattern.
- The kind of problem described in this issue is often a consequence of having too many private attributes in the BaseModel class.
- Users should rather use a separate manager class to manipulate these pydantic objects (thus removing the private attributes and avoiding the current issue). This is a good step towards a proper MVC pattern in your projects.
In your example, you are right: you want a manager class. I don't think this is the ONLY reason to want private/non-serialized instance variables, though.
I want to add a computed lookup table that's in a form more useful internally to Python than the form it comes in as over the wire. I don't control the API format, and every language is different, so changing the API is not an option.
For example, it's a common pattern to send a list of records:
```json
[ {"id": 0, "record": "Hello"}, {"id": 1, "record": "World!"} ]
```
In Pydantic, maybe we'd write:
```py
class Record(BaseModel):
    id: int
    record: str

class Query(BaseModel):
    record_list: List[Record]
```
We may wish to compute a hash table instead so that the records can be traversed once and retrieved in the future. The class might be extended with some private cache:
```py
class Query(BaseModel):
    _cache: Dict[int, Record]

    def _post_load(self):
        for record in self.record_list:
            self._cache[record.id] = record

    def get_record(self, id: int) -> Record:
        return self._cache[id]
```
I wouldn't call this management; it describes a transformation and nothing else.
In the general case, I don't think you can suggest enveloping this object to add state and behavior, because for things like a cache, you'll likely wind up manually reflecting all of the fields you aren't intending to transform.
Imagine that the original Pydantic object in question has 20 fields and we want to apply a data transformation to just one of the fields; what happens to the other 19 fields? I'm worried that there's a lot of needless repetition involved there. In general, I wonder how to extend the functionality of a Pydantic object except through "has-a" relationships, which are the right choice for a manager, but the wrong choice for extending functionality.
Let's say we actually ARE adding management code, but what we are managing directly involves the data being manipulated. Let's say we want to add a _dirty field that lets us know the object has been modified and needs to be flushed to disk, written out to a db, etc.
You could wrap all possible reads/writes in the manager, but again this runs into problems where you have to reflect all of the fields back up into the manager and you wind up with a lot of duplicated code that has to be updated if the model changes.
Or if we had a _dirty state that was excluded from the model, you could track this directly in __setattr__ and not have to create another class.
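The `_dirty` idea is straightforward to express in plain Python once the flag can live outside the tracked fields; a hedged sketch (class and attribute names are made up for illustration):

```python
class TrackedRecord:
    def __init__(self, name, value):
        # set via object.__setattr__ so initialisation doesn't trip the tracking below
        object.__setattr__(self, '_dirty', False)
        object.__setattr__(self, 'name', name)
        object.__setattr__(self, 'value', value)

    def __setattr__(self, attr, value):
        object.__setattr__(self, attr, value)
        if attr != '_dirty':
            object.__setattr__(self, '_dirty', True)  # any normal write marks the record dirty

r = TrackedRecord('a', 1)
print(r._dirty)  # False -- fresh object
r.value = 2
print(r._dirty)  # True  -- needs flushing to disk/db
```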
I think there are fairly legitimate reasons to want state that's considered separate from the canonical model. Using __slots__ and object.__setattr__ seems to work, though not in a way that preserves type hints for mypy (That I have seen, yet?) and it seems fairly non-obvious to use. I am assuming there isn't room to mention this trick in the docs, at least, so it remains a bit of a guru hack.
I saw samuelcolvin mention in https://github.com/samuelcolvin/pydantic/issues/655#issuecomment-597777384 that creating an "excluded fields" configuration might be an option, and that'd work just fine, probably. I imagine once we reached that point it wouldn't be a far throw to create a decorator that made it easy to annotate excluded fields.
(I'm willing to help, it just seemed like maybe there wasn't a lot of clarity on if this feature was truly needed or even wanted, so I wanted to get that squared away first ...)
Maybe I'm not understanding some other workarounds or why this isn't really a problem, or not a valid thing to want to do with pydantic objects, though.
@jnsnow, seems fair enough. I have a couple ideas about how it can be implemented. Will be happy to make a PR in the near future.
I have a set of classes that require a non-pydantic-managed variable. I'm using `__slots__` to declare it as recommended above. However, I really want to add a type hint for the variable in `__slots__`, and am not sure how to do it. I've tried every technique listed in the selected answer on this Stackoverflow question, but can't find anything that works. Is there a recommended technique for doing this?
If there is not a good way of doing this, one possible design solution might be:
```py
class Device(BaseModel):
    id: str
    name: str

    __slots__ = ["token"]
    token: str
```
telling pydantic to ignore variables that have been included in __slots__. Currently, the code above will not work; pydantic will process token as if it were not in __slots__.
It is possible that the pull request by MrMrRobat will allow this code to work (perhaps by renaming token to _token).
Update: I got this to work. Functional, but ugly.
```py
class Device(BaseModel):
    id: str
    name: str

    __slots__ = ["token"]

    def dummy(self, token: str):
        self.token = token
```
@eykamp You can try:
```py
from __future__ import annotations

from typing import TYPE_CHECKING

class Device(BaseModel):
    id: str
    name: str

    if TYPE_CHECKING:
        token: str = ""
    else:
        __slots__ = ["token"]
```
@eykamp, @nicolas-geniteau I see this as a simpler and cleaner way:
```py
class Device(BaseModel):
    id: str
    name: str

    _token: str
    __slots__ = ["_token"]
```
For some reason, I thought that slots were not inherited, but the mix of your snippet @MrMrRobat and @MaxNoe's setter is working for my use case.
```py
import abc
import logging
from typing import Optional

import requests
from pydantic import BaseModel

logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)

class BaseModelWithSlotSetter(BaseModel):
    def __setattr__(self, attr, value) -> None:
        if attr in self.__slots__:
            object.__setattr__(self, attr, value)
        else:
            super().__setattr__(attr, value)

    def _init_slots(self) -> None:
        pass

    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)
        self._init_slots()

class Sensor(BaseModelWithSlotSetter, abc.ABC):
    id: str
    name: str

    _token: Optional[int]  # I don't think this is needed, but good for IDE autocomplete
    __slots__ = ["_token"]

    def _init_slots(self):
        # We can't set the default at the class definition level
        # because __slots__ conflicts with a class variable of the same name
        self._token = None

    def send_to_backend(self):
        if not self._token:
            raise Exception("Need a token")
        res = requests.post(f"https://reqbin.com/echo/post/json?token={self._token}", self.json())
        return res.json()

    def set_token(self, token: int):
        self._token = token

class TemperatureDevice(Sensor):
    temperature: float

a = TemperatureDevice(id="t1234", name="Temp Device 1234", temperature=28.4)
# No type validation here because we are bypassing pydantic for the slot setter,
# but PyCharm/mypy are not happy
a.set_token("asdasddasdasdas")
logging.info(a.send_to_backend())
```
```
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): reqbin.com:443
DEBUG:urllib3.connectionpool:https://reqbin.com:443 "POST /echo/post/json?token=asdasddasdasdas HTTP/1.1" 200 19
INFO:root:{'success': 'true'}
```
The workaround with `__slots__` works great, but when I copy the object, I cannot access the attribute on the copy. Does anyone know why?
```py
from pydantic import BaseModel

class Foo(BaseModel):
    __slots__ = ("_b",)
    a: int

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        object.__setattr__(self, "_b", 42)

foo = Foo.parse_obj({"a": "123"})
print(foo._b)  # Prints 42

foo_copy = foo.copy()
print(foo_copy._b)  # Raises AttributeError: _b
```
If I do the exact same thing with a class that doesn't inherit from BaseModel, it works, so it seems like pydantic is preventing the `__slots__` values from being copied in some way?
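That claim is easy to verify with the stdlib only: for a plain slotted class, `copy.copy` carries slot state over via the pickle protocol (`__reduce_ex__`), so the attribute survives. The loss of `_b` above is likely because BaseModel defines its own copying/state hooks in terms of `__dict__` and `__fields_set__`, which don't include slot values. A quick check of the plain-class side:

```python
import copy

class Plain:
    __slots__ = ('_b',)

    def __init__(self):
        object.__setattr__(self, '_b', 42)

p = Plain()
p2 = copy.copy(p)
print(p2._b)  # 42 -- slot values are carried over for plain classes
```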
Does it work if you use the more generic python builtin copy?
```py
from copy import copy  # or even deepcopy if you want to also copy all members
```
@MaxNoe: I have tested with copy.copy, copy.deepcopy and BaseModel.copy(deep=True), and they all give the same error.
repl.it here: https://repl.it/@FilipLange/TraumaticCoordinatedChemistry
Could we reopen this issue?
I believe we should be able to use Pydantic simply to enrich a business model with more precise definitions.
For instance, instead of:
```py
# v1
from datetime import datetime

class Device:
    serial_number: str
    _creation_date: datetime

    def __init__(self, serial_number: str) -> None:
        self.serial_number = serial_number
        self._creation_date = datetime.now()

    def get_age_in_seconds(self) -> float:
        return (datetime.now() - self._creation_date).total_seconds()
```
we could have:
```py
# v2
from datetime import datetime

import pydantic

class Device(pydantic.BaseModel):
    serial_number: str = pydantic.Field(regex=r'^[0-9a-f]{16}$')
    _creation_date: datetime

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self._creation_date = datetime.now()

    def get_age_in_seconds(self) -> float:
        return (datetime.now() - self._creation_date).total_seconds()
```
without putting all the validation logic into the constructor or a factory.
It seems like a simple need; this workaround works but is very unnatural:
```py
# v3
from datetime import datetime

import pydantic

class Device(pydantic.BaseModel):
    serial_number: str = pydantic.Field(regex=r'^[0-9a-f]{16}$')
    _creation_date: datetime
    __slots__ = ('_creation_date',)

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        object.__setattr__(self, '_creation_date', datetime.now())

    def get_age_in_seconds(self) -> float:
        return (datetime.now() - self._creation_date).total_seconds()

"""
>>> d = Device(serial_number='0123456789abcdef')
>>> d.get_age_in_seconds()
1.4e-05
>>> d.dict()
{'serial_number': '0123456789abcdef'}
"""
```
Does anybody see how we could get the functionality of v3 with a syntax closer to v2?
Hi @alexpirine
Have you looked at the PR and the doc associated with this issue?
Right now the best way to do what you want seems clean:
```py
from datetime import datetime

from pydantic import BaseModel, Field, PrivateAttr

class Device(BaseModel):
    serial_number: str = Field(regex=r'^[0-9a-f]{16}$')
    _creation_date: datetime = PrivateAttr(default_factory=datetime.now)

    def get_age_in_seconds(self) -> float:
        return (datetime.now() - self._creation_date).total_seconds()

d = Device(serial_number='0123456789abcdef')
print(d.get_age_in_seconds())  # 1e-05
print(d.dict())  # {'serial_number': '0123456789abcdef'}
```
Hi @PrettyWood
Excellent, thank you! Indeed, it seems like the PrivateAttr field is an excellent answer to different use cases discussed here.
P.S. And actually, I saw the documentation about PrivateAttr 2 minutes after posting my comment. And I thought I commented here about it, but for some reason my comment didn't go through. Anyway, I'm happy there finally is a nice solution.