I'm trying to implement a new Field that abstracts the storage of a complex object. The goal is to keep using this object in Python code while having it stored correctly in Elasticsearch:
from elasticsearch_dsl import DocType, Field

class ComplexField(Field):
    def to_dict(self):
        d = super().to_dict()
        return {
            # my representation
        }

    def to_python(self, data):
        return Complex(*data)

class MyDoctype(DocType):
    complex = ComplexField()

complex = Complex()
md = MyDoctype()
md.complex = complex
The problem is that to_dict is never called when the object is saved, although to_python is called during initialization.
How can I solve this?
Thank you.
Currently only deserialization is supported. The to_dict method only produces the definition of the field (the part that goes into the mappings). For now, the solution is to "teach" the serializer how to serialize your Complex data type directly: provide a custom serializer in place of the default one (elasticsearch_dsl.serializer.AttrJSONSerializer) and pass it to the client when setting up the connections.
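The serializer-level approach can be illustrated with the standard library alone. This is a minimal sketch, not elasticsearch-dsl code: `Complex` and `ComplexAwareEncoder` are hypothetical names, and `json.JSONEncoder` stands in for the library's serializer. With the real client, the equivalent would presumably be a subclass of AttrJSONSerializer with a similar fallback hook.

```python
import json


class Complex:
    """Hypothetical business object standing in for the Complex type above."""

    def __init__(self, real, imag):
        self.real = real
        self.imag = imag


class ComplexAwareEncoder(json.JSONEncoder):
    """Stand-in serializer that has been 'taught' about Complex."""

    def default(self, o):
        # default() is only consulted for objects json cannot encode itself.
        if isinstance(o, Complex):
            return [o.real, o.imag]
        return super().default(o)


# The document body contains a Complex, yet serializes without errors.
payload = json.dumps({"complex": Complex(1, 2)}, cls=ComplexAwareEncoder)
```

The list form is chosen so that `Complex(*data)` can rebuild the object on the way back in, matching the to_python method in the question.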
Does that make sense?
Thanks!
Hello @HonzaKral thank you for your confirmation.
I'd prefer to keep the serialization at the field level. The to_dict method was clearly what I was looking for, as in Django's implementation. I don't think the serializer should know how to serialize business objects.
django-rest-framework implements the same solution: http://www.django-rest-framework.org/api-guide/fields/#custom-fields
Do you think this would be a sensible future evolution of DocTypes?
Yes, this definitely makes sense; and I completely agree that it shouldn't have to be solved in the serializer. I just need to devise the best way to do it.
Do you think the to_dict and to_python naming makes sense for this use case? I can rename to_dict to get_definition (like analysis uses) and thus free the name. We can then reuse the trick that determines whether a field needs special deserialization (_coerce) to decide whether it needs serialization, so that fields which don't need it pay no extra cost.
I think it's a good first step. The names to_dict, to_python and get_definition are all clearly explicit.
As for serializing all fields, I think clarity is worth more than a small optimisation, am I right?
I'm trying to implement AttachmentField, and ran into this same thing. I was pretty confused to find that when I set attributes on an instance of DocType, it calls to_python on them, even though the value I'm passing in _is_ the Python representation.
Just pushed to master commits f404c8b and 4d2bfd9
I decided to go with the names serialize and deserialize for the methods. There is an example of a custom field in the tests (https://github.com/elastic/elasticsearch-dsl-py/blob/master/test_elasticsearch_dsl/test_document.py#L42-L59) showing how it works. I still intend to create a CustomField helper to make this easier (so the user doesn't have to reimplement the to_dict method).
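To make the idea concrete without depending on the library, here is a minimal, self-contained sketch of a field-level serialize/deserialize pair gated by an opt-in flag. Everything in it is a stand-in: `Field`, `ComplexField`, `Complex`, `to_dict` and `from_dict` are hypothetical and only mimic the design discussed in this thread, so see the linked test for the actual API.

```python
class Complex:
    """Hypothetical business object."""

    def __init__(self, real, imag):
        self.real = real
        self.imag = imag


class Field:
    """Stand-in base field: values pass through unchanged."""

    _coerce = False  # opt-in flag, so plain fields pay no conversion cost

    def serialize(self, value):
        return value

    def deserialize(self, data):
        return data


class ComplexField(Field):
    _coerce = True

    def serialize(self, value):
        return [value.real, value.imag]

    def deserialize(self, data):
        # Assignment may hand us a ready-made object; leave it untouched.
        if isinstance(data, Complex):
            return data
        return Complex(*data)


def to_dict(fields, values):
    """Serialize values, converting only fields that opted in."""
    return {
        name: fields[name].serialize(value) if fields[name]._coerce else value
        for name, value in values.items()
    }


def from_dict(fields, data):
    """Inverse of to_dict: rebuild Python objects from stored data."""
    return {
        name: fields[name].deserialize(value) if fields[name]._coerce else value
        for name, value in data.items()
    }


FIELDS = {"complex": ComplexField(), "title": Field()}
stored = to_dict(FIELDS, {"complex": Complex(1, 2), "title": "demo"})
loaded = from_dict(FIELDS, stored)
```

The `isinstance` guard in `deserialize` addresses the confusion raised earlier in the thread: a value that is already the Python representation is returned as-is instead of being converted a second time.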
I am not 100% sure on the naming, but the functionality seems to work just fine. I would be happy for any feedback.
Thanks
CustomField added in a5b614d2ab7c6a2fdd49e9d01a0204bd2fd0c565, now only to add documentation...
This has just been released in 2.0.0.
Thank you for this, @HonzaKral!