Note: Cross-posting from http://discuss.elastic.co. I'm not sure which place is better for elasticsearch-dsl questions.
I'm attempting to persist an object with a nested object to Elasticsearch. However, instead of using syntax like:
child = field.Object(properties={'name': field.String()})
I'd like to have child be its own object.
from elasticsearch_dsl import field, document
from elasticsearch_dsl import connections
connections.connections.create_connection(hosts=['localhost'], timeout=20)
class Child(field.Object):
    name = field.String()
class Parent(document.DocType):
    name = field.String()
    child = Child()
    class Meta:
        index = 'myindex'
parent = Parent(name='test parent')
parent.child = Child(name='test other child')
parent.save()
However, doing it this way results in:
  File "/home/abarilla/PycharmProjects/iocdb/iocdb/model/test.py", line 25, in <module>
    parent.child = Child(name='test other child')
  File "/home/abarilla/.virtualenvs/iocdb/local/lib/python2.7/site-packages/elasticsearch_dsl/document.py", line 115, in __setattr__
    return super(DocType, self).__setattr__(name, value)
  File "/home/abarilla/.virtualenvs/iocdb/local/lib/python2.7/site-packages/elasticsearch_dsl/utils.py", line 418, in __setattr__
    value = self._doc_type.mapping[name].to_python(value)
  File "/home/abarilla/.virtualenvs/iocdb/local/lib/python2.7/site-packages/elasticsearch_dsl/field.py", line 63, in to_python
    return self._to_python(data)
  File "/home/abarilla/.virtualenvs/iocdb/local/lib/python2.7/site-packages/elasticsearch_dsl/field.py", line 133, in _to_python
    return self._doc_class(self.properties, **data)
TypeError: type object argument after ** must be a mapping, not Child
Is there a way I can accomplish this?
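For readers following along, here is a minimal sketch of why the traceback ends in that TypeError. The last frame unpacks the assigned value with `**data`, which only works for mappings such as dicts. The `Wrapper` class and the `to_python` function below are hypothetical stand-ins that mimic that one call; they are not the elasticsearch-dsl implementation.

```python
class Wrapper:
    """Hypothetical stand-in for the class the field instantiates."""

    def __init__(self, properties, **data):
        self.properties = properties
        self.data = data


class Child:
    """A plain object; it is not a mapping."""

    def __init__(self, name):
        self.name = name


def to_python(data):
    # Mimics the failing line in the traceback: the value is unpacked with **.
    return Wrapper({}, **data)


# A dict unpacks without trouble.
wrapped = to_python({"name": "test other child"})

# An arbitrary object cannot be unpacked with ** and raises TypeError.
error = None
try:
    to_python(Child(name="test other child"))
except TypeError as exc:
    error = exc
```

Under these assumptions, assigning a dict to the field would get past this particular call, while assigning an instance of an unrelated class would not.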
Have you tried using the Nested field object?
from elasticsearch_dsl import DocType, Nested, String
class Parent(DocType):
    name = String()
    # The property key must be a string; a bare `name` would raise NameError.
    child = Nested(properties={'name': String()})
See documentation here: https://github.com/elastic/elasticsearch-dsl-py/blob/master/docs/persistence.rst
Yes. I tried it with Nested and DocType as well:
class Child(field.Nested):
    name = field.String()
I need to keep the nested objects as plain old objects for backwards compatibility. I can write code that maps between the objects and the nested properties but would prefer not to.
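The hand-written mapping mentioned above could look roughly like the sketch below. It assumes the plain objects keep their data in ordinary instance attributes; `to_source` is a hypothetical helper name, and nothing here comes from elasticsearch-dsl.

```python
class Child:
    def __init__(self, name):
        self.name = name


class Parent:
    def __init__(self, name, child=None):
        self.name = name
        self.child = child


def to_source(obj):
    """Recursively convert a plain object into nested dicts and lists."""
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if isinstance(obj, (list, tuple)):
        return [to_source(item) for item in obj]
    if isinstance(obj, dict):
        return {key: to_source(value) for key, value in obj.items()}
    # Fall back to the instance attributes of an arbitrary object.
    return {key: to_source(value) for key, value in vars(obj).items()}


source = to_source(
    Parent(name="test parent", child=Child(name="test other child"))
)
```

The result is a plain nested dict, which is the kind of value the `**data` call in the traceback can unpack.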
Hmm, you have me confused about what you want and do not want... In Python, everything is an "object", and I'm not sure what a "plain old object" is.
Do you mean that you need to have Parent and Child as two separate document types?
And are your Parent and Child documents exactly the same type, just with a parent-child relationship? In that case you should probably look at this page: https://www.elastic.co/blog/managing-relations-inside-elasticsearch/
With more information, it might be easier to help...
Oh, and if you need Child to represent an individual document type, you need to make it derive from DocType, not a field class.
class Child(DocType):
    name = String()
There are two different things at play here - the definition of the Nested mappings, which can currently be done, although not through the declarative syntax:
user_field = Nested(
    properties={
        "display_name": String(fields={'raw': String(index='not_analyzed')})
    }
)
class BlogPost(DocType):
    author = user_field
If I understand correctly, you'd also want to wrap all instances of author in your own Python class. To do that, you have to create a custom subclass and specify the _doc_class property (you should use a subclass of InnerObjectWrapper):
class Author(Nested):
    _doc_class = MyAuthorClass
user_field = Author(
    properties={
        "display_name": String(fields={'raw': String(index='not_analyzed')})
    }
)
class BlogPost(DocType):
    author = user_field
I am certainly open to making this easier, do you have any ideas how you'd like the API to look?
Thanks!
The use case I'm running into is that numerous scripts handled by Celery insert into ES using some basic objects, such as Parent and Child; however, Child is not a standalone doctype.
{
  "_index": "myindex",
  "_type": "parent",
  "_id": "v3Ytc2EaSUGz92QDYfRYNA",
  "_score": 1,
  "_source": {
    "name": "test parent",
    "child": {
      "name": "2nd test child"
    }
  }
}
Ideally, a user could create an object like this:
child = Child(name='test other child')
parent = Parent(name='test parent', child=child)
parent.save()
I haven't been using ES long enough to suggest what the appropriate syntax would be for implementing it. However, it seems like an issue that would come up with more integration of elasticsearch-dsl into Django: objects that are doc types vs. objects that are just hierarchical data for a doc type. That's why I was thinking of something along these lines:
class Child(field.Object):
    name = field.String()
class Parent(document.DocType):
    name = field.String()
    child = Child()
    class Meta:
        index = 'myindex'
As you just posted above, I'm sure there's more to it than this, of course.
@HonzaKral Can the _doc_class be directly InnerObjectWrapper?
@andybarilla It seems to me that Honza's example is what you're looking for...
Transposing it to your example:
class Child(Nested):
    _doc_class = InnerObjectWrapper # That is, if this is possible...
child = Child(
    properties={
        "name": String()
    }
)
class Parent(DocType):
    name = String()
    child = child
What do you think?
@njoannin it is InnerObjectWrapper by default so if you want to use that, no need to subclass.
OK, cool: that makes it easy.
I have changed this in 7637cf2 to allow passing of the doc_class wrapper as __init__ kwargs without the need for subclassing. Example can be found in the tests - https://github.com/elastic/elasticsearch-dsl-py/blob/master/test_elasticsearch_dsl/test_validation.py#L15
I wasn't sure if this warranted a new issue, but along these same lines, it seems like currently you need to define your classes like this:
class OrderItem(InnerObjectWrapper):
    pass
class Order(DocType):
    items = Object(
        doc_class=OrderItem,
        multi=True,
        properties={
            'description': Text(),
            'quantity': Integer(),
            'price': Float()
        }
    )
You don't actually need the OrderItem class if you aren't going to customize it. You could just set doc_class=InnerObjectWrapper, but I included it this way so that the examples are consistent.
I'd prefer to be able to define them more like this:
class OrderItem(InnerObjectWrapper):
    description = Text()
    quantity = Integer()
    price = Float()
class Order(DocType):
    items = Object(
        doc_class=OrderItem,
        multi=True
    )
This way, the property definitions for inner objects are consistent with those at the top level. It also lets you keep the field definitions with the object they belong to, which reduces coupling and makes inner objects reusable without duplicating the properties definition in every class that uses them.
For example, say you have a CMS with an index for each type of content, such as articles, slideshows, and quizzes. Each of those documents may want to have an author sub-object. If the properties are set where they naturally belong (on the author class), then you don't need to duplicate that code in all of the different content type doc classes.
class Author(InnerObjectWrapper):
    name = Text()
    image_url = Text()
    bio_url = Text()
Is there a way to do this that I'm missing, is it on the roadmap, or do I just need to use properties for inner objects?
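To make the request concrete, here is a minimal, library-free sketch of how class-level field declarations could be collected into a properties mapping. Everything in it is a stand-in: `Field`, `Text`, `DeclarativeMeta`, and `DeclarativeInner` are hypothetical names chosen for illustration, not elasticsearch-dsl classes, and the mapping output is simplified.

```python
class Field:
    """Stand-in for a field type; not the elasticsearch-dsl class."""

    type_name = None

    def to_dict(self):
        return {"type": self.type_name}


class Text(Field):
    type_name = "text"


class DeclarativeMeta(type):
    """Collect Field attributes from the class body into `_properties`."""

    def __new__(mcs, cls_name, bases, attrs):
        properties = {}
        # Start from properties already collected on base classes.
        for base in reversed(bases):
            properties.update(getattr(base, "_properties", {}))
        # Move Field declarations out of the class namespace.
        for attr, value in list(attrs.items()):
            if isinstance(value, Field):
                properties[attr] = attrs.pop(attr)
        attrs["_properties"] = properties
        return super().__new__(mcs, cls_name, bases, attrs)


class DeclarativeInner(metaclass=DeclarativeMeta):
    """Hypothetical base class for reusable inner objects."""

    @classmethod
    def mapping(cls):
        return {key: f.to_dict() for key, f in cls._properties.items()}


class Author(DeclarativeInner):
    name = Text()
    image_url = Text()
    bio_url = Text()


author_mapping = Author.mapping()
```

With something along these lines, every content type could reference the same `Author` class and reuse its collected properties instead of repeating them.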
I agree with @JoshCoady. The current way to define Nested objects is quite inconsistent and counter-intuitive. It also prevents reusability of nested objects. I think there should be virtually no difference between nested objects and inner objects from the ORM perspective, apart from the way you declare them:
class OrderItem(InnerObject):
    description = Text()
    quantity = Integer()
    price = Float()
class Order(DocType):
    items = Nested(doc_class=OrderItem)
    # or
    items = OrderItem(multi=True)
Let's say I want to denormalize the last item, I'd like to be able to do something like this:
class Order(DocType):
    items = Nested(doc_class=OrderItem)
    last_item = OrderItem()
Currently, I have to duplicate the OrderItem object to be able to do this, and this is very ugly. Or did I miss something?
I'm with @JoshCoady, I naturally want to use the objects like this but it does not seem to work. Is there any way to inherit the mapping of another object currently?
To follow up, it's a bit kludgy, but I've gotten partially there by doing something like this:
class OrderItem(Object):
    def __init__(self, *args, **kwargs):
        kwargs.setdefault('doc_class', InnerObjectWrapper)
        kwargs.setdefault('properties', {}).update(
            description=Text(),
            quantity=Integer(),
            price=Float()
        )
        super().__init__(*args, **kwargs)
class Order(DocType):
    items = OrderItem(multi=True)
Defining OrderItem is a bit messy, but it is now reusable. You can use the same method to inherit from another object:
class DatedOrderItem(OrderItem):
    def __init__(self, *args, **kwargs):
        kwargs.setdefault('properties', {}).update(
            ship_date=Date()
        )
        super().__init__(*args, **kwargs)
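The `setdefault(...).update(...)` pattern above can be tried without Elasticsearch. The sketch below is a rough illustration, not the library code: `Object` is a hypothetical stub that only stores its arguments, and plain strings stand in for field instances. It shows that each level of the hierarchy adds its own properties, and it also shows one caveat of the pattern.

```python
class Object:
    """Hypothetical stub for the field base class; it only stores its arguments."""

    def __init__(self, doc_class=None, properties=None, multi=False):
        self.doc_class = doc_class
        self.properties = dict(properties or {})
        self.multi = multi


class OrderItem(Object):
    def __init__(self, *args, **kwargs):
        # Strings stand in for Text(), Integer(), and Float() instances.
        kwargs.setdefault("properties", {}).update(
            description="text", quantity="integer", price="float"
        )
        super().__init__(*args, **kwargs)


class DatedOrderItem(OrderItem):
    def __init__(self, *args, **kwargs):
        kwargs.setdefault("properties", {}).update(ship_date="date")
        super().__init__(*args, **kwargs)


# The subclass ends up with its own property plus the inherited ones.
item = DatedOrderItem(multi=True)
names = sorted(item.properties)

# Caveat: update() overwrites a caller-supplied property of the same name.
custom = OrderItem(properties={"price": "integer", "sku": "keyword"})
```

If caller-supplied properties should take precedence, the class defaults would have to be applied first and the caller's dict merged on top, which is a design choice the snippet above does not make.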