Elasticsearch-dsl-py: Persisting Nested Objects

Created on 13 May 2015  ·  14 Comments  ·  Source: elastic/elasticsearch-dsl-py

Note: Cross-posting from http://discuss.elastic.co. I'm not sure which is the better place for elasticsearch-dsl questions.

I'm attempting to persist an object to Elasticsearch with a nested object. However, instead of using syntax like:

child = field.Object(properties={'name': field.String()})

I'd like to have child be its own object.

from elasticsearch_dsl import field, document
from elasticsearch_dsl import connections
connections.connections.create_connection(hosts=['localhost'], timeout=20)


class Child(field.Object):
    name = field.String()


class Parent(document.DocType):
    name = field.String()
    child = Child()

    class Meta:
        index = 'myindex'

parent = Parent(name='test parent')
parent.child = Child(name='test other child')
parent.save()

However, doing it this way results in:

  File "/home/abarilla/PycharmProjects/iocdb/iocdb/model/test.py", line 25, in <module>
    parent.child = Child(name='test other child')
  File "/home/abarilla/.virtualenvs/iocdb/local/lib/python2.7/site-packages/elasticsearch_dsl/document.py", line 115, in __setattr__
    return super(DocType, self).__setattr__(name, value)
  File "/home/abarilla/.virtualenvs/iocdb/local/lib/python2.7/site-packages/elasticsearch_dsl/utils.py", line 418, in __setattr__
    value = self._doc_type.mapping[name].to_python(value)
  File "/home/abarilla/.virtualenvs/iocdb/local/lib/python2.7/site-packages/elasticsearch_dsl/field.py", line 63, in to_python
    return self._to_python(data)
  File "/home/abarilla/.virtualenvs/iocdb/local/lib/python2.7/site-packages/elasticsearch_dsl/field.py", line 133, in _to_python
    return self._doc_class(self.properties, **data)
TypeError: type object argument after ** must be a mapping, not Child

Is there a way I can accomplish this?
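
For readers puzzling over the traceback: the last frame calls self._doc_class(self.properties, **data), so whatever is assigned to parent.child gets unpacked with **, and that requires a mapping. The sketch below reproduces only that call shape in plain Python. Wrapper, Child, and to_python here are hypothetical stand-ins, not the library's classes.

```python
class Wrapper:
    """Hypothetical stand-in for the wrapper class the field instantiates."""

    def __init__(self, properties, **data):
        self.properties = properties
        self.data = data


class Child:
    """Hypothetical stand-in for the field subclass from the question."""

    def __init__(self, name=None):
        self.name = name


def to_python(data):
    # Same call shape as the last traceback frame: doc_class(properties, **data)
    return Wrapper({"name": "string"}, **data)


# A dict is a mapping, so ** unpacking works.
wrapped = to_python({"name": "test other child"})

# An arbitrary object is not a mapping, so ** unpacking raises TypeError.
try:
    to_python(Child(name="test other child"))
    error = None
except TypeError as exc:
    error = str(exc)
```

If that reading of the traceback is right, assigning a plain dict such as {'name': 'test other child'} would get past this step, although it does not give child its own class.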

All 14 comments

Have you tried using the Nested field object?

from elasticsearch_dsl import DocType, Nested, String

class Parent(DocType):
    name = String()
    child = Nested(properties={'name': String()})

See documentation here: https://github.com/elastic/elasticsearch-dsl-py/blob/master/docs/persistence.rst

Yes. I tried it with Nested and DocType as well:

class Child(field.Nested):
    name = field.String()

I need to keep the nested objects as plain old objects for backwards compatibility. I can write code that maps between the objects and the nested properties but would prefer not to.

Hmm, you have me confused about what you want and do not want... In Python, everything is an "object", and I'm not sure what a "plain old object" is.

Do you mean that you need to have Parent and Child as two separate document types?

And are your Parent and Child documents exactly the same type, just with a parent-child relationship between them? In that case you should probably look at this page: https://www.elastic.co/blog/managing-relations-inside-elasticsearch/

With more information, it might be easier to help...

Oh, and if you need Child to represent an individual document type, you need to make it derive from DocType, not a field class.

class Child(DocType):
    name = String()

There are two different things at play here - the definition of the Nested mappings, which can currently be done, although not through the declarative syntax:

user_field = Nested(
    properties={
      "display_name": String(fields={'raw': String(index='not_analyzed')})
    }
)

class BlogPost(DocType):
    author = user_field

If I understand correctly, you'd also want to wrap all instances of author in your own Python class. To do that, you have to create a custom subclass and specify the _doc_class property (you should use a subclass of InnerObjectWrapper):

class Author(Nested):
    _doc_class = MyAuthorClass


user_field = Author(
    properties={
      "display_name": String(fields={'raw': String(index='not_analyzed')})
    }
)

class BlogPost(DocType):
    author = user_field

I am certainly open to making this easier, do you have any ideas how you'd like the API to look?

Thanks!

The use case I'm running into is that numerous scripts, run by Celery, insert into ES using some basic objects such as Parent and Child; however, Child is not a standalone doctype.

{
  "_index": "myindex",
  "_type": "parent",
  "_id": "v3Ytc2EaSUGz92QDYfRYNA",
  "_score": 1,
  "_source": {
    "name": "test parent",
    "child": {
      "name": "2nd test child"
    }
  }
}

Ideally, a user could create an object like this:

child = Child(name='test other child')
parent = Parent(name='test parent', child=child)
parent.save()

I haven't been using ES long enough to suggest what the appropriate syntax would be for implementing it. However, it seems like an issue that would come up with further integration of elasticsearch-dsl into Django: objects that are doc types versus objects that are just hierarchical data for a doc type. That's why I was thinking of something along these lines.

class Child(field.Object):
    name = field.String()


class Parent(document.DocType):
    name = field.String()
    child = Child()

    class Meta:
        index = 'myindex'

As you just posted above, I'm sure there's more to it than this, of course.
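
The hand-written mapping between plain objects and the nested _source body, mentioned earlier as the fallback, can be quite small. Here is a minimal sketch using only the standard library. Parent and Child are ordinary dataclasses (assumptions for illustration, not elasticsearch-dsl classes), and nothing below talks to Elasticsearch.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Child:
    # Plain data holder; hypothetical, not an elasticsearch-dsl class.
    name: str


@dataclass
class Parent:
    name: str
    child: Optional[Child] = None


parent = Parent(name="test parent", child=Child(name="2nd test child"))

# asdict() recurses into nested dataclasses, producing the same shape
# as the "_source" section of the document shown above.
source = asdict(parent)
```

The resulting dict could then be passed to whatever indexing call the existing scripts already use.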

@HonzaKral Can the _doc_class be directly InnerObjectWrapper?

@andybarilla It seems to me that Honza's example is what you're looking for...
Transposing it to your example:

class Child(Nested):
    _doc_class = InnerObjectWrapper  # That is, if this is possible...


child = Child(
    properties={
      "name": String()
    }
)

class Parent(DocType):
    name = String()
    child = child

What do you think?

@njoannin it is InnerObjectWrapper by default so if you want to use that, no need to subclass.

OK, cool: that makes it easy.

I have changed this in 7637cf2 to allow passing of the doc_class wrapper as __init__ kwargs without the need for subclassing. Example can be found in the tests - https://github.com/elastic/elasticsearch-dsl-py/blob/master/test_elasticsearch_dsl/test_validation.py#L15

I wasn't sure if this warranted a new issue, but along these same lines, it seems like currently you need to define your classes like this:

class OrderItem(InnerObjectWrapper):
    pass

class Order(DocType):
    items = Object(
        doc_class=OrderItem,
        multi=True,
        properties={
            'description': Text(),
            'quantity': Integer(),
            'price': Float()
        }
    )

You don't actually need the OrderItem class if you aren't going to customize it. You could just set doc_class=InnerObjectWrapper, but I included it this way so that the examples are consistent.

Where I'd prefer to be able to define them more like this:

class OrderItem(InnerObjectWrapper):
    description = Text()
    quantity = Integer()
    price = Float()

class Order(DocType):
    items = Object(
        doc_class=OrderItem,
        multi=True
    )

This way, the property definitions for inner objects are consistent with those of the top level. It also lets you keep field definitions with the object they belong to, which reduces coupling and makes inner objects reusable without duplicating the properties definition in every class that uses them.

For example, say you have a CMS with an index for each type of content, such as articles, slideshows, and quizzes, and each of those documents may want an author sub-object. If the properties are set where they naturally belong (on the author class), then you don't need to duplicate that code in all of the different content type doc classes.

class Author(InnerObjectWrapper):
    name = Text()
    image_url = Text()
    bio_url = Text()

Is there maybe a way to do this that I'm missing or is it on the roadmap or do I just need to use properties for inner objects?
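
To make the requested API concrete, here is one way a declarative inner object could collect its fields: a metaclass gathers class-level field attributes into a properties dict. This is a sketch in plain Python, not how the library works. Field, DeclarativeMeta, InnerObject, and mapping() are all hypothetical names invented for illustration.

```python
class Field:
    """Hypothetical stand-in for a mapping field such as Text() or Integer()."""

    def __init__(self, es_type):
        self.es_type = es_type


class DeclarativeMeta(type):
    """Collects Field attributes from the class body into `_properties`."""

    def __new__(mcls, name, bases, attrs):
        properties = {}
        # Start with fields declared on parent classes.
        for base in reversed(bases):
            properties.update(getattr(base, "_properties", {}))
        # Then move this class's own Field attributes into the dict.
        for key in [k for k, v in attrs.items() if isinstance(v, Field)]:
            properties[key] = attrs.pop(key)
        attrs["_properties"] = properties
        return super().__new__(mcls, name, bases, attrs)


class InnerObject(metaclass=DeclarativeMeta):
    @classmethod
    def mapping(cls):
        # Build a mapping-like dict from the collected fields.
        return {
            "properties": {
                key: {"type": field.es_type}
                for key, field in cls._properties.items()
            }
        }


class Author(InnerObject):
    name = Field("text")
    image_url = Field("text")
    bio_url = Field("text")


class StaffAuthor(Author):
    # Inherits the three Author fields and adds one more.
    department = Field("keyword")


author_mapping = Author.mapping()
staff_fields = sorted(StaffAuthor._properties)
```

With something along these lines, each content type could refer to Author once instead of repeating its properties.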

I agree with @JoshCoady. The current way to define Nested objects is quite inconsistent and counter-intuitive. It also prevents reuse of nested objects. I think there should be virtually no difference between nested objects and inner objects from the ORM perspective, apart from the way you declare them:

class OrderItem(InnerObject):
    description = Text()
    quantity = Integer()
    price = Float()

class Order(DocType):
    items = Nested(doc_class=OrderItem)
    # or
    items = OrderItem(multi=True)

Let's say I want to denormalize the last item, I'd like to be able to do something like this:

class Order(DocType):
    items = Nested(doc_class=OrderItem)
    last_item = OrderItem()

Currently, I have to duplicate the OrderItem object to be able to do this, which is very ugly. Or did I miss something?

I'm with @JoshCoady: I naturally want to use the objects like this, but it does not seem to work. Is there any way to inherit the mapping of another object currently?

To follow up, it's a bit kludgy, but I've gotten partially there by doing something like this:

class OrderItem(Object):
    def __init__(self, *args, **kwargs):
        kwargs.setdefault('doc_class', InnerObjectWrapper)
        kwargs.setdefault('properties', {}).update(
            description=Text(),
            quantity=Integer(),
            price=Float()
        )
        super().__init__(*args, **kwargs)

class Order(DocType):
    items = OrderItem(multi=True)

Defining OrderItem is a bit messy, but it is now reusable. You can use the same method to inherit from another object:

class DatedOrderItem(OrderItem):
    def __init__(self, *args, **kwargs):
        kwargs.setdefault('properties', {}).update(
            ship_date=Date()
        )
        super().__init__(*args, **kwargs)
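
To see how the setdefault/update pattern above composes, here is a plain-Python sketch with a stand-in base class. The Object class below is hypothetical and only stores its keyword arguments. The string values stand in for real field instances, and dict is a placeholder for InnerObjectWrapper.

```python
class Object:
    """Hypothetical stand-in for the library's Object field; it only stores kwargs."""

    def __init__(self, **kwargs):
        self.doc_class = kwargs.get("doc_class")
        self.multi = kwargs.get("multi", False)
        self.properties = kwargs.get("properties", {})


class OrderItem(Object):
    def __init__(self, *args, **kwargs):
        kwargs.setdefault("doc_class", dict)  # placeholder for InnerObjectWrapper
        kwargs.setdefault("properties", {}).update(
            description="text", quantity="integer", price="float"
        )
        super().__init__(*args, **kwargs)


class DatedOrderItem(OrderItem):
    def __init__(self, *args, **kwargs):
        # Runs first, so ship_date is already present when OrderItem adds its fields.
        kwargs.setdefault("properties", {}).update(ship_date="date")
        super().__init__(*args, **kwargs)


dated = DatedOrderItem(multi=True)
field_names = sorted(dated.properties)

# Caveat: update() overwrites a caller-supplied property of the same name,
# while caller-supplied properties with other names are kept.
custom = OrderItem(properties={"price": "double", "sku": "keyword"})
```

One thing to watch with this pattern: because update() runs on the dict the caller passed in, that dict is modified in place, and the class's own fields take precedence over same-named properties from the caller.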