Elasticsearch-dsl-py: Upsert in `update`

Created on 3 Sep 2015 · 5 Comments · Source: elastic/elasticsearch-dsl-py

Would it make sense to tweak DocType.update() to perform an upsert if the item doesn't already exist (at https://github.com/elastic/elasticsearch-dsl-py/blob/master/elasticsearch_dsl/document.py#L227)?

    meta = es.update(
        index=self._get_index(index),
        doc_type=self._doc_type.name,
        body={'doc': fields, 'doc_as_upsert': True, 'detect_noop': True},
        **doc_meta
    )
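Until something like this is built in, the same request can be sent through the low-level client directly. The following is only a minimal sketch, not the library's implementation: `build_upsert_body` and `upsert_fields` are hypothetical helper names, and it assumes a client whose `update` method accepts `index`, `id`, and `body` keyword arguments (older clusters also expect a `doc_type`, which can be passed through as an extra keyword argument).

```python
def build_upsert_body(fields):
    """Build an update-API body that creates the document if it is missing."""
    return {
        "doc": dict(fields),     # partial document to merge into the stored one
        "doc_as_upsert": True,   # use "doc" as the new document if none exists
        "detect_noop": True,     # skip the write when nothing would change
    }


def upsert_fields(es, index, doc_id, fields, **params):
    """Send a partial update that falls back to creating the document.

    Assumption: `es` exposes update(index=..., id=..., body=..., **params).
    Extra keyword arguments (for example doc_type) are passed through as-is.
    """
    return es.update(
        index=index,
        id=doc_id,
        body=build_upsert_body(fields),
        **params
    )
```

Keeping the body construction in its own function makes it easy to check what will be sent without a running cluster.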

All 5 comments

I am wondering why you'd then use update instead of just save when upsert behavior is what you want?

Examining the code for save didn't show me any upsert options, but perhaps I was looking at an outdated version? When I worked through the low-level operation manually first, the operation currently used in save didn't appear to have the desired behavior, and I had to use the aforementioned doc_as_upsert to get an upsert. Perhaps I am mistaken and save will actually overwrite when necessary.

No, there is no explicit upsert, but save behaves in a similar way: if the document exists it will be overwritten, otherwise it will be created.

Here is an example where upsert is convenient.

Let's assume I have a simple document.

    from elasticsearch_dsl import DocType, Integer, Text

    class Document(DocType):
        text = Text()
        read_counter = Integer()

Now consider that there are two separate programs that use this Document.
One indexes documents without touching the counter, and the other updates only the counter.
The latter can assume that the document always exists, so it can use document.update(read_counter=reads).
The first cannot do the same: it needs to index the document if it does not exist, or update it if it does. Unfortunately, document.save() overwrites the whole document, so read_counter is lost in the process.

I know I could get the document first to check whether it exists, but the whole point of the update API in ES is to avoid that.

The update API documentation says: "Note, this operation still means full reindex of the document, it just removes some network roundtrips and reduces chances of version conflicts between the get and the index."
Although it says full reindex, it reads the current document, merges it with the received one, and then saves the result, so it does not lose data that was already there.

Otherwise, why implement update at all? I could always get the instance, update some fields, and save the whole document.
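To make the difference concrete, here is a toy in-memory model of the two behaviors. It is only an illustration under stated assumptions: a plain dict stands in for the index, `index_document` and `update_with_upsert` are hypothetical names, and the merge is shallow (flat fields only), so it does not reproduce how Elasticsearch handles nested objects.

```python
def index_document(store, doc_id, document):
    """Model of save(): the stored document is replaced wholesale."""
    store[doc_id] = dict(document)


def update_with_upsert(store, doc_id, partial):
    """Model of update with doc_as_upsert: merge into the stored document,
    or create it from the partial document if it does not exist yet."""
    merged = dict(store.get(doc_id, {}))
    merged.update(partial)
    store[doc_id] = merged


store = {}
update_with_upsert(store, "1", {"text": "first version"})   # indexer creates the document
update_with_upsert(store, "1", {"read_counter": 3})         # counter program updates it
update_with_upsert(store, "1", {"text": "second version"})  # indexer again, counter survives
kept = dict(store["1"])

index_document(store, "1", {"text": "third version"})       # save-style overwrite
lost = dict(store["1"])                                      # read_counter is gone
```

In this model `kept` still contains read_counter while `lost` does not, which is exactly the case the indexing program runs into with save().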

So how do you actually update a document as an upsert?
Did you find a solution in the end?
Is there any documentation that explains how to implement it?

Thanks!
