Elasticsearch-dsl-py: how to use bulk save

Created on 11 May 2016 · 12 Comments · Source: elastic/elasticsearch-dsl-py

elasticsearch version 1.4.2
elasticsearch-dsl-py version 0.0.11

Does elasticsearch-dsl-py support bulk save?
If so, how can I use bulk save with elasticsearch-dsl-py?

All 12 comments

You can use the bulk helpers from elasticsearch-py (http://elasticsearch-py.readthedocs.io/en/master/helpers.html#bulk-helpers):

from elasticsearch_dsl.connections import connections
from elasticsearch.helpers import bulk

# DOCUMENTS is any iterable of documents; to_dict(True) includes the metadata (_index, _id, ...)
bulk(connections.get_connection(), (d.to_dict(True) for d in DOCUMENTS))

Thanks, I get it.

@shejianmin do you think it would make sense to introduce a bulk helper to the dsl library? A simple function that will call the to_dict on all the documents and call the underlying helper? I always thought it's too thin an abstraction to be useful, but I am obviously biased :)

Thanks!

I think it would be at least useful to document the bulk use case. It's not clear if one needs to add additional metadata to the output of to_dict that tells bulk that the action being taken is a create, update, delete, or get. (Still not sure how one would do, say, a bulk delete)

For example, if I were to make a base class that looks like this:

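from elasticsearch_dsl import DocType
from elasticsearch_dsl.connections import connections
from elasticsearch.helpers import bulk

class DocumentBase(DocType):
    @classmethod
    def bulk_save(cls, dicts):
        # Build a document from each dict of field values, then emit it with its metadata
        objects = (cls(**d).to_dict(include_meta=True) for d in dicts)
        client = connections.get_connection()
        return bulk(client, objects)

How would I change this to do a bulk update/delete, or do bulk gets in general?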

@HonzaKral, I agree with @j2kun.

@j2kun update/delete can be selected with _op_type, like this:

from elasticsearch_dsl.connections import connections
from elasticsearch.helpers import bulk

bulk(connections.get_connection(), (dict(d.to_dict(True), **{'_op_type': 'update'}) for d in DOCUMENTS))

An upsert looks like this:

from elasticsearch_dsl.connections import connections
from elasticsearch.helpers import bulk

def upsert(doc):
    d = doc.to_dict(True)
    d['_op_type'] = 'update'
    # Move the fields from _source into a partial document
    d['doc'] = d.pop('_source')
    # Create the document from 'doc' if it does not exist yet
    d['doc_as_upsert'] = True

    return d

bulk(connections.get_connection(), (upsert(d) for d in DOCUMENTS))
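
Since the question above also asks about bulk delete, here is a minimal sketch of the action shapes involved, written so it runs without a cluster. The helper names (`as_update`, `as_upsert`, `as_delete`) and the sample values are hypothetical and not part of elasticsearch-dsl. The sketch assumes each input is a dict like the one `to_dict(include_meta=True)` returns: metadata keys such as `_index` and `_id`, plus the fields under `_source`.

```python
def _meta(action):
    """Copy every key except the document body (_source)."""
    return {k: v for k, v in action.items() if k != "_source"}


def as_update(action):
    """Partial update: the fields go under 'doc' instead of '_source'."""
    out = _meta(action)
    out["_op_type"] = "update"
    out["doc"] = action["_source"]
    return out


def as_upsert(action):
    """Update that creates the document from 'doc' when it is missing."""
    out = as_update(action)
    out["doc_as_upsert"] = True
    return out


def as_delete(action):
    """Delete: only the metadata is needed, there is no body."""
    out = _meta(action)
    out["_op_type"] = "delete"
    return out


# Hypothetical stand-in for the output of doc.to_dict(include_meta=True)
sample = {"_index": "blog", "_type": "post", "_id": 42,
          "_source": {"title": "Hello"}}

print(as_delete(sample))
print(as_upsert(sample))
```

Under those assumptions, a bulk delete would be something like `bulk(connections.get_connection(), (as_delete(d.to_dict(True)) for d in DOCUMENTS))`.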

@luoxiaohei why did you use upsert?

@SalahAdDin I have multiple services that update different parts of the same record, and I don't know whether the record exists when I update it.
I am a beginner at Elasticsearch. Is upsert not recommended?

Is it possible to get the number of newly created documents this way?
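
One possible approach, offered as a sketch and not as a confirmed answer from the maintainers: `bulk` returns only a success count and a list of errors, so it does not separate created documents from updated ones. `streaming_bulk` from `elasticsearch.helpers` yields an `(ok, item)` tuple per action, where `item` maps the operation type to the server's response for that action. The function below assumes the server reports status 201 for a document that did not exist before; `count_created` and the sample data are hypothetical.

```python
def count_created(results):
    """Count actions whose response reports status 201 (created)."""
    created = 0
    for ok, item in results:
        # item looks like {"update": {"_id": ..., "status": 201, ...}}
        for info in item.values():
            if ok and info.get("status") == 201:
                created += 1
    return created


# Hypothetical results in the shape streaming_bulk yields
sample_results = [
    (True, {"update": {"_id": "1", "status": 201}}),
    (True, {"update": {"_id": "2", "status": 200}}),
    (True, {"update": {"_id": "3", "status": 201}}),
]

print(count_created(sample_results))  # prints 2
```

With a live connection this would be called as `count_created(streaming_bulk(connections.get_connection(), (upsert(d) for d in DOCUMENTS)))`, using the `upsert` function from earlier in the thread.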

@HonzaKral would it be possible to take the elasticsearch-dsl save command and apply it through the lower-level elasticsearch bulk command?

For future searchers: if a snippet uses this import (singular):

from elasticsearch_dsl.connections import connection

it needs to be:

from elasticsearch_dsl.connections import connections  # plural