elasticsearch version 1.4.2
elasticsearch-dsl-py version 0.0.11
Does elasticsearch-dsl-py support bulk save?
If so, how can I use bulk save with elasticsearch-dsl-py?
You can use the bulk helpers from elasticsearch-py (http://elasticsearch-py.readthedocs.io/en/master/helpers.html#bulk-helpers):
from elasticsearch_dsl.connections import connections
from elasticsearch.helpers import bulk
bulk(connections.get_connection(), (d.to_dict(True) for d in DOCUMENTS))
thanks, I get it.
@shejianmin do you think it would make sense to introduce a bulk helper to the dsl library? A simple function that will call the to_dict on all the documents and call the underlying helper? I always thought it's too thin an abstraction to be useful, but I am obviously biased :)
Thanks!
I think it would be at least useful to document the bulk use case. It's not clear if one needs to add additional metadata to the output of to_dict that tells bulk that the action being taken is a create, update, delete, or get. (Still not sure how one would do, say, a bulk delete)
For example, if I were to make a base class that looks like this:
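from elasticsearch.helpers import bulk
from elasticsearch_dsl import DocType
from elasticsearch_dsl.connections import connections

class DocumentBase(DocType):
    @classmethod
    def bulk_save(cls, dicts):
        # Build one bulk action per input dict, including _index/_type/_id metadata.
        objects = (cls.create(d).to_dict(include_meta=True) for d in dicts)
        client = connections.get_connection()
        return bulk(client, objects)
How would I change this to do a bulk update/delete, or do bulk gets in general?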
@HonzaKral, I agree with @j2kun.
@j2kun update and delete can be selected with _op_type, like this:
from elasticsearch_dsl.connections import connection
from elasticsearch.helpers import bulk
bulk(connection.get_connection(), (dict(d.to_dict(True), **{'_op_type': 'update'}) for d in DOCUMENTS))
An upsert looks like this:
from elasticsearch_dsl.connections import connection
from elasticsearch.helpers import bulk
def upsert(doc):
    d = doc.to_dict(True)
    d['_op_type'] = 'update'
    d['doc'] = d['_source']
    d['doc_as_upsert'] = True
    del d['_source']
    return d
bulk(connection.get_connection(), (upsert(d) for d in DOCUMENTS))
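For the bulk delete and plain update asked about earlier, here is a minimal sketch of the action dicts the bulk helper expects. It works on plain dicts shaped like the output of to_dict(include_meta=True), so it runs without a cluster. The helper names delete_action and update_action and the sample document are made up for illustration; they are not part of either library. As far as I can tell, an update action needs its fields under 'doc' (as in the upsert above) rather than under '_source', and a delete carries no body at all. Gets are not part of the bulk API; multi-get (mget) is a separate call.

```python
def _meta_only(meta_dict):
    """Keep the metadata keys (_index, _type, _id, ...) and drop the body."""
    return {k: v for k, v in meta_dict.items()
            if k.startswith('_') and k != '_source'}


def delete_action(meta_dict):
    """Hypothetical helper: build a bulk delete action (metadata only, no body)."""
    action = _meta_only(meta_dict)
    action['_op_type'] = 'delete'
    return action


def update_action(meta_dict):
    """Hypothetical helper: build a partial-update action with fields under 'doc'."""
    action = _meta_only(meta_dict)
    action['_op_type'] = 'update'
    action['doc'] = meta_dict['_source']
    return action


# Sample input shaped like doc.to_dict(include_meta=True); the values are invented.
sample = {'_index': 'blog', '_type': 'post', '_id': 42,
          '_source': {'title': 'Hello'}}

print(delete_action(sample))
print(update_action(sample))
```

These would then be passed to the helper in the same way as above, for example bulk(client, (delete_action(d.to_dict(True)) for d in DOCUMENTS)).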
@luoxiaohei why did you use upsert?
@SalahAdDin I have multiple services that update different parts of the same record, and I don't know whether the record exists at the time of the update.
I am a beginner at Elasticsearch. Is upsert not recommended?
Is it possible to get the number of newly created documents this way?
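One possible approach, sketched under assumptions: elasticsearch.helpers.streaming_bulk yields one (ok, info) pair per action, where info is the per-item response keyed by operation type, and Elasticsearch reports status 201 for an item that created a document (200 for one that changed an existing document). The function name count_created and the fake results below are invented for illustration, so please check the response shape against your own cluster version.

```python
def count_created(results):
    """Count items whose per-item status is 201 (document was created).

    `results` is any iterable of (ok, info) pairs, e.g. what
    elasticsearch.helpers.streaming_bulk yields.
    """
    created = 0
    for ok, info in results:
        # info has a single key, the operation type, e.g. {'update': {...}}
        (item,) = info.values()
        if ok and item.get('status') == 201:
            created += 1
    return created


# Invented stand-in for streaming_bulk output, so the sketch runs offline.
fake_results = [
    (True, {'update': {'_id': '1', 'status': 201}}),
    (True, {'update': {'_id': '2', 'status': 200}}),
    (False, {'update': {'_id': '3', 'status': 409}}),
]

print(count_created(fake_results))
```

Against a real cluster this would be called as count_created(streaming_bulk(client, actions)); note that streaming_bulk raises on failed items by default unless raise_on_error=False is passed.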
@HonzaKral would it be possible to take the elasticsearch-dsl save command and route it through the lower-level elasticsearch bulk helper?
For future searchers, I think this line:
from elasticsearch_dsl.connections import connection
Now needs to be:
from elasticsearch_dsl.connections import connections # plural