Elasticsearch-dsl-py: Bulk Indexing of DocTypes? Documentation, or Feature.

Created on 13 May 2015 · 14 Comments · Source: elastic/elasticsearch-dsl-py

I asked in the #elasticsearch channel on Freenode and was told that helpers.bulk from the elasticsearch module can be used to index documents in bulk.

Can this be used in conjunction with DocTypes? (Basically, instead of .save(), maybe something like .queue_bulk() followed by .save_bulk().)

Or is this already possible by passing the DocType to helpers.bulk? I searched the documentation but couldn't find anything related to bulk indexing.

Thanks!

brizzbane

👍1

All 14 comments

Ah, good point. Currently it isn't possible to tie these two together directly. What you'd have to do is:

bulk(es, ({'_index': getattr(d.meta, 'index', d._doc_type.index), '_type': d._doc_type.name, '_source': d.to_dict()} for d in MY_DOCS))

I will think about it and probably add a parameter to to_dict on the DocType to produce the full dict, including the metadata that could then be passed to bulk.

Does that make sense?
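
A minimal sketch of the workaround above, written so it runs without a cluster: it only builds the action dictionaries that the bulk helper consumes. The name iter_actions and the plain-dict documents are assumptions for illustration; with real DocType instances the index and type would come from d.meta and d._doc_type, as in the one-liner.

```python
def iter_actions(docs, index, doc_type):
    """Yield one bulk action per document body.

    `docs` is any iterable of plain dicts (hypothetical stand-ins for
    the output of DocType.to_dict()).
    """
    for body in docs:
        yield {
            "_index": index,     # target index name
            "_type": doc_type,   # mapping type name
            "_source": body,     # the document itself
        }


actions = list(iter_actions(
    [{"title": "first"}, {"title": "second"}],
    index="blog",
    doc_type="post",
))
```

The generator could then be handed to the helper as bulk(es, iter_actions(...)), where bulk is imported from elasticsearch.helpers and es is an Elasticsearch client.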

HonzaKral on 14 May 2015

👍2

Yea it does, thanks!

brizzbane on 15 May 2015

Implemented DocType.to_dict(include_metadata=True) which will include all the metadata from the document in the format that bulk expects.

HonzaKral on 15 May 2015

👍3

This is really nice! (almost as nice as being able to pass an Index instance an iterable of DocTypes for bulk indexing ;P Though one would have to find a way to specify the _op_type)

Thanks for your hard work and looking forward to the next release!

0x64746b on 22 May 2015

@0x64746b agreed. For now I want to make everything work with strings, with elasticsearch-py being kept unaware of the dsl library. Later we can figure out what kind of convenient code paths could be added for more flexibility; an Index.bulk method or a DocType.bulk classmethod come to mind in this example.

HonzaKral on 22 May 2015

> Implemented DocType.to_dict(include_metadata=True)

For the record: it's DocType.to_dict(include_meta=True)
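
As a rough sketch of how that parameter fits together with the bulk helper: the generator below only relies on each object having a to_dict(include_meta=True) method. FakeDoc is a hypothetical stand-in so the snippet runs without elasticsearch-dsl installed, and the dictionary shape it returns (metadata keys next to a nested _source) is an assumption based on the format bulk expects.

```python
def bulk_actions(docs):
    """Yield bulk-ready dicts from objects exposing to_dict(include_meta=True)."""
    for doc in docs:
        yield doc.to_dict(include_meta=True)


class FakeDoc:
    """Hypothetical stand-in for a DocType instance, for demonstration only."""

    def __init__(self, doc_id, **fields):
        self.doc_id = doc_id
        self.fields = fields

    def to_dict(self, include_meta=False):
        if not include_meta:
            return dict(self.fields)
        # Assumed shape: metadata keys alongside the document body.
        return {
            "_index": "blog",
            "_type": "post",
            "_id": self.doc_id,
            "_source": dict(self.fields),
        }


actions = list(bulk_actions([FakeDoc(1, title="first"), FakeDoc(2, title="second")]))
```

With real documents, the same generator would be passed straight to bulk(es, bulk_actions(MY_DOCS)).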

0x64746b on 2 Jun 2015

👍1

Hey sorry for bringing this issue up again.

My question is: with the new DocType.to_dict(include_meta=True), I was able to figure out the syntax to index, but how can I update (or, more specifically, upsert)?

I'm not finding much online. I wish the Read the Docs pages had examples.

brizzbane on 16 Jun 2015

If the document has an id (doc.meta.id) it will replace the current document in elasticsearch so it will automatically perform an update. If it's not in elasticsearch, it will be inserted - the index operation for bulk (which is the default operation) behaves like that.

If you want anything else like partial updates or upserts (above the behavior described) you need to specify it manually since the document object cannot really help you there. I'd recommend creating a method on the DocType subclass you are using to produce the correct operation.
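
One way such a manual operation might look, as a sketch rather than the library's own API: a helper that builds an update action with doc_as_upsert set, so the document is inserted when missing and partially updated otherwise. The function name upsert_action is hypothetical; the doc and doc_as_upsert keys are those of the Elasticsearch update API, and it is an assumption here that your version of the bulk helper passes them through as the request body, so check it against the client you run.

```python
def upsert_action(index, doc_type, doc_id, fields):
    """Build a bulk 'update' action that inserts the document if it is missing."""
    return {
        "_op_type": "update",    # overrides the default 'index' operation
        "_index": index,
        "_type": doc_type,
        "_id": doc_id,
        "doc": fields,           # partial document to merge
        "doc_as_upsert": True,   # use `doc` as the new document when absent
    }


action = upsert_action("blog", "post", 42, {"title": "updated title"})
```

The same logic could live in a method on the DocType subclass that fills in the index, type, and id from self.meta.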

HonzaKral on 16 Jun 2015

Awesome, thanks. I also noticed I had missed this in the docs: "If you wish to perform other operations, like delete or update use the _op_type field in your actions (_op_type defaults to index)".

thanks for the super quick response!

brizzbane on 16 Jun 2015

Is it possible to have a script (to increment a counter) directly in the DocType class? This would be really awesome. That way the majority of the logic could happen directly in Python, and then at index time, if the DocType's some_value is False one script gets run, and otherwise a different script is run.

brizzbane on 16 Jun 2015

This would be super hard to do safely in a generic fashion - what you can do however is to have simple methods on the DocType class that either return data in format for the bulk helper or call the update API directly to do what you wish.
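
A hedged sketch of the "simple method" idea: a function that picks one of two scripts and returns an update action for the bulk helper. Everything named here (counter_action, the counter field, the two script strings) is hypothetical, and the script object with source and params follows the form used by recent Elasticsearch versions; older versions expect a different layout, so treat the body as an assumption to verify.

```python
# Hypothetical scripts; the field name `counter` is an assumption.
INCREMENT = "ctx._source.counter += params.amount"
RESET = "ctx._source.counter = 0"


def counter_action(index, doc_type, doc_id, some_value, amount=1):
    """Return a bulk update action whose script depends on `some_value`."""
    source = RESET if some_value is False else INCREMENT
    return {
        "_op_type": "update",
        "_index": index,
        "_type": doc_type,
        "_id": doc_id,
        "script": {"source": source, "params": {"amount": amount}},
        # Document to create when the id does not exist yet.
        "upsert": {"counter": 0 if some_value is False else amount},
    }


inc = counter_action("stats", "page", "home", some_value=True, amount=2)
reset = counter_action("stats", "page", "home", some_value=False)
```

Keeping the choice of script in plain Python like this leaves the decision logic on the client, with Elasticsearch only applying the selected script.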

Does that make sense?

HonzaKral on 16 Jun 2015

I am sorry if this question is too naive, but is there a way to create documents from Django models?

gladsonvm on 6 Mar 2017

@gladsonvm sure, just produce a dictionary representing your model and insert it into elasticsearch. You can also use the persistence layer to make it cleaner.

I have an example project using django where you can see one of the ways it can be done - https://github.com/HonzaKral/es-django-example

HonzaKral on 9 Mar 2017

Thanks for the reply :). I did it with a PostgreSQL query instead. I ran the query and wrote the output to a file, then edited that file with Python's fileinput to insert the index and doc_type information before each line. Finally I used curl -s -XPOST localhost:9200/_bulk --data-binary "@docs.json"; echo to send everything to Elasticsearch. From the example project, though, it seems this can be done more easily.
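
For reference, the file that the _bulk endpoint expects alternates an action line and a document line, each as one JSON object, and ends with a newline. A small sketch of producing that payload directly in Python instead of editing a file afterwards; the function name to_bulk_ndjson and the sample rows are assumptions for illustration.

```python
import json


def to_bulk_ndjson(rows, index, doc_type):
    """Serialize rows into the newline-delimited body used by the _bulk endpoint."""
    lines = []
    for row in rows:
        # Action line: says where the following document should be indexed.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself.
        lines.append(json.dumps(row))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


payload = to_bulk_ndjson([{"name": "a"}, {"name": "b"}], "people", "person")
```

Writing payload to docs.json would give a file usable with the curl command above.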

gladsonvm on 9 Mar 2017
