Elasticsearch-dsl-py: Document best-practices for doctype migration

Created on 10 Nov 2015 · 5 Comments · Source: elastic/elasticsearch-dsl-py

It looks like the only documented way of migrating is http://elasticsearch-dsl.readthedocs.org/en/latest/persistence.html#index, which recommends recreating the index from scratch. This is not acceptable in many use cases, where a zero-downtime approach should be followed.
Can elasticsearch-dsl automate the approaches described here https://www.elastic.co/blog/changing-mapping-with-zero-downtime?

documentation

All 5 comments

...which is not acceptable in many use-cases, where zero-downtime approach should be followed.

When you are changing mappings, there is no other way. Even the blog post you link to recreates the index from scratch; only the data is populated by a reindex.

Can elasticsearch-dsl automate the approaches described here https://www.elastic.co/blog/changing-mapping-with-zero-downtime?

We could add that as a helper, but overall I don't think it is a good idea to automate something like this. It requires a LOT of resources (essentially your cluster needs 50% free capacity just to attempt it, because at some point your data will be stored twice), and it is very simple to do: create another Index object with the new name, call create on it, use the reindex helper from elasticsearch-py to copy the data, then flip the alias. That is essentially 4 lines of code, and I don't want to make it any easier because I feel that users should really understand the possible consequences of running something like this.

Also, running something like this from Python, while very easy to do, might not be the best option for performance reasons. Ideally you want to perform the reindex in parallel over multiple machines, which is way beyond the scope of this library.

I would be happy if we documented this option, what are the commands needed (create new index, reindex, flip alias) and link to the blog post. Do you think that would make sense?

Thanks for quick reply!
I agree with all points. elasticsearch-dsl-py probably should not provide direct means for changing mappings, but should instead provide a how-to that shows how DSL models can evolve in a real project over time.

I would be happy if we documented this option, what are the commands needed (create new index, reindex, flip alias) and link to the blog post. Do you think that would make sense?

It would be helpful if some elasticsearch-dsl-py pseudo-code could be jotted down, either in this issue or in a blog post. It would save people from having to figure it out on their own.
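For anyone landing here, a rough sketch of the three steps mentioned above (create the new index, reindex, flip the alias) could look like the following. It is not an official recipe. The function names `alias_flip_actions` and `migrate`, and the index and alias names in the usage note, are made up for illustration. It assumes the application reads and writes through an alias rather than the concrete index name, that the cluster has room to hold the data twice, and a 7.x-style client that accepts `body=` (newer clients take `actions=` instead).

```python
def alias_flip_actions(alias, old_index, new_index):
    """Build the request body that moves `alias` from old_index to new_index.

    Both actions go in a single update-aliases request, so the switch is
    atomic and searches through the alias never hit a gap.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def migrate(client, alias, old_index, new_index, doc_class):
    """Create new_index with the new mapping, copy the data, flip the alias.

    `client` is an elasticsearch.Elasticsearch instance and `doc_class` is
    the updated document class carrying the new mapping.
    """
    # Imported here so alias_flip_actions works without the client libraries.
    from elasticsearch.helpers import reindex
    from elasticsearch_dsl import Index

    # 1. Create the new index with the new mapping.
    new = Index(new_index, using=client)
    new.document(doc_class)  # older releases: new.doc_type(doc_class)
    new.create()

    # 2. Copy all documents from the old index (scan + bulk from Python).
    reindex(client, old_index, new_index)

    # 3. Point the alias at the new index in one atomic request.
    client.indices.update_aliases(
        body=alias_flip_actions(alias, old_index, new_index)
    )
```

Usage would be something like `migrate(Elasticsearch(), "posts", "posts-v1", "posts-v2", Post)`. Documents written to the old index while the copy is running are not picked up by this sketch, and the old index is left in place so it can be deleted manually once the new one has been verified.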

Closing in favor of a checkbox in #801
