Readthedocs.org: Upgrade elastic search to 7.x

Created on 22 Apr 2019  ·  20 Comments  ·  Source: readthedocs/readthedocs.org


We are using django-elasticsearch-dsl and, unfortunately, it is not actively maintained (the last commit was on 8 Nov 2018).

What should we do in this case? I can see three options.

  • Wait for the update of django-elasticsearch-dsl.
  • Find another library.
  • Switch to using only the official low-level library (elasticsearch-py), which is kept up to date, but this involves a lot of work.

Edit: django-elasticsearch-dsl is updated. :tada:

Just a note.
My elasticsearch version is:

$ curl localhost:9200
{
  "name" : "j9iyXmN",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "b4kGzEhFSoiZufVXVlERfg",
  "version" : {
    "number" : "6.7.1",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "2f32220",
    "build_date" : "2019-04-02T15:59:27.961366Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

And all the tests pass.

To execute the Elasticsearch tests, you need to pass an extra option to tox:
tox -r -e py36 --including-search

@stsewd
Yes... I am aware of that.
All tests are passing including the search tests. :smile:

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@dojutsu-user is there anything actionable on this issue now? I'm not sure, but I think it's not possible to upgrade yet. In that case, we should document why it's not possible and what the blocking problems are so we can track them, and propose a plan, or close it instead of keeping it open without adding value.

@humitos
I don't think the upgrade should pose any problems.
During the whole GSoC period, I have been using Elasticsearch 6.7:

$ curl localhost:9200
{
  "name" : "j9iyXmN",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "b4kGzEhFSoiZufVXVlERfg",
  "version" : {
    "number" : "6.7.2",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "56c6e48",
    "build_date" : "2019-04-29T09:05:50.290371Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

I just read the django-elasticsearch-dsl is going to have a new release pretty soon -- https://github.com/sabricot/django-elasticsearch-dsl/issues/177#issuecomment-509733539 (But not for ES version 7)
I think we have to wait a few days until django-elasticsearch-dsl starts supporting ES v7.

I am unblocking this as https://github.com/sabricot/django-elasticsearch-dsl/issues/170 is closed and django-elasticsearch-dsl now supports Elasticsearch 7 (https://pypi.org/project/django-elasticsearch-dsl/).

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

We received an email from ES saying that we need to migrate to a recent version; since 6.x is EOL, this isn't low priority anymore.

Just checked: we are running v6.5.4 in production, so we need to update to 6.8.12 before updating to a major version.

Changelog for 6.6, 6.7 and 6.8

We are good to upgrade from 6.5 to 6.8. And we don't need a re-index or downtime.

Migration between minor versions — e.g. 6.x to 6.y — can be performed by upgrading one node at a time.

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/breaking-changes.html

A rolling upgrade allows an Elasticsearch cluster to be upgraded one node at a time so upgrading does not interrupt service.

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/rolling-upgrades.html
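The constraint behind the two-step upgrade path can be sketched as a small check. This is a hypothetical helper (not part of the codebase) that compares the cluster's current version against the target's `minimum_wire_compatibility_version`, the field shown in the `curl localhost:9200` output above:

```python
def parse_version(v):
    """Turn a version string like '6.8.12' into a comparable tuple (6, 8, 12)."""
    return tuple(int(part) for part in v.split("."))


def can_rolling_upgrade(current_version, target_min_wire_compat):
    """A rolling upgrade to the target is only possible when every node
    already runs at least the target's minimum wire-compatibility version."""
    return parse_version(current_version) >= parse_version(target_min_wire_compat)


# ES 7.x reports "minimum_wire_compatibility_version": "6.8.0", so a 6.5.4
# cluster must first do a rolling upgrade to 6.8.x, and only then to 7.x.
print(can_rolling_upgrade("6.5.4", "6.8.0"))   # False -> must go through 6.8 first
print(can_rolling_upgrade("6.8.12", "6.8.0"))  # True
```

The same check explains why 6.5 to 6.8 needs no re-index or downtime: both sides are wire-compatible, so nodes can be swapped one at a time.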

How to deploy avoiding downtime

Before the deploy

This can be done a day or two before the deploy

  • Create new deploy in ES cloud with ES 7.x
  • Change the ops repo to point to the new ES host
  • Deploy web-extra or a new instance with the code from ES 7.x
  • Trigger a re-index to the new deploy

During/after the deploy

  • Deploy the new instances using 7.x. Here we will have two instances running, one for 6.x and the other one with 7.x (but each one will be pointing to a different deploy in ES cloud)
  • When only the 7.x instances are running, trigger a re-index.
    Here we only need to re-index the projects with new builds from the last 24/48 hours,
    we can use the script from https://github.com/readthedocs/readthedocs.org/issues/5620#issuecomment-716760493
  • Make sure everything is working
  • Delete the old deploy

This won't cause downtime, but it will serve outdated results for a period of time
(while we deploy the new instances and re-index).
We could communicate this to users beforehand if we want.

@stsewd on the re-index during "deploy", we should only need to reindex the past 1 day of data, right? So that should be pretty quick. I think this plan sounds good to me. The full reindex might take somewhere around 8-10 hours though, so we should plan ahead for that.

on the re-index during "deploy", we should only need to reindex the past 1 day of data, right?

Yes, I'll see if I can change the management command to accept that argument or just write a script

Pretty sure it already supports this, or we have some kind of code that can handle it already.

Yea, I have this in my notes:

from datetime import datetime, timedelta

from readthedocs.projects.models import HTMLFile, Project
from readthedocs.search.utils import index_new_files

# Only re-index projects with builds in the last 48 hours.
since = datetime.now() - timedelta(hours=48)

ps = Project.objects.filter(versions__builds__date__gte=since).distinct()
print("Indexing %s projects" % ps.count())
for project_obj in ps:
  for version_obj in project_obj.versions.filter(active=True, built=True):
    index_new_files(HTMLFile, version_obj, build=version_obj.builds.latest().pk)

Something similar should work.

Great, I have updated my comment with that.

Great -- the only other thing we should consider is what QA will look like on the new vs old cluster. We've had issues in the past with reindexing, so it would be good to have 5-10 queries that we want to test to make sure the results look similar. In particular, the number of results for broad searches, and also the range of versions.

Some of this is because we don't do a great job of cleaning up our indexes. So the current index certainly has some invalid/old/deleted data, but we also need to make sure we aren't missing important things.
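For the QA pass above, one way to compare the old and new clusters is to run the same 5-10 queries against both and flag any whose result counts diverge beyond a tolerance. A minimal sketch (the query names and counts below are made-up placeholders; in practice they would come from the two search endpoints):

```python
def diverging_queries(old_counts, new_counts, tolerance=0.2):
    """Return the queries whose new-index hit count differs from the
    old-index count by more than `tolerance` (relative difference)."""
    flagged = []
    for query, old in old_counts.items():
        new = new_counts.get(query, 0)
        baseline = max(old, 1)  # avoid division by zero for empty queries
        if abs(new - old) / baseline > tolerance:
            flagged.append(query)
    return flagged


# Placeholder counts for a few broad test queries on each cluster.
old = {"django": 1200, "install": 950, "api token": 40}
new = {"django": 1180, "install": 400, "api token": 41}
print(diverging_queries(old, new))  # ['install'] -> worth investigating
```

Queries flagged this way would be the ones to inspect manually, along with the range of versions they return.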
