We are using django-elasticsearch-dsl and unfortunately it is not actively maintained (last commit was on 8th Nov 2018).
What should we do in this case? I can see three options.
Edit: django-elasticsearch-dsl is updated. :tada:
Just a note.
My Elasticsearch version is:
$ curl localhost:9200
{
  "name" : "j9iyXmN",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "b4kGzEhFSoiZufVXVlERfg",
  "version" : {
    "number" : "6.7.1",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "2f32220",
    "build_date" : "2019-04-02T15:59:27.961366Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
And all the tests pass.
To execute the Elasticsearch tests, you need to pass an extra option to tox:
tox -r -e py36 --including-search
@stsewd
Yes... I am aware of that.
All tests are passing including the search tests. :smile:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@dojutsu-user is there anything actionable on this issue now? I'm not sure, but I think it's not possible to upgrade yet. In that case, we should document why it's not possible to upgrade and what the problems are so we can track them, and propose a plan, or close it instead of leaving it open without adding value.
@humitos
I don't think the upgrade should pose any problems.
During the whole GSoC period, I have been using Elasticsearch 6.7:
$ curl localhost:9200
{
  "name" : "j9iyXmN",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "b4kGzEhFSoiZufVXVlERfg",
  "version" : {
    "number" : "6.7.2",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "56c6e48",
    "build_date" : "2019-04-29T09:05:50.290371Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
I just read that django-elasticsearch-dsl is going to have a new release pretty soon: https://github.com/sabricot/django-elasticsearch-dsl/issues/177#issuecomment-509733539 (but not for ES version 7).
I think we have to wait a few days until django-elasticsearch-dsl starts supporting ES v7.
This is blocked on https://github.com/sabricot/django-elasticsearch-dsl/issues/170
I am unblocking this as https://github.com/sabricot/django-elasticsearch-dsl/issues/170 is closed and django-elasticsearch-dsl now supports Elasticsearch version 7 (https://pypi.org/project/django-elasticsearch-dsl/).
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
We received an email from Elastic saying that we need to migrate to a recent version. Since 6.x is EOL, this isn't low priority anymore.
Just checked: we are running v6.5.4 in production, and we need to update to 6.8.12 before updating to a major version.
Changelog for 6.6, 6.7 and 6.8
We are good to upgrade from 6.5 to 6.8. And we don't need a re-index or downtime.
Migration between minor versions — e.g. 6.x to 6.y — can be performed by upgrading one node at a time.
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/breaking-changes.html
A rolling upgrade allows an Elasticsearch cluster to be upgraded one node at a time so upgrading does not interrupt service.
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/rolling-upgrades.html
This can be done a day or two before the deploy.
This won't cause downtime, but it will return outdated results for a period of time
(while we deploy the new instances and re-index).
We could communicate this to users beforehand if we want.
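As a minimal sketch of the per-node steps from the rolling upgrade guide linked above, using the Python elasticsearch client (the host is an assumption, and the actual package upgrade of each node happens outside of this, e.g. via apt):

from elasticsearch import Elasticsearch

# Point the client at any node that stays up while another node is being upgraded.
es = Elasticsearch(["http://localhost:9200"])  # assumed host

def before_node_upgrade():
    # Disable shard allocation so the cluster doesn't rebalance while the node is down.
    es.cluster.put_settings(
        body={"persistent": {"cluster.routing.allocation.enable": "none"}}
    )
    # Optional synced flush to speed up shard recovery after the restart.
    es.indices.flush_synced()

def after_node_upgrade():
    # The node was stopped, upgraded to 6.8.x, and restarted out of band.
    # Re-enable shard allocation (null resets the setting to its default, "all").
    es.cluster.put_settings(
        body={"persistent": {"cluster.routing.allocation.enable": None}}
    )
    # Wait for the cluster to go green before moving on to the next node.
    es.cluster.health(wait_for_status="green", request_timeout=300)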
@stsewd on the re-index during "deploy", we should only need to reindex the past 1 day of data, right? So that should be pretty quick. This plan sounds good to me. The full reindex might take somewhere around 8-10 hours though, so we should plan ahead for that.
on the re-index during "deploy", we should only need to reindex the past 1 day of data, right?
Yes, I'll see if I can change the management command to accept that argument, or just write a script.
Pretty sure it already supports this, or we have some kind of code that can handle it already.
Yea, I have this in my notes:
from datetime import datetime, timedelta

# Assumption: HTMLFile and Project live in readthedocs.projects.models.
from readthedocs.projects.models import HTMLFile, Project
from readthedocs.search.utils import index_new_files

# Only reindex projects that had a build in the last 48 hours.
since = datetime.now() - timedelta(hours=48)
ps = Project.objects.filter(versions__builds__date__gte=since).distinct()
print("Indexing %s projects" % len(ps))

for project_obj in ps:
    for version_obj in project_obj.versions.filter(active=True, built=True):
        index_new_files(HTMLFile, version_obj, build=version_obj.builds.latest().pk)
Something similar should work.
Great, I have updated my comment with that.
Great. The only other thing we should consider is what QA will look like on the new vs. old cluster. We've had issues in the past with reindexing, so it would be good to have 5-10 queries that we want to test to make sure the results look similar, in particular the number of results for broad searches, and also the range of versions.
Part of this is that we don't do a great job of cleaning up our indexes, so the current index certainly has some invalid/old/deleted data, but we also need to make sure we aren't missing anything important.
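A minimal sketch of that QA check, assuming both clusters are reachable and the page documents live in an index named page (the hosts, index name, fields, and queries are assumptions here, not the actual Read the Docs setup):

from elasticsearch import Elasticsearch

OLD = Elasticsearch(["http://old-cluster:9200"])  # assumed hosts
NEW = Elasticsearch(["http://new-cluster:9200"])

# A handful of broad queries whose result counts we want to compare.
QUERIES = ["python", "install", "api reference", "configuration", "read the docs"]

def total_hits(client, query):
    body = {"query": {"multi_match": {"query": query, "fields": ["title", "content"]}}}
    result = client.search(index="page", body=body, size=0)
    total = result["hits"]["total"]
    # ES 7 reports total hits as {"value": N, "relation": ...}; ES 6 as a plain int.
    return total["value"] if isinstance(total, dict) else total

for q in QUERIES:
    print("%-15s old=%-8s new=%-8s" % (q, total_hits(OLD, q), total_hits(NEW, q)))

If the counts for broad searches diverge a lot between the two clusters, that's a signal to dig into what was or wasn't indexed before switching over.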