Elasticsearch: Issue with sliced scroll using routing

Created on 27 Nov 2017  ·  5 Comments  ·  Source: elastic/elasticsearch

Hi all. First, a quick disclaimer: I'm not entirely sure whether the following is a bug or a documentation issue. After reading the sliced scroll section of the scroll API docs, I got the impression that sliced scroll is supposed to work when targeting a single shard.

Elasticsearch version (bin/elasticsearch --version): 5.5.2, but I've tested with elasticsearch 6 and reproduced the same behaviour.

Plugins installed:

curl localhost:9200/_cat/plugins
o2qKP9T ingest-geoip      5.5.2
o2qKP9T ingest-user-agent 5.5.2
o2qKP9T x-pack            5.5.2

JVM version (java -version):

$ java -version
openjdk version "1.8.0_141"
OpenJDK Runtime Environment (build 1.8.0_141-b16)
OpenJDK 64-Bit Server VM (build 25.141-b16, mixed mode)

OS version (uname -a if on a Unix-like system): I'm using elasticsearch official docker image.
docker.elastic.co/elasticsearch/elasticsearch:5.5.2

Description of the problem including expected versus actual behavior:

I'm trying to perform a sliced scroll that targets only one shard through routing, and Elasticsearch is returning all the results in only one of the 2 slices.

I expect Elasticsearch to split the query/results across all slices, even when targeting only one shard.
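For reference, the request for each slice has roughly the following shape. This is a minimal sketch, not the script from the report: the index name `test` and the routing value `user1` are hypothetical placeholders, and the helper only builds the URL path and JSON body (using the documented `slice`, `scroll` and `routing` parameters) without sending anything.

```python
import json
from urllib.parse import urlencode


def sliced_scroll_request(index, routing, slice_id, max_slices, scroll="1m"):
    """Build the path and JSON body for one slice of a routed sliced scroll.

    Hypothetical helper for illustration; it does not contact a cluster.
    """
    # scroll and routing are passed as query-string parameters.
    params = urlencode({"scroll": scroll, "routing": routing})
    path = f"/{index}/_search?{params}"
    # The slice is selected in the request body with an id and a max.
    body = {
        "slice": {"id": slice_id, "max": max_slices},
        "query": {"match_all": {}},
    }
    return path, json.dumps(body)


if __name__ == "__main__":
    # One request per slice; "test" and "user1" are placeholder values.
    for slice_id in range(2):
        path, body = sliced_scroll_request("test", "user1", slice_id, 2)
        print("POST", path, body)
```

Each slice is then scrolled independently, and the hit counts of the two slices are compared.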

Steps to reproduce:

I have created a small bash script to reproduce the problem, please find it here.

Here are my results when I run the script using 1 and 2 shards.

Using 1 shard

$ bash sliced_scroll.sh 1
ES version
{
  "name" : "o2qKP9T",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "bhLcjiBTTBaWlq6OuVC-Mg",
  "version" : {
    "number" : "5.5.2",
    "build_hash" : "b2f0c09",
    "build_date" : "2017-08-14T12:33:14.154Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}
Create index
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 48

{"acknowledged":true,"shards_acknowledged":true}Adding docs...
slice id 0 search
4
slice id 1 search
5

Elasticsearch returns 2 slices, splitting the query/results as expected.

Using 2 shards

$ bash sliced_scroll.sh 2
ES version
{
  "name" : "o2qKP9T",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "bhLcjiBTTBaWlq6OuVC-Mg",
  "version" : {
    "number" : "5.5.2",
    "build_hash" : "b2f0c09",
    "build_date" : "2017-08-14T12:33:14.154Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}
Create index
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 48

{"acknowledged":true,"shards_acknowledged":true}Adding docs...
slice id 0 search
9
slice id 1 search
0

Elasticsearch returns 2 slices, but doesn't split the query/results, returning all results in only one slice.

I hope this covers all details required to reproduce the issue and I apologise in case this is the expected behaviour and I'm missing something.

Regards,
Alisson Sales

:Search/Search >bug

All 5 comments

I think it's expected as of today given how it's implemented, but we should really try to fix it so that it also works when there is more than one shard and routing is used. I agree it looks like a bug! Thanks for opening this issue.

Yes, this is expected, because only the total number of shards per index is used to perform the slicing.
We could take the routing into account, but we have multiple ways to filter/route searches based on the sharding. The simple routing case, where a single shard is selected per index, is straightforward: we just need to pass this information to the shard request (the slices are resolved in the shard directly). It is more complicated to handle routing index partitions (https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-routing-field.html#routing-index-partition) and _shards preferences (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-preference.html), since we don't pass this information to shard requests and we would need more than a boolean that indicates whether a single shard is requested.
I need to think more about this, but I agree with @s1monw that we should try to fix it. I'd just add that if we fix it, it should work for all types of routing.
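The effect described in this comment can be illustrated with a simplified model. This is only a sketch, not Elasticsearch's actual code: it assumes slices are mapped onto shards by modulo over the index's total shard count, with no knowledge of routing. The function name and return values are hypothetical.

```python
def slice_filter(slice_id, max_slices, shard_id, num_shards):
    """Model what one shard contributes to one slice.

    Returns "all", "none", or ("split", local_id, local_max) when the
    shard's documents are divided further among several slices.
    Simplified, assumed model; routing is deliberately ignored.
    """
    if num_shards == 1:
        # Single-shard index: documents are split locally between slices.
        return ("split", slice_id, max_slices)
    if max_slices >= num_shards:
        # At least as many slices as shards: each slice owns one shard.
        if slice_id % num_shards != shard_id:
            return "none"
        slices_in_shard = max_slices // num_shards
        if max_slices % num_shards > shard_id:
            slices_in_shard += 1
        if slices_in_shard == 1:
            return "all"
        return ("split", slice_id // num_shards, slices_in_shard)
    # More shards than slices: each shard belongs to exactly one slice.
    return "all" if shard_id % max_slices == slice_id else "none"


if __name__ == "__main__":
    # 1 shard, 2 slices: both slices get a share of the documents.
    print([slice_filter(s, 2, 0, 1) for s in range(2)])
    # 2 shards, 2 slices, routing hits only shard 0 (assumed for the example):
    # slice 0 sees everything on that shard, slice 1 sees nothing.
    print([slice_filter(s, 2, 0, 2) for s in range(2)])
```

Under this model, a routed search that reaches a single shard of a two-shard index leaves one slice empty, which is consistent with the 9 / 0 split in the report, while the single-shard index still splits documents between both slices.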

Pinging @elastic/es-search-aggs

I've recently come across the same problem (at least I believe it is https://discuss.elastic.co/t/empty-slices-with-scan-scroll/127255). Is this issue being actively looked at, or is it seen as low priority?

It's on my todo list, not high priority but I'll try to find some time in the coming days to work on a fix.
