Elasticsearch: Scan/Scroll performance degrading logarithmically

Created on 11 May 2016 · 7 Comments · Source: elastic/elasticsearch

Elasticsearch version: 1.7.5
JVM version: java version "1.7.0_99"
OpenJDK Runtime Environment (rhel-2.6.5.0.el6_7-x86_64 u99-b00)
OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)
OS version: CentOS release 6.7 (Final) 2.6.32-573.22.1.el6.x86_64

Hi,
We are in the process of upgrading from 1.7 to 2.X, but first we must re-index into mappings compatible with 2.X. We are using the reindex() helper of elasticsearch-py v1.9.0 to reindex a relatively small 7GB index, and noticed that performance starts out fast and then degrades logarithmically over time:

[attached chart: indexing request rate over the course of the reindex, showing the slowdown]
(Note this shows indexing requests, but this is a function of the slowdown in scan/scroll as I'll show below.)
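
For context, the reindex call itself is just the stock helper, roughly like this (a minimal sketch with an illustrative host and placeholder index names; we left the helper's chunking and scroll settings at their defaults):

from elasticsearch import Elasticsearch
from elasticsearch import helpers as es_helpers

# Illustrative client; in our script this points at the 1.7 cluster.
es = Elasticsearch(["http://estest.domain.com:9200"])

# Copy every document from the old index into the new 2.X-compatible index.
es_helpers.reindex(es, "old_index", "new_index_2x")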

As a test, I set up a simple call to scan() on the 7GB index and printed some benchmarks. I aborted the scan after 13 hours, by which point it hadn't come close to getting through all 27M docs:

  Starting scan...
    Scanned 100,000 docs (batch time: 0:00:07 / total time: 0:00:07 )
    Scanned 200,000 docs (batch time: 0:00:11 / total time: 0:00:18 )
    Scanned 300,000 docs (batch time: 0:00:14 / total time: 0:00:33 )
    Scanned 400,000 docs (batch time: 0:00:18 / total time: 0:00:52 )
    Scanned 500,000 docs (batch time: 0:00:23 / total time: 0:01:15 )
    Scanned 600,000 docs (batch time: 0:00:26 / total time: 0:01:42 )
    Scanned 700,000 docs (batch time: 0:00:30 / total time: 0:02:13 )
    Scanned 800,000 docs (batch time: 0:00:34 / total time: 0:02:47 )
    Scanned 900,000 docs (batch time: 0:00:38 / total time: 0:03:25 )
    Scanned 1,000,000 docs (batch time: 0:00:41 / total time: 0:04:07 )
    Scanned 1,100,000 docs (batch time: 0:00:45 / total time: 0:04:53 )
    Scanned 1,200,000 docs (batch time: 0:00:49 / total time: 0:05:43 )
    Scanned 1,300,000 docs (batch time: 0:00:53 / total time: 0:06:36 )
    Scanned 1,400,000 docs (batch time: 0:00:58 / total time: 0:07:34 )
    Scanned 1,500,000 docs (batch time: 0:01:03 / total time: 0:08:37 )
    Scanned 1,600,000 docs (batch time: 0:01:06 / total time: 0:09:44 )
    Scanned 1,700,000 docs (batch time: 0:01:09 / total time: 0:10:54 )
    Scanned 1,800,000 docs (batch time: 0:01:13 / total time: 0:12:07 )
    Scanned 1,900,000 docs (batch time: 0:01:17 / total time: 0:13:24 )
    Scanned 2,000,000 docs (batch time: 0:01:20 / total time: 0:14:45 )
    Scanned 2,100,000 docs (batch time: 0:01:24 / total time: 0:16:10 )
    Scanned 2,200,000 docs (batch time: 0:01:34 / total time: 0:17:44 )
    Scanned 2,300,000 docs (batch time: 0:01:46 / total time: 0:19:30 )
    Scanned 2,400,000 docs (batch time: 0:01:37 / total time: 0:21:08 )
    Scanned 2,500,000 docs (batch time: 0:01:42 / total time: 0:22:50 )
    Scanned 2,600,000 docs (batch time: 0:01:50 / total time: 0:24:40 )
    Scanned 2,700,000 docs (batch time: 0:01:54 / total time: 0:26:34 )
    Scanned 2,800,000 docs (batch time: 0:01:54 / total time: 0:28:28 )
    Scanned 2,900,000 docs (batch time: 0:01:55 / total time: 0:30:23 )
    Scanned 3,000,000 docs (batch time: 0:02:05 / total time: 0:32:29 )
    Scanned 3,100,000 docs (batch time: 0:02:12 / total time: 0:34:42 )
    Scanned 3,200,000 docs (batch time: 0:02:13 / total time: 0:36:55 )
    Scanned 3,300,000 docs (batch time: 0:02:18 / total time: 0:39:14 )
    Scanned 3,400,000 docs (batch time: 0:02:14 / total time: 0:41:28 )
    Scanned 3,500,000 docs (batch time: 0:02:24 / total time: 0:43:53 )
    Scanned 3,600,000 docs (batch time: 0:02:40 / total time: 0:46:34 )
    Scanned 3,700,000 docs (batch time: 0:02:50 / total time: 0:49:25 )
    Scanned 3,800,000 docs (batch time: 0:02:37 / total time: 0:52:02 )
    Scanned 3,900,000 docs (batch time: 0:02:35 / total time: 0:54:38 )
    Scanned 4,000,000 docs (batch time: 0:02:37 / total time: 0:57:15 )
    Scanned 4,100,000 docs (batch time: 0:02:42 / total time: 0:59:58 )
    Scanned 4,200,000 docs (batch time: 0:02:45 / total time: 1:02:43 )
    Scanned 4,300,000 docs (batch time: 0:02:57 / total time: 1:05:40 )
    Scanned 4,400,000 docs (batch time: 0:03:02 / total time: 1:08:43 )
    Scanned 4,500,000 docs (batch time: 0:03:03 / total time: 1:11:46 )
    Scanned 4,600,000 docs (batch time: 0:03:12 / total time: 1:14:59 )
    Scanned 4,700,000 docs (batch time: 0:03:17 / total time: 1:18:16 )
    Scanned 4,800,000 docs (batch time: 0:03:15 / total time: 1:21:32 )
    Scanned 4,900,000 docs (batch time: 0:03:25 / total time: 1:24:57 )
    Scanned 5,000,000 docs (batch time: 0:03:23 / total time: 1:28:21 )
    Scanned 5,100,000 docs (batch time: 0:03:34 / total time: 1:31:55 )
    Scanned 5,200,000 docs (batch time: 0:03:37 / total time: 1:35:32 )
    Scanned 5,300,000 docs (batch time: 0:03:28 / total time: 1:39:01 )
    Scanned 5,400,000 docs (batch time: 0:03:37 / total time: 1:42:38 )
    Scanned 5,500,000 docs (batch time: 0:03:48 / total time: 1:46:27 )
    Scanned 5,600,000 docs (batch time: 0:03:54 / total time: 1:50:21 )
    Scanned 5,700,000 docs (batch time: 0:03:43 / total time: 1:54:04 )
    Scanned 5,800,000 docs (batch time: 0:03:46 / total time: 1:57:51 )
    Scanned 5,900,000 docs (batch time: 0:04:03 / total time: 2:01:55 )
    Scanned 6,000,000 docs (batch time: 0:04:09 / total time: 2:06:04 )
    Scanned 6,100,000 docs (batch time: 0:04:12 / total time: 2:10:17 )
    Scanned 6,200,000 docs (batch time: 0:04:07 / total time: 2:14:25 )
    Scanned 6,300,000 docs (batch time: 0:04:31 / total time: 2:18:56 )
    Scanned 6,400,000 docs (batch time: 0:04:20 / total time: 2:23:16 )
    Scanned 6,500,000 docs (batch time: 0:04:28 / total time: 2:27:45 )
    Scanned 6,600,000 docs (batch time: 0:04:22 / total time: 2:32:07 )
    Scanned 6,700,000 docs (batch time: 0:04:23 / total time: 2:36:31 )
    Scanned 6,800,000 docs (batch time: 0:04:38 / total time: 2:41:10 )
    Scanned 6,900,000 docs (batch time: 0:04:45 / total time: 2:45:56 )
    Scanned 7,000,000 docs (batch time: 0:04:56 / total time: 2:50:53 )
    Scanned 7,100,000 docs (batch time: 0:04:47 / total time: 2:55:40 )
    Scanned 7,200,000 docs (batch time: 0:04:56 / total time: 3:00:37 )
    Scanned 7,300,000 docs (batch time: 0:04:57 / total time: 3:05:34 )
    Scanned 7,400,000 docs (batch time: 0:05:04 / total time: 3:10:39 )
    Scanned 7,500,000 docs (batch time: 0:04:55 / total time: 3:15:34 )
    Scanned 7,600,000 docs (batch time: 0:05:14 / total time: 3:20:49 )
    Scanned 7,700,000 docs (batch time: 0:05:17 / total time: 3:26:06 )
    Scanned 7,800,000 docs (batch time: 0:05:12 / total time: 3:31:18 )
    Scanned 7,900,000 docs (batch time: 0:05:17 / total time: 3:36:36 )
    Scanned 8,000,000 docs (batch time: 0:05:32 / total time: 3:42:08 )
    Scanned 8,100,000 docs (batch time: 0:05:27 / total time: 3:47:36 )
    Scanned 8,200,000 docs (batch time: 0:05:44 / total time: 3:53:20 )
    Scanned 8,300,000 docs (batch time: 0:05:22 / total time: 3:58:43 )
    Scanned 8,400,000 docs (batch time: 0:05:32 / total time: 4:04:16 )
    Scanned 8,500,000 docs (batch time: 0:05:42 / total time: 4:09:59 )
    Scanned 8,600,000 docs (batch time: 0:05:48 / total time: 4:15:48 )
    Scanned 8,700,000 docs (batch time: 0:05:57 / total time: 4:21:45 )
    Scanned 8,800,000 docs (batch time: 0:06:00 / total time: 4:27:46 )
    Scanned 8,900,000 docs (batch time: 0:05:50 / total time: 4:33:37 )
    Scanned 9,000,000 docs (batch time: 0:06:00 / total time: 4:39:38 )
    Scanned 9,100,000 docs (batch time: 0:05:59 / total time: 4:45:38 )
    Scanned 9,200,000 docs (batch time: 0:06:16 / total time: 4:51:54 )
    Scanned 9,300,000 docs (batch time: 0:06:21 / total time: 4:58:15 )
    Scanned 9,400,000 docs (batch time: 0:06:22 / total time: 5:04:37 )
    Scanned 9,500,000 docs (batch time: 0:06:19 / total time: 5:10:57 )
    Scanned 9,600,000 docs (batch time: 0:06:11 / total time: 5:17:09 )
    Scanned 9,700,000 docs (batch time: 0:06:27 / total time: 5:23:36 )
    Scanned 9,800,000 docs (batch time: 0:06:19 / total time: 5:29:56 )
    Scanned 9,900,000 docs (batch time: 0:06:35 / total time: 5:36:32 )
    Scanned 10,000,000 docs (batch time: 0:06:42 / total time: 5:43:14 )
    Scanned 10,100,000 docs (batch time: 0:06:55 / total time: 5:50:10 )
    Scanned 10,200,000 docs (batch time: 0:06:48 / total time: 5:56:58 )
    Scanned 10,300,000 docs (batch time: 0:06:55 / total time: 6:03:54 )
    Scanned 10,400,000 docs (batch time: 0:06:45 / total time: 6:10:40 )
    Scanned 10,500,000 docs (batch time: 0:06:57 / total time: 6:17:37 )
    Scanned 10,600,000 docs (batch time: 0:07:04 / total time: 6:24:42 )
    Scanned 10,700,000 docs (batch time: 0:07:09 / total time: 6:31:52 )
    Scanned 10,800,000 docs (batch time: 0:07:11 / total time: 6:39:03 )
    Scanned 10,900,000 docs (batch time: 0:07:22 / total time: 6:46:25 )
    Scanned 11,000,000 docs (batch time: 0:07:38 / total time: 6:54:03 )
    Scanned 11,100,000 docs (batch time: 0:07:15 / total time: 7:01:19 )
    Scanned 11,200,000 docs (batch time: 0:07:27 / total time: 7:08:46 )
    Scanned 11,300,000 docs (batch time: 0:07:24 / total time: 7:16:11 )
    Scanned 11,400,000 docs (batch time: 0:07:32 / total time: 7:23:44 )
    Scanned 11,500,000 docs (batch time: 0:07:25 / total time: 7:31:09 )
    Scanned 11,600,000 docs (batch time: 0:07:31 / total time: 7:38:40 )
    Scanned 11,700,000 docs (batch time: 0:07:34 / total time: 7:46:15 )
    Scanned 11,800,000 docs (batch time: 0:07:48 / total time: 7:54:04 )
    Scanned 11,900,000 docs (batch time: 0:07:59 / total time: 8:02:04 )
    Scanned 12,000,000 docs (batch time: 0:07:44 / total time: 8:09:49 )
    Scanned 12,100,000 docs (batch time: 0:07:47 / total time: 8:17:36 )
    Scanned 12,200,000 docs (batch time: 0:08:05 / total time: 8:25:41 )
    Scanned 12,300,000 docs (batch time: 0:08:21 / total time: 8:34:02 )
    Scanned 12,400,000 docs (batch time: 0:08:06 / total time: 8:42:09 )
    Scanned 12,500,000 docs (batch time: 0:08:30 / total time: 8:50:39 )
    Scanned 12,600,000 docs (batch time: 0:08:07 / total time: 8:58:46 )
    Scanned 12,700,000 docs (batch time: 0:08:40 / total time: 9:07:26 )
    Scanned 12,800,000 docs (batch time: 0:09:11 / total time: 9:16:38 )
    Scanned 12,900,000 docs (batch time: 0:08:31 / total time: 9:25:09 )
    Scanned 13,000,000 docs (batch time: 0:08:28 / total time: 9:33:38 )
    Scanned 13,100,000 docs (batch time: 0:08:48 / total time: 9:42:26 )
    Scanned 13,200,000 docs (batch time: 0:09:04 / total time: 9:51:30 )
    Scanned 13,300,000 docs (batch time: 0:08:34 / total time: 10:00:05 )
    Scanned 13,400,000 docs (batch time: 0:09:04 / total time: 10:09:09 )
    Scanned 13,500,000 docs (batch time: 0:08:47 / total time: 10:17:57 )
    Scanned 13,600,000 docs (batch time: 0:08:51 / total time: 10:26:48 )
    Scanned 13,700,000 docs (batch time: 0:08:48 / total time: 10:35:37 )
    Scanned 13,800,000 docs (batch time: 0:09:02 / total time: 10:44:39 )
    Scanned 13,900,000 docs (batch time: 0:09:23 / total time: 10:54:03 )
    Scanned 14,000,000 docs (batch time: 0:09:22 / total time: 11:03:26 )
    Scanned 14,100,000 docs (batch time: 0:09:22 / total time: 11:12:49 )
    Scanned 14,200,000 docs (batch time: 0:09:31 / total time: 11:22:20 )
    Scanned 14,300,000 docs (batch time: 0:09:14 / total time: 11:31:35 )
    Scanned 14,400,000 docs (batch time: 0:09:33 / total time: 11:41:08 )
    Scanned 14,500,000 docs (batch time: 0:09:45 / total time: 11:50:54 )
    Scanned 14,600,000 docs (batch time: 0:09:46 / total time: 12:00:40 )
    Scanned 14,700,000 docs (batch time: 0:09:29 / total time: 12:10:10 )
    Scanned 14,800,000 docs (batch time: 0:09:43 / total time: 12:19:54 )
    Scanned 14,900,000 docs (batch time: 0:09:43 / total time: 12:29:37 )
    Scanned 15,000,000 docs (batch time: 0:09:40 / total time: 12:39:17 )
    Scanned 15,100,000 docs (batch time: 0:10:09 / total time: 12:49:27 )
    Scanned 15,200,000 docs (batch time: 0:10:10 / total time: 12:59:37 )
    Scanned 15,300,000 docs (batch time: 0:10:06 / total time: 13:09:44 )

As you can see, at the end it was taking 10+ minutes to iterate over a batch of 100k documents. I don't even want to know how long each batch would take by the 27 millionth document.

Our cluster runs on AWS and comprises the following:

  • (5) m4.xlarge data nodes
  • (3) m3.medium master nodes
  • (1) m4.large client node

That's 20 vCPUs and 80GB of RAM across the data nodes, which you would think would be more than enough for a 7GB index.

We experimented with various scan values (5s, 30s, 5m, 10m) and it didn't seem to make a difference. I enabled slow logging during the scan, but nothing appeared in the logs. I did grab a hot_threads output during the scan:

::: [estest][PiocenKcQCuuF-82oB7EZw][estest.domain.com][inet[/172.31.9.171:9300]]{master=true}
   Hot threads at 2016-05-02T20:23:39.940Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   31.4% (156.9ms out of 500ms) cpu usage by thread 'elasticsearch[estest][search][T#3]'
     2/10 snapshots sharing following 15 elements
       org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:193)
       org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
       org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
       org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
       org.elasticsearch.search.scan.ScanContext.execute(ScanContext.java:52)
       org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:120)
       org.elasticsearch.search.SearchService.executeScan(SearchService.java:274)
       org.elasticsearch.search.action.SearchServiceTransportAction$21.call(SearchServiceTransportAction.java:520)
       org.elasticsearch.search.action.SearchServiceTransportAction$21.call(SearchServiceTransportAction.java:517)
       org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:745)
     4/10 snapshots sharing following 19 elements
       org.apache.lucene.search.BitsFilteredDocIdSet.match(BitsFilteredDocIdSet.java:60)
       org.apache.lucene.search.FilteredDocIdSet$2.match(FilteredDocIdSet.java:103)
       org.apache.lucene.search.FilteredDocIdSetIterator.nextDoc(FilteredDocIdSetIterator.java:60)
       org.apache.lucene.search.ConstantScoreQuery$ConstantScorer.nextDoc(ConstantScoreQuery.java:257)
       org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:192)
       org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
       org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
       org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
       org.elasticsearch.search.scan.ScanContext.execute(ScanContext.java:52)
       org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:120)
       org.elasticsearch.search.SearchService.executeScan(SearchService.java:274)
       org.elasticsearch.search.action.SearchServiceTransportAction$21.call(SearchServiceTransportAction.java:520)
       org.elasticsearch.search.action.SearchServiceTransportAction$21.call(SearchServiceTransportAction.java:517)
       org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:745)
     2/10 snapshots sharing following 10 elements
       sun.misc.Unsafe.park(Native Method)
       java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
       java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:735)
       java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:644)
       java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1137)
       org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
       java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:745)
     2/10 snapshots sharing following 17 elements
       org.apache.lucene.search.FilteredDocIdSetIterator.nextDoc(FilteredDocIdSetIterator.java:59)
       org.apache.lucene.search.ConstantScoreQuery$ConstantScorer.nextDoc(ConstantScoreQuery.java:257)
       org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:192)
       org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
       org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
       org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
       org.elasticsearch.search.scan.ScanContext.execute(ScanContext.java:52)
       org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:120)
       org.elasticsearch.search.SearchService.executeScan(SearchService.java:274)
       org.elasticsearch.search.action.SearchServiceTransportAction$21.call(SearchServiceTransportAction.java:520)
       org.elasticsearch.search.action.SearchServiceTransportAction$21.call(SearchServiceTransportAction.java:517)
       org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:745)

   27.8% (139ms out of 500ms) cpu usage by thread 'elasticsearch[estest][search][T#4]'
     6/10 snapshots sharing following 19 elements
       org.apache.lucene.search.BitsFilteredDocIdSet.match(BitsFilteredDocIdSet.java:60)
       org.apache.lucene.search.FilteredDocIdSet$2.match(FilteredDocIdSet.java:103)
       org.apache.lucene.search.FilteredDocIdSetIterator.nextDoc(FilteredDocIdSetIterator.java:60)
       org.apache.lucene.search.ConstantScoreQuery$ConstantScorer.nextDoc(ConstantScoreQuery.java:257)
       org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:192)
       org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
       org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
       org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
       org.elasticsearch.search.scan.ScanContext.execute(ScanContext.java:52)
       org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:120)
       org.elasticsearch.search.SearchService.executeScan(SearchService.java:274)
       org.elasticsearch.search.action.SearchServiceTransportAction$21.call(SearchServiceTransportAction.java:520)
       org.elasticsearch.search.action.SearchServiceTransportAction$21.call(SearchServiceTransportAction.java:517)
       org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:745)
     4/10 snapshots sharing following 2 elements
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:745)

   27.7% (138.2ms out of 500ms) cpu usage by thread 'elasticsearch[estest][search][T#1]'
     3/10 snapshots sharing following 19 elements
       org.apache.lucene.search.BitsFilteredDocIdSet.match(BitsFilteredDocIdSet.java:60)
       org.apache.lucene.search.FilteredDocIdSet$2.match(FilteredDocIdSet.java:103)
       org.apache.lucene.search.FilteredDocIdSetIterator.nextDoc(FilteredDocIdSetIterator.java:60)
       org.apache.lucene.search.ConstantScoreQuery$ConstantScorer.nextDoc(ConstantScoreQuery.java:257)
       org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:192)
       org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
       org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
       org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
       org.elasticsearch.search.scan.ScanContext.execute(ScanContext.java:52)
       org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:120)
       org.elasticsearch.search.SearchService.executeScan(SearchService.java:274)
       org.elasticsearch.search.action.SearchServiceTransportAction$21.call(SearchServiceTransportAction.java:520)
       org.elasticsearch.search.action.SearchServiceTransportAction$21.call(SearchServiceTransportAction.java:517)
       org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:745)
     2/10 snapshots sharing following 16 elements
       org.apache.lucene.search.ConstantScoreQuery$ConstantScorer.nextDoc(ConstantScoreQuery.java:257)
       org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:192)
       org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
       org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
       org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
       org.elasticsearch.search.scan.ScanContext.execute(ScanContext.java:52)
       org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:120)
       org.elasticsearch.search.SearchService.executeScan(SearchService.java:274)
       org.elasticsearch.search.action.SearchServiceTransportAction$21.call(SearchServiceTransportAction.java:520)
       org.elasticsearch.search.action.SearchServiceTransportAction$21.call(SearchServiceTransportAction.java:517)
       org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:745)
     2/10 snapshots sharing following 15 elements
       org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:193)
       org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
       org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
       org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
       org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
       org.elasticsearch.search.scan.ScanContext.execute(ScanContext.java:52)
       org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:120)
       org.elasticsearch.search.SearchService.executeScan(SearchService.java:274)
       org.elasticsearch.search.action.SearchServiceTransportAction$21.call(SearchServiceTransportAction.java:520)
       org.elasticsearch.search.action.SearchServiceTransportAction$21.call(SearchServiceTransportAction.java:517)
       org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:745)
     3/10 snapshots sharing following 10 elements
       sun.misc.Unsafe.park(Native Method)
       java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
       java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:735)
       java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:644)
       java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1137)
       org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
       java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:745)
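
For reference, the dump above came from the nodes hot threads API. Here is a minimal sketch of grabbing it from Python while the scan is running (the host is illustrative, and the threads/interval parameter names are my mapping of the REST query parameters onto the client, so double-check them against your client version):

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://estest.domain.com:9200"])  # any node in the cluster

# Equivalent to GET /_nodes/hot_threads?threads=3&interval=500ms
print es.nodes.hot_threads(threads=3, interval="500ms")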

Steps to reproduce:

  1. Scan over all documents in a large index. Here's my scan code:
from time import time

from elasticsearch import Elasticsearch
from elasticsearch import helpers as es_helpers

# sourceCluster is an Elasticsearch client and args holds our CLI arguments
# (source index name and scroll timeout); both are set up earlier in the script.

def _timeDiff(start):
    # Format the seconds elapsed since `start` as H:MM:SS.
    m, s = divmod(time() - start, 60)
    h, m = divmod(m, 60)
    return "%d:%02d:%02d" % (h, m, s)

docs = es_helpers.scan(
    sourceCluster,
    index=args.sourceIndex,
    scroll=args.scroll,
    fields=('_source', '_parent', '_routing', '_timestamp')
)

totalCount = 0
batchCount = 0
batchSize = 100000   # how often to print a timing line, not the scroll batch size
start = time()
batchTime = time()

for d in docs:
    batchCount += 1
    totalCount += 1
    if batchCount == batchSize:
        print "Scanned", format(totalCount, ",d"), "docs (batch time:", _timeDiff(batchTime), \
            "/ total time:", _timeDiff(start), ")"
        batchCount = 0
        batchTime = time()

Expected behavior: Each batch of results from a scan call should return in the same amount of time.

Actual behavior: Each batch of results from a scan call gets slower and slower the deeper into the results it goes.

Is this a bug in scan/scroll? From everything I've read, scan is supposed to make diving deep into large result sets fast. It doesn't appear to be working here. We are currently stuck on 1.7 until we can get past this, so any help would be appreciated. We really want to move to 2.X ASAP. Thanks!

(Note: I had originally filed a bug in elasticsearch-py, but came to the conclusion that the issue is in ES core rather than the Python client. See the original bug for more details and discussion: https://github.com/elastic/elasticsearch-py/issues/397)

All 7 comments

Did you try with a smaller batch size? 100k seems big and could potentially use a lot of memory on the ES side. What happens with a batch size of 1k? You should also check the GC logs to see if your nodes have enough memory to handle a scan with big batches.

Do you realise that you're asking for 100,000 results × the number of primary shards? That's huge. I'd reduce this to e.g. 5,000 docs and remeasure.

Also, the scroll (you said scan but I think you meant scroll) needs to be long enough for you to process your docs and come back for the next tranche.

To clarify, the batch size in my benchmark code is not the batch size for scan; it's simply how often I print out the timer value (every 100k docs). I am calling scan (https://elasticsearch-py.readthedocs.io/en/master/helpers.html#scan) with default values and iterating over the results (see Steps to reproduce in my original post).

Regarding the scroll value, it's plenty long, as I'm not doing any processing of the results, simply iterating over them. In any case, I've tried multiple values there and it has no effect.

@jsnod sorry for the confusion. It seems that it's the opposite: the scan implementation in elasticsearch-py uses a ridiculously small batch size of 10. I might be wrong:
https://github.com/elastic/elasticsearch-py/blob/6e94bf76adbbbb2fe5b927a3933ed535cd738387/elasticsearch/helpers/__init__.py
@HonzaKral can you confirm?

@jsnod I've tried this out locally in 1.7.3. I indexed 2 million empty docs then scrolled through them (with search_type=scan).

With size 10, these are the times each batch took in seconds:

100000     9.558165
200000     9.448229
300000     9.986946
400000     10.793542
500000     11.296383
600000     12.203433
700000     12.712313
800000     13.569650
900000     14.012085
1000000    14.735239
1100000    15.399333
1200000    16.301194
1300000    16.838166
1400000    17.399276
1500000    18.171454
1600000    18.972609
1700000    19.450955
1800000    20.162497
1900000    21.228723
2000000    20.770031

With size 1000 (i.e. num_shards × 1000 docs per scroll request), I got a much flatter graph:

100000     4.770898
200000     4.679908
300000     4.660660
400000     4.612432
500000     4.797696
600000     4.670094
700000     4.723260
800000     4.754570
900000     4.747548
1000000    4.748295
1100000    4.680469
1200000    4.788206
1300000    4.975887
1400000    4.822974
1500000    4.824638
1600000    4.745880
1700000    4.884137
1800000    4.884558
1900000    4.747686
2000000    4.783460

The best value for size depends on hardware, doc size, etc. I tried 5,000 too and got similar results to size 1000.

So I'd recommend setting size to about 1000 (or at least experimenting with values around there). Failing that, you can break your docs down into tranches (e.g. a filter on created_date) and run several reindexing jobs.
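
To put the per-shard size in perspective, here is a rough back-of-the-envelope (not measured in this thread; it assumes the 1.x default of 5 primary shards):

# How many scroll round-trips it takes to pull 100k docs at different per-shard sizes.
shards = 5                # assumption: default number of primary shards in 1.x
docs_per_print = 100000   # the benchmark prints a timing line every 100k docs

for size in (10, 1000):
    docs_per_request = shards * size
    requests = docs_per_print / docs_per_request
    print "size=%4d -> %4d docs per scroll round-trip, %4d requests per 100k docs" \
        % (size, docs_per_request, requests)

# size=  10 ->   50 docs per scroll round-trip, 2000 requests per 100k docs
# size=1000 -> 5000 docs per scroll round-trip,   20 requests per 100k docs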

@jimferenczi The elasticsearch-py client doesn't set any default value for the size, so it uses Elasticsearch's default. You can always raise it by passing in a larger size argument.

EUREKA! Simply passing a size=1000 kwarg to scan() has given me results similar to those posted by @clintongormley above:

    ...
    Scanned 6,700,000 docs (batch time: 0:00:08 / total time: 0:12:52 )
    Scanned 6,800,000 docs (batch time: 0:00:08 / total time: 0:13:00 )
    Scanned 6,900,000 docs (batch time: 0:00:08 / total time: 0:13:09 )
    Scanned 7,000,000 docs (batch time: 0:00:08 / total time: 0:13:17 )
    Scanned 7,100,000 docs (batch time: 0:00:08 / total time: 0:13:26 )
    Scanned 7,200,000 docs (batch time: 0:00:08 / total time: 0:13:34 )
    Scanned 7,300,000 docs (batch time: 0:00:08 / total time: 0:13:43 )
    Scanned 7,400,000 docs (batch time: 0:00:08 / total time: 0:13:52 )
    Scanned 7,500,000 docs (batch time: 0:00:08 / total time: 0:14:00 )
    Scanned 7,600,000 docs (batch time: 0:00:08 / total time: 0:14:09 )
    Scanned 7,700,000 docs (batch time: 0:00:08 / total time: 0:14:18 )
    Scanned 7,800,000 docs (batch time: 0:00:08 / total time: 0:14:27 )
    Scanned 7,900,000 docs (batch time: 0:00:08 / total time: 0:14:36 )
    Scanned 8,000,000 docs (batch time: 0:00:08 / total time: 0:14:44 )
    Scanned 8,100,000 docs (batch time: 0:00:09 / total time: 0:14:53 )
    Scanned 8,200,000 docs (batch time: 0:00:08 / total time: 0:15:02 )
    Scanned 8,300,000 docs (batch time: 0:00:09 / total time: 0:15:12 )
    Scanned 8,400,000 docs (batch time: 0:00:09 / total time: 0:15:21 )
    Scanned 8,500,000 docs (batch time: 0:00:09 / total time: 0:15:30 )
    Scanned 8,600,000 docs (batch time: 0:00:09 / total time: 0:15:39 )
    Scanned 8,700,000 docs (batch time: 0:00:09 / total time: 0:15:48 )
    Scanned 8,800,000 docs (batch time: 0:00:09 / total time: 0:15:58 )
    Scanned 8,900,000 docs (batch time: 0:00:09 / total time: 0:16:07 )
    Scanned 9,000,000 docs (batch time: 0:00:09 / total time: 0:16:17 )
    Scanned 9,100,000 docs (batch time: 0:00:09 / total time: 0:16:26 )
    Scanned 9,200,000 docs (batch time: 0:00:09 / total time: 0:16:36 )
    Scanned 9,300,000 docs (batch time: 0:00:09 / total time: 0:16:45 )
    Scanned 9,400,000 docs (batch time: 0:00:09 / total time: 0:16:55 )
    Scanned 9,500,000 docs (batch time: 0:00:09 / total time: 0:17:04 )
    Scanned 9,600,000 docs (batch time: 0:00:09 / total time: 0:17:14 )
    Scanned 9,700,000 docs (batch time: 0:00:09 / total time: 0:17:24 )
    Scanned 9,800,000 docs (batch time: 0:00:09 / total time: 0:17:33 )
    Scanned 9,900,000 docs (batch time: 0:00:09 / total time: 0:17:43 )
    ...

What previously took 6 hours now takes about 18 minutes!
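
Concretely, the only change to the scan code from the original post was the size keyword argument (a sketch using the same variables as above; 1000 is what worked for us, so tune it for your own doc sizes):

docs = es_helpers.scan(
    sourceCluster,
    index=args.sourceIndex,
    scroll=args.scroll,
    size=1000,   # docs fetched per shard per scroll request; the default is 10
    fields=('_source', '_parent', '_routing', '_timestamp')
)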

After pulling my hair out for over a week, it seems insane that scan() ends up with a default size of 10. We were effectively DDoS'ing ES with 100x as many queries as if we had passed in size=1000. A default of 10 makes sense for search(), but presumably when calling scan() you are iterating over a relatively large set. I will close this ticket and open a new one at elasticsearch-py to discuss upping the default size for scan(), or at least updating the documentation to explain why a much larger size is beneficial, especially when calling reindex().

Thanks ES team for getting us past this bump... onward to 2.X!
