Elasticsearch: AutoDateHistogramAggregatorTests is slow

Created on 27 Aug 2018 · 8 Comments · Source: elastic/elasticsearch

This test class can occupy around 40% of :server:test time.

> Task :server:test
Slow Tests Summary:
 45.63s | org.elasticsearch.search.aggregations.bucket.histogram.AutoDateHistogramAggregatorTests

:AnalyticAggregations help wanted

pcsanwald

All 8 comments

Pinging @elastic/es-search-aggs

elasticmachine on 27 Aug 2018

Can I work on this?

ekalgolas on 29 Aug 2018

@ekalgolas I had been planning to do it myself, but you're welcome to take it if you're still interested.

pcsanwald on 4 Sep 2018

Sure. I will investigate the slowness in the test code and post my findings here. Is there anything specific that you know to be responsible for it and wanted to address here?

ekalgolas on 5 Sep 2018

I tried some refactoring to avoid recomputing dates and made some common objects global. This yields only a 10-15% performance improvement in the best case. The actual bottleneck is the testAll* tests, which test intervals with a huge dataset (~600) and assert on the search and reduce cases with 5-6 variations. Each of those tests takes about 3-4 secs on my machine, and there are about 5 of them, for a total execution time of 20-23 secs.

An easy fix would be to reduce either the dataset size or the number of variations, for which I need to know how important these variations and the huge dataset size are. If they are absolutely important (which does not seem to be the case for every point in the dataset), we will have to figure out how to run these tests faster while keeping these parameters the same. Otherwise, we can either rewrite these tests to cover the edge data points (with a few regular cases) or reduce the variations tested so that the functionality is still tested with less data and fewer assertions.

Thoughts @pcsanwald ?

ekalgolas on 10 Sep 2018

Hi @ekalgolas - I apologize for the delayed response on this. If you're still interested in working on this, I think the right approach is to:

1) move the unit tests to use randomized parameters (perhaps a randomized range would be good) as opposed to the exhaustive set of variations.
2) move the benchmarking part of this to use rally
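
To make the first point concrete, here is a minimal sketch of the difference between exhaustive and randomized parameter selection. It is not the actual test code: the class name, method names, and bucket targets are hypothetical, and it uses plain `java.util.Random` with an explicit seed to stay self-contained, whereas the real test would presumably rely on the randomization helpers of the Elasticsearch test framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; none of these names come from the Elasticsearch code base.
final class RandomizedVariationsSketch {

    /** Exhaustive style: every (numDocs, targetBuckets) pair is exercised. */
    static List<int[]> exhaustiveCases(int maxDocs, int[] bucketTargets) {
        List<int[]> cases = new ArrayList<>();
        for (int docs = 1; docs <= maxDocs; docs++) {
            for (int target : bucketTargets) {
                cases.add(new int[] { docs, target });
            }
        }
        return cases;
    }

    /** Randomized style: draw a fixed number of pairs from the same parameter space. */
    static List<int[]> randomizedCases(Random random, int maxDocs, int[] bucketTargets, int samples) {
        List<int[]> cases = new ArrayList<>();
        for (int i = 0; i < samples; i++) {
            int docs = 1 + random.nextInt(maxDocs); // uniform in [1, maxDocs]
            int target = bucketTargets[random.nextInt(bucketTargets.length)];
            cases.add(new int[] { docs, target });
        }
        return cases;
    }

    public static void main(String[] args) {
        int[] bucketTargets = { 1, 5, 10, 50, 100, 600 }; // assumed values, for illustration
        Random random = new Random(42L); // fixed seed so a failing run can be reproduced

        System.out.println("exhaustive cases: " + exhaustiveCases(600, bucketTargets).size());
        System.out.println("randomized cases: " + randomizedCases(random, 600, bucketTargets, 20).size());
    }
}
```

Each run then covers only a small sample of the space, while repeated CI runs with different seeds cover more of it over time. Logging the seed keeps any failure reproducible.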

If you're still interested (I again apologize for the delay in responding), and want to take a crack at the first thing, I can handle the benchmarking on the rally side.

CC @colings86 for an expert opinion on whether the above seems right :).

pcsanwald on 24 Oct 2018

Sure. I'll get started and submit a pull request for this. Thanks for the response.

ekalgolas on 25 Oct 2018

@pcsanwald I created a pull request. I was able to bring down the run time by 60%.

ekalgolas on 30 Oct 2018