Elasticsearch: Improve test coverage for bucket and metric aggregations

Created on 20 Dec 2016 · Source: elastic/elasticsearch

Today there is very little unit test coverage for bucket and metric aggregations. The aim of this meta issue is to significantly improve that. For each aggregation we should add unit tests for the aggregator (how it interacts with the Lucene index via the org.apache.lucene.search.Collector that each aggregation implements), for the reduce logic, and for the serialization of the aggregation results.
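To make the Aggregator side concrete, here is a minimal sketch of the test pattern in Python. A small in-memory "index" is fed to a collector-style aggregator one segment at a time, and the result is compared with a value computed by brute force, including an empty-index edge case. It does not use Lucene or Elasticsearch: `ToyMaxAggregator`, `run_search` and the list-of-segments index are hypothetical stand-ins for a real Collector running over a real test index.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical stand-in for a Lucene index: a list of segments, where each
# segment holds one numeric doc value per document (None = field missing).
Segment = List[Optional[float]]


@dataclass
class ToyMaxAggregator:
    """Collector-style aggregator: receives matching doc ids segment by segment."""
    max_value: float = float("-inf")
    docs_collected: int = 0

    def collect(self, segment: Segment, doc: int) -> None:
        self.docs_collected += 1
        value = segment[doc]
        if value is not None:  # docs without a value are collected but ignored
            self.max_value = max(self.max_value, value)


def run_search(segments: List[Segment],
               matches: Callable[[Optional[float]], bool]) -> ToyMaxAggregator:
    """Feed every matching doc of every segment to a fresh aggregator."""
    agg = ToyMaxAggregator()
    for segment in segments:
        for doc, value in enumerate(segment):
            if matches(value):
                agg.collect(segment, doc)
    return agg


def test_toy_max_aggregator() -> None:
    segments = [[1.0, 5.0, None], [3.0], []]
    # Match-all query: four docs are collected, the missing value is skipped.
    result = run_search(segments, lambda v: True)
    assert result.max_value == 5.0
    assert result.docs_collected == 4
    # Narrower query: compare against the brute-force expectation.
    result = run_search(segments, lambda v: v is not None and v < 4)
    assert result.max_value == 3.0
    # Edge case: an empty index leaves the initial value untouched.
    assert run_search([], lambda v: True).max_value == float("-inf")


test_toy_max_aggregator()
```

Splitting the input into several segments, including an empty one, is deliberate: per-segment state handling is where collector bugs tend to hide.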

We should add unit tests for the following Aggregator implementations:

[x] ParentToChildrenAggregator @cbuescher #23305
[x] FilterAggregator @colings86 #23826
[x] FiltersAggregator @jimczi #22678
[x] GeoHashGridAggregator @martijnvg #23417
[x] GlobalAggregator @nik9000 #22668
[x] DateHistogramAggregator @tlrx #22714
[x] HistogramAggregator @jpountz #22961
[x] MissingAggregator @jimczi #23895
[x] NestedAggregator @polyfractal
[x] ReverseNestedAggregator @cbuescher
[x] RangeAggregator (also test with DateRangeAggregationBuilder and GeoDistanceAggregationBuilder) @tlrx #24569
[x] BinaryRangeAggregator (also test with IpRangeAggregationBuilder) @jimczi #23255
[x] Diversified sampler aggregator, which has the following implementations that need to be tested: DiversifiedBytesHashSamplerAggregator, DiversifiedMapSamplerAggregator, DiversifiedNumericSamplerAggregator and DiversifiedOrdinalsSamplerAggregator. @martijnvg #23511
[x] SamplerAggregator @nik9000 #23243
[x] Significant terms aggregation: GlobalOrdinalsSignificantTermsAggregator, GlobalOrdinalsSignificantTermsAggregator.WithHash, SignificantLongTermsAggregator and SignificantStringTermsAggregator. @markharwood #24904
[x] Terms aggregation: DoubleTermsAggregator, GlobalOrdinalsStringTermsAggregator.LowCardinality, GlobalOrdinalsStringTermsAggregator.WithHash, LongTermsAggregator and StringTermsAggregator. @martijnvg #24949
[x] BestBucketsDeferringCollector @MaineC #23511
[x] BestDocsDeferringCollector @polyfractal #23511
[x] AvgAggregator @cbuescher #23000
[x] CardinalityAggregator @colings86 #23826
[x] GeoBoundsAggregator @jimczi #23259
[x] GeoCentroidAggregator @martijnvg #24111
[x] MaxAggregator @nik9000 #22668
[x] MinAggregator [MvG] #22279
[x] Percentiles metric aggregation: TDigestPercentilesAggregator and HDRPercentilesAggregator @tlrx #24245
[x] Percentiles rank metric aggregation: TDigestPercentileRanksAggregator and HDRPercentileRanksAggregator. @jpountz #23240
[x] ScriptedMetricAggregator @cbuescher #23404
[x] StatsAggregator @jimczi
[x] ExtendedStatsAggregator @jimczi
[x] SumAggregator @martijnvg #22954
[x] TopHitsAggregator @nik9000 #22754
[x] ValueCountAggregator @tlrx #22741
[x] MatrixStatsAggregator @martijnvg #24837

We should also add tests for the following InternalAggregation implementations (to test the reduce and result serialization logic):

[x] InternalChildren @cbuescher #23261
[x] InternalFilter @colings86 #23388
[x] InternalFilters @jimczi #22678
[x] InternalGeoHashGrid @martijnvg #23417
[x] InternalGlobal @nik9000 #23388
[x] InternalHistogram @jpountz #22961
[x] InternalDateHistogram @tlrx #23402
[x] InternalMissing @MaineC #23388
[x] InternalNested @polyfractal #23388
[x] InternalReverseNested @cbuescher #23388
[x] InternalRange, InternalDateRange, InternalGeoDistance. @tlrx #24569
[x] InternalBinaryRange @jimczi #23259
[x] InternalSampler @martijnvg edada2581e75400da9fac82bdfbc7ec1f02ef0d8
[x] Significant terms aggregation: SignificantLongTerms and SignificantStringTerms. @tlrx #23428
[x] Terms aggregation: DoubleTerms, LongTerms, StringTerms and UnmappedTerms. @jpountz #23149
[x] InternalAvg @cbuescher #23000
[x] InternalCardinality @colings86 #23826
[x] InternalGeoBounds @jimczi #23259
[x] InternalGeoCentroid @martijnvg #24176
[x] InternalMax @nik9000 #22668
[x] InternalMin @colings86 + @nik9000 #22668
[x] Percentiles metric aggregation: InternalTDigestPercentiles (#24090) and InternalHDRPercentiles (#24157) @tlrx
[x] Percentiles rank metric aggregation: InternalTDigestPercentileRanks and InternalHDRPercentileRanks. @jpountz #23240
[x] InternalScriptedMetric @cbuescher #23330
[x] InternalStats @jimczi
[x] InternalExtendedStats @jimczi
[x] InternalSum @martijnvg #22954
[x] InternalTopHits @nik9000
[x] InternalValueCount @tlrx #22741
[x] InternalMatrixStats @martijnvg #24559
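For the InternalAggregation side, two properties are worth checking: reducing several shard-level results must give the same answer as computing directly over all inputs, and serialization must round-trip an instance unchanged. The Python sketch below shows that shape with randomized inputs. `ToyInternalMax`, `reduce_shards` and the length-prefixed byte format are hypothetical and do not reflect Elasticsearch's actual classes or wire format.

```python
import random
import struct
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToyInternalMax:
    """Hypothetical per-shard result of a max aggregation."""
    name: str
    value: float

    def to_bytes(self) -> bytes:
        encoded = self.name.encode("utf-8")
        # Length-prefixed UTF-8 name, then the value as a big-endian double.
        return struct.pack(">I", len(encoded)) + encoded + struct.pack(">d", self.value)

    @staticmethod
    def from_bytes(data: bytes) -> "ToyInternalMax":
        (length,) = struct.unpack_from(">I", data, 0)
        name = data[4:4 + length].decode("utf-8")
        (value,) = struct.unpack_from(">d", data, 4 + length)
        return ToyInternalMax(name, value)


def reduce_shards(shard_results: List[ToyInternalMax]) -> ToyInternalMax:
    """Reduce phase: merge shard-level results (assumes at least one)."""
    return ToyInternalMax(shard_results[0].name,
                          max(r.value for r in shard_results))


def test_reduce_and_round_trip(seed: int = 42) -> None:
    rng = random.Random(seed)  # seeded, so a failing case is reproducible
    shards = [ToyInternalMax("max_price", rng.uniform(-1e6, 1e6))
              for _ in range(rng.randint(1, 10))]
    # Reduce must agree with the value computed directly from all inputs.
    assert reduce_shards(shards).value == max(s.value for s in shards)
    # Serialization must round-trip every instance unchanged.
    for shard in shards:
        assert ToyInternalMax.from_bytes(shard.to_bytes()) == shard


test_reduce_and_round_trip()
```

Packing the value as a double keeps the round trip exact for Python floats, so plain equality is a valid check here.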

I may have forgotten some classes, so please update this issue if that is the case :)

I listed the InternalAggregation implementations separately from the Aggregator implementations because unit tests for each can be written in parallel by different devs. However, for the less complex aggregations I think unit tests for both the InternalAggregation and the Aggregator implementation can be added in the same PR.

I think the unit tests should at least be added to the master branch. Backporting to the 5.x branch is best effort and should only be considered if it is low-hanging fruit.

I suggest that we work in the following way in order to avoid accidentally doing work twice:

Add your name to the task you'd like to work on before you start writing the unit tests.
When you open a PR, add the PR number to the task.
Once the PR is merged, check off the task.

Note that aggregations are already tested; however, because of how the code was structured before, unit testing them was really difficult.

Labels: :AnalyticAggregations, >test, Meta
