Hello, I'm facing a subtle issue with AQL and Geo indices. It seems the AQL query optimiser uses the Geo index once, then falls back to a full collection scan.
__Affected feature__:
AQL query using web interface
__AQL query (if applicable)__:
LET col1 = (
FOR h in col1
FILTER GEO_DISTANCE(h.location, GEO_POINT(@lng, @lat)) <= 200
RETURN h
)
LET col2 = (
FOR e in col2
FILTER GEO_DISTANCE(e.location, GEO_POINT(@lng, @lat)) <= 200
RETURN e
)
RETURN { col1, col2 }
__AQL explain (if applicable)__:
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
7 SubqueryNode 1 - LET col1 = ... /* const subquery */
2 SingletonNode 1 * ROOT
16 IndexNode 5 - FOR h IN col1 /* geo index scan */
6 ReturnNode 5 - RETURN h
13 SubqueryNode 1 - LET col2 = ... /* const subquery */
8 SingletonNode 1 * ROOT
9 EnumerateCollectionNode 3 - FOR e IN col2 /* full collection scan */
10 CalculationNode 3 - LET #10 = (GEO_DISTANCE(e.`location`, { "type" : "Point", "coordinates" : [ 4.857288, 50.465768 ] }) <= 200) /* simple expression */ /* collections used: e : col2 */
11 FilterNode 3 - FILTER #10
12 ReturnNode 3 - RETURN e
14 CalculationNode 1 - LET #12 = { "col1" : col1, "col2" : col2 } /* simple expression */
15 ReturnNode 1 - RETURN #12
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
16 geo col1 false true n/a [ `location` ] (GEO_DISTANCE({ "type" : "Point", "coordinates" : [ 4.857288, 50.465768 ] }, h.`location`) <= 200)
Functions used:
Name Deterministic Cacheable Uses V8
GEO_DISTANCE true true false
Optimization rules applied:
Id RuleName
1 geo-index-optimizer
2 remove-unnecessary-calculations-2
__Size of your Dataset on disk__:
A couple megabytes
__Problem__:
The Geo index of col2 is not used
You can switch the query around like this
LET col2 = (
FOR h in col2
FILTER GEO_DISTANCE(h.location, GEO_POINT(@lng, @lat)) <= 200
RETURN h
)
LET col1 = (
FOR e in col1
FILTER GEO_DISTANCE(e.location, GEO_POINT(@lng, @lat)) <= 200
RETURN e
)
RETURN { col1, col2 }
and it will use the Geo index for col2 then fall back to full collection scan for col1.
__Expected result__:
The Geo index is used for col1 and col2
I've made sure both Geo indices are set up and work for a simple query (like below)
FOR h in col2
FILTER GEO_DISTANCE(h.location, GEO_POINT(@lng, @lat)) <= 200
RETURN h
Will be fixed in 3.4.4
Thanks for the update.
Hi @Levalis,
Version 3.4.4, which includes a fix for this issue, has been released.
The Geo optimization works correctly, a Geo index scan for both collections is used.
Query String:
LET a = (
FOR h in col1
FILTER GEO_DISTANCE(h.location, [0, 0]) <= 200
RETURN h
)
LET b = (
FOR e in col2
FILTER GEO_DISTANCE(e.location, [0, 0]) <= 200
RETURN e
)
RETURN { a, b }
Execution plan:
Id NodeType Calls Items Runtime [s] Comment
1 SingletonNode 1 1 0.00000 * ROOT
7 SubqueryNode 1 1 0.00014 - LET a = ... /* const subquery */
2 SingletonNode 1 1 0.00000 * ROOT
16 IndexNode 1 0 0.00013 - FOR h IN col1 /* geo index scan */
6 ReturnNode 1 0 0.00000 - RETURN h
13 SubqueryNode 1 1 0.00008 - LET b = ... /* const subquery */
8 SingletonNode 1 1 0.00000 * ROOT
17 IndexNode 1 0 0.00007 - FOR e IN col2 /* geo index scan */
12 ReturnNode 1 0 0.00000 - RETURN e
14 CalculationNode 1 1 0.00001 - LET #12 = { "a" : a, "b" : b } /* simple expression */
15 ReturnNode 1 1 0.00000 - RETURN #12
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
16 geo col1 false true n/a [ `location` ] (GEO_DISTANCE([ 0, 0 ], h.`location`) <= 200)
17 geo col2 false true n/a [ `location` ] (GEO_DISTANCE([ 0, 0 ], e.`location`) <= 200)
Optimization rules applied:
Id RuleName
1 geo-index-optimizer
2 remove-unnecessary-calculations-2
Query Statistics:
Writes Exec Writes Ign Scan Full Scan Index Filtered Exec Time [s]
0 0 0 0 0 0.00130
Query Profile:
Query Stage Duration [s]
initializing 0.00000
parsing 0.00013
optimizing ast 0.00002
loading collections 0.00001
instantiating plan 0.00007
optimizing plan 0.00032
executing 0.00026
finalizing 0.00047