Arangodb: AQL optimizer does not pick up multiple Geo index

Created on 14 Feb 2019 · 3Comments · Source: arangodb/arangodb

Hello, I'm facing a subtle issue with AQL and Geo indices. It seems the AQL query optimiser uses the Geo index once, then falls back to a full collection scan.

My Environment

__ArangoDB Version__: 3.4.2
__Storage Engine__: RocksDB
__Deployment Mode__: Single Server
__Deployment Strategy__: ArangoDB Docker image (arangodb/arangodb:3.4.2)
__Configuration__: default config
__Infrastructure__: own
__Operating System__: Ubuntu 18.04
__Total RAM in your machine__: 16GB
__Disks in use__: SSD
__Used Package__: Docker - official Docker library

Component, Query & Data

__Affected feature__:
AQL query using web interface

__AQL query (if applicable)__:

LET col1 = (
    FOR h in col1
    FILTER GEO_DISTANCE(h.location, GEO_POINT(@lng, @lat)) <= 200
    RETURN h
)
LET col2 = (
    FOR e in col2
    FILTER GEO_DISTANCE(e.location, GEO_POINT(@lng, @lat)) <= 200
    RETURN e
)
RETURN { col1, col2 }

__AQL explain (if applicable)__:

Execution plan:
 Id   NodeType                  Est.   Comment
  1   SingletonNode                1   * ROOT
  7   SubqueryNode                 1     - LET col1 = ...   /* const subquery */
  2   SingletonNode                1       * ROOT
 16   IndexNode                    5         - FOR h IN col1   /* geo index scan */
  6   ReturnNode                   5           - RETURN h
 13   SubqueryNode                 1     - LET col2 = ...   /* const subquery */
  8   SingletonNode                1       * ROOT
  9   EnumerateCollectionNode      3         - FOR e IN col2   /* full collection scan */
 10   CalculationNode              3           - LET #10 = (GEO_DISTANCE(e.`location`, { "type" : "Point", "coordinates" : [ 4.857288, 50.465768 ] }) <= 200)   /* simple expression */   /* collections used: e : col2 */
 11   FilterNode                   3           - FILTER #10
 12   ReturnNode                   3           - RETURN e
 14   CalculationNode              1     - LET #12 = { "col1" : col1, "col2" : col2 }   /* simple expression */
 15   ReturnNode                   1     - RETURN #12

Indexes used:
 By   Type   Collection   Unique   Sparse   Selectivity   Fields           Ranges
 16   geo    col1      false    true             n/a   [ `location` ]   (GEO_DISTANCE({ "type" : "Point", "coordinates" : [ 4.857288, 50.465768 ] }, h.`location`) <= 200)

Functions used:
 Name           Deterministic   Cacheable   Uses V8
 GEO_DISTANCE   true            true        false  

Optimization rules applied:
 Id   RuleName
  1   geo-index-optimizer
  2   remove-unnecessary-calculations-2

__Size of your Dataset on disk__:
A couple megabytes

Steps to reproduce

Create Geo indices on col1 and col2 ("location" field)
Run query in web interface
Profile query

__Problem__:
The Geo index of col2 is not used

You can switch the query around like this

LET col2 = (
    FOR h in col2
    FILTER GEO_DISTANCE(h.location, GEO_POINT(@lng, @lat)) <= 200
    RETURN h
)
LET col1 = (
    FOR e in col1
    FILTER GEO_DISTANCE(e.location, GEO_POINT(@lng, @lat)) <= 200
    RETURN e
)
RETURN { col1, col2 }

and it will use the Geo index for col2 then fall back to full collection scan for col1.

__Expected result__:
The Geo index is used for col1 and col2

I've made sure both Geo indices are set up and work for a simple query (like below)

FOR h in col2
FILTER GEO_DISTANCE(h.location, GEO_POINT(@lng, @lat)) <= 200
RETURN h

1 Bug 2 Fixed 3 AQL 3 Geo 3 Index

Source

Levalis

All 3 comments

Will be fixed in 3.4.4

jsteemann on 19 Feb 2019

👍1

Thanks for the update.

Levalis on 19 Feb 2019

Hi @Levalis,

Version 3.4.4, which includes a fix for this issue, has been released.
The Geo optimization works correctly, a Geo index scan for both collections is used.

Query String:
 LET a = (
     FOR h in col1
     FILTER GEO_DISTANCE(h.location, [0, 0]) <= 200
     RETURN h
 )
 LET b = (
     FOR e in col2
     FILTER GEO_DISTANCE(e.location, [0, 0]) <= 200
     RETURN e
 )
 RETURN { a, b }

Execution plan:
 Id   NodeType          Calls   Items   Runtime [s]   Comment
  1   SingletonNode         1       1       0.00000   * ROOT
  7   SubqueryNode          1       1       0.00014     - LET a = ...   /* const subquery */
  2   SingletonNode         1       1       0.00000       * ROOT
 16   IndexNode             1       0       0.00013         - FOR h IN col1   /* geo index scan */
  6   ReturnNode            1       0       0.00000           - RETURN h
 13   SubqueryNode          1       1       0.00008     - LET b = ...   /* const subquery */
  8   SingletonNode         1       1       0.00000       * ROOT
 17   IndexNode             1       0       0.00007         - FOR e IN col2   /* geo index scan */
 12   ReturnNode            1       0       0.00000           - RETURN e
 14   CalculationNode       1       1       0.00001     - LET #12 = { "a" : a, "b" : b }   /* simple expression */
 15   ReturnNode            1       1       0.00000     - RETURN #12

Indexes used:
 By   Type   Collection   Unique   Sparse   Selectivity   Fields           Ranges
 16   geo    col1         false    true             n/a   [ `location` ]   (GEO_DISTANCE([ 0, 0 ], h.`location`) <= 200)
 17   geo    col2         false    true             n/a   [ `location` ]   (GEO_DISTANCE([ 0, 0 ], e.`location`) <= 200)

Optimization rules applied:
 Id   RuleName
  1   geo-index-optimizer
  2   remove-unnecessary-calculations-2

Query Statistics:
 Writes Exec   Writes Ign   Scan Full   Scan Index   Filtered   Exec Time [s]
           0            0           0            0          0         0.00130

Query Profile:
 Query Stage           Duration [s]
 initializing               0.00000
 parsing                    0.00013
 optimizing ast             0.00002
 loading collections        0.00001
 instantiating plan         0.00007
 optimizing plan            0.00032
 executing                  0.00026
 finalizing                 0.00047