I'm using the latest ArangoDB of the respective release series:
On this operating system:
I'm issuing AQL via:
I've run db._explain("<my aql query>") and it didn't shed more light on this.
The AQL query in question is:
FOR d IN vDevice
FOR r IN vRoute
FILTER d._id == r.vDeviceId
SORT r.mask DESC
LIMIT 50
RETURN {id:r._id, dev:d.hostname, network:r.prefix}
This query takes ~5.5s. vDevice has 60 docs and vRoute has ~1.5M docs. We have a hash index at vRoute table called vDeviceId. This is the execution plan of the query above:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
3 EnumerateCollectionNode 211604 - FOR r IN vRoute /* full collection scan */
6 CalculationNode 211604 - LET #4 = r.`mask` /* attribute expression */ /* collections used: r : vRoute */
11 IndexNode 211604 - FOR d IN vDevice /* primary index scan */
7 SortNode 211604 - SORT #4 DESC
8 LimitNode 50 - LIMIT 0, 50
9 CalculationNode 50 - LET #6 = { "id" : r.`_id`, "dev" : d.`hostname`, "network" : r.`prefix` } /* simple expression */ /* collections used: r : vRoute, d : vDevice */
10 ReturnNode 50 - RETURN #6
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
11 primary vDevice true false 100.00 % [ `_key` ] (d.`_id` == r.`vDeviceId`)
The following query takes ~1.5s. We have to add a useless FILTER (2nd line) to force arango to use our index and make the query to perform.
FOR d IN vDevice
FILTER LIKE (d.hostname,'%%')
FOR r IN vRoute
FILTER d._id == r.vDeviceId
SORT r.mask DESC
LIMIT 50
RETURN {id:r._id, dev:d.hostname, network:r.prefix}
The execution plan follows:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
2 EnumerateCollectionNode 417 - FOR d IN vDevice /* full collection scan */
3 CalculationNode 417 - LET #2 = LIKE(d.`hostname`, "%%") /* simple expression */ /* collections used: d : vDevice */
4 FilterNode 417 - FILTER #2
13 IndexNode 393648 - FOR r IN vRoute /* hash index scan */
8 CalculationNode 393648 - LET #6 = r.`mask` /* attribute expression */ /* collections used: r : vRoute */
9 SortNode 393648 - SORT #6 DESC
10 LimitNode 50 - LIMIT 0, 50
11 CalculationNode 50 - LET #8 = { "id" : r.`_id`, "dev" : d.`hostname`, "network" : r.`prefix`} /* simple expression */ /* collections used: r : vRoute, d : vDevice */
12 ReturnNode 50 - RETURN #8
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
13 hash vRoute false false 0.11 % [ `vDeviceId` ] (d.`_id` == r.`vDeviceId`)
The solution seems hacky though. What could be the reason for such behavior?
The issue can be reproduced using this dataset: _We can provide dump via email._
Please provide a way to create the dataset to run the above query on; either by a gist with an arangodump, or `db.collection.save({my: "values"}) statements. If it can be reproduced with one of the ArangoDB example datasets, it's a plus.
Hey team! I was wondering if you are thinking of use index <name> feature where we could specify which index to use directly in AQL.
Hi,
sorry for comming back late on this.
Let me explain the tactics of the optimizer here.
You can think of the AQL becoming a snail inside of arangodb, chewing the data at its mouth passing it down all the way to the RETURN-statement. It strives to Bite as less data with its mouth as possible, and move the parts of the query so the data that is being passed along to the back of the snake is reduced as early as possible.
So, usually a full collection scan is always a bad idea. For that reason it will always try to have indexed access on the outermost FOR statement. After that it will try to have the the join-condition being indexed.
Your workaround introduces a Calculation node, which the optimizer can't tell whether its ok to wrap the query around or not, so it mustn't swap the loops.
A more smarter way to tell the Optimizer how to handle this situation (if the vDevice collection doesn't have many documents) may look like this:
LET devices=(FOR device IN vDevice RETURN { device._id, device.hostname} )
FOR d IN devices
....
Then the optimizer should also pick your Index of choice.
Hi @gurisko,
both named indexes and the option "indexHint" to override the internal optimizer decision regarding which index to use are supported in Release Candidate 7 of ArangoDB 3.5.
For further details, please have a look at:
You can find the latest Release Candidate of 3.5 in the download section.
Most helpful comment
Hey team! I was wondering if you are thinking of
use index <name>feature where we could specify which index to use directly in AQL.