Trying to subset my collection of JATS article metadata leads to a response in 20/40 seconds (eXist-db 4.7.1, resp. Linux/Windows), whereas the non-subset collection results are available in fractions of a second.
a subset to process (almost) as quickly as the original set.
Describe how we can can reproduce the problem.
With the collection posted here, referenced as '/db/meta' in the example below:
let $contrib as element(contrib)+ := collection( '/db/meta' )//contrib[1]
return
count($contrib)
[1] the query is much faster (but it returns all contributors, which is not what I want!).Please always add the following information
I can confirm that the same performance variation exists on 5.3.0-snapshot 55e77cc52f8555dda5ed058ef98f283d69a72a8e
@adamretter the link I posted in the description above is lo longer working because of the Slack message limit. However, the collection I am referring to is available to you on our test server at /db/data/shared.benjamins.com/meta
On eXist-db 5.3.0-SNAPSHOT with a single 257 MB PubMed document I can confirm that a query like:
let $contrib as element(LastName)+ := doc( '/db/pubmed_result-257m.xml' )//LastName[1]
return
count($contrib)
Executes in ~500ms and returns a count of 123,226 elements.
Whereas the query without the [1] predicate:
let $contrib as element(LastName)+ := doc( '/db/pubmed_result-257m.xml' )//LastName
return
count($contrib)
Executes in ~25ms and returns the same count of 123,226 elements.
I stepped through the execution with the Java debugger and found that...
For the query without the [1] predicate, eXist-db does a single call to findDescendantsByTagName against the Structural Index, matching all Elements that have the QName LastName.
For the version with the predicate, the same as the above happens, but then the predicate has to be applied to the results from the Structural Index. To filter by the predicate, for the set of elements, a new set of all parent nodes is created. For each parent, the set of elements from the structural index is then filtered to find descendants of the parent, the resultant set then. The resultant set is then filtered by the first element (the [1] predicate) in that set. Unfortunately the approach taken here is rather complex and could likely be improved, but it is not a simple matter.
As the query without the [1] predicate can be answered directly from the Structural Index, it is much faster!
@adamretter based on your explanations I have changed my query as follows:
let $contrib-group as element(contrib-group)+ := collection( '/db/data/shared.benjamins.com/meta' )//contrib-group
let $contrib as element(contrib)+ := $contrib-group ! ./contrib[1]
return
count($contrib)
and this returns a result in just over a second, which is absolutely fine.
Do note that the following does not perform well at all (have not yet got a result):
let $contrib as element(contrib)+ := $contrib-group/contrib[1]
I am not sure whether it should return the same result though.
It did return the same result:

I will close the issue as I now think it is more of a feature than it is a bug, given Adam's explanations. I think we would need to add a few lines of documentation on this topic, stating that querying //node() is not always a good idea, esp when you need node's parent.
Most helpful comment
I stepped through the execution with the Java debugger and found that...
For the query without the
[1]predicate, eXist-db does a single call tofindDescendantsByTagNameagainst the Structural Index, matching all Elements that have the QNameLastName.For the version with the predicate, the same as the above happens, but then the predicate has to be applied to the results from the Structural Index. To filter by the predicate, for the set of elements, a new set of all parent nodes is created. For each parent, the set of elements from the structural index is then filtered to find descendants of the parent, the resultant set then. The resultant set is then filtered by the first element (the
[1]predicate) in that set. Unfortunately the approach taken here is rather complex and could likely be improved, but it is not a simple matter.As the query without the
[1]predicate can be answered directly from the Structural Index, it is much faster!