eXist: implement Lucene range queries

Created on 17 Aug 2018 · 9 Comments · Source: eXist-db/exist

What is the problem

Start with a Lucene index on a field named "date" created with ft:index. Either of the following two query strings, when passed to ft:search, returns the correct results:

 let $query := "date:1600"
 let $query := "date:1610"

but when I turn that into an inclusive range, like so:

let $query := "date:[1600 TO 1610]"

Nothing is returned. There is no syntax error; the query just fails silently. Specifying a range query in an XML query description does not work either, but at least such a query is rejected rather than silently ignored.

The only workaround I've come up with is to manually expand the range into a set like so:

 let $query := "date:(1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610)"

That does work, at least for integer values, but it quickly becomes impractical for ranges covering more than a few values.
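Writing the expanded set out by hand is error-prone. As a minimal sketch, the expansion can be generated with XQuery's own `to` range operator. `local:expand-range` is a hypothetical helper for illustration, not an eXist function, and it only covers integer ranges.

```xquery
xquery version "3.1";

(: Hypothetical helper, not part of eXist: builds the manual-expansion
 : workaround string, e.g. "date:(1600 1601 ... 1610)", from an inclusive
 : integer range using XQuery's "to" operator. :)
declare function local:expand-range(
    $field as xs:string,
    $from as xs:integer,
    $to as xs:integer
) as xs:string {
    $field || ":(" || string-join(($from to $to) ! string(.), " ") || ")"
};

(: Produces "date:(1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610)" :)
local:expand-range("date", 1600, 1610)
```

Each generated value becomes a separate clause of a Lucene boolean query, so for large ranges this approach can also run into Lucene's default limit of 1024 boolean clauses.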

What did you expect

I expect range specifications to work in Lucene queries. Range specifications are a documented part of the Lucene query syntax, and the eXist documentation simply points to the authoritative Lucene query syntax documentation without stating that only a subset is implemented in eXist.

Describe how to reproduce or add a test

See above. Also, inspect the Lucene query implementation and note that "range" is not one of the recognized keywords; that is where the Lucene XML query syntax is parsed. I was not able to pinpoint where the ordinary Lucene syntax is parsed.

Personally, I'm more interested in the ordinary syntax (a:[B TO C]), but for completeness the XML query syntax should support it too. What Elasticsearch does with JSON should translate fairly naturally into eXist's XML Lucene query specifications.

Context information

Not platform-specific. Currently still not implemented in eXist 4.3.1.

Labels: bug, enhancement, needs documentation, Lucene

All 9 comments

Looking through the documentation in your OP, I agree that

let $query := "date:[1600 TO 1610]"

should work. Out of curiosity, does the following work, i.e. switching from the bracketed range to a sequence?

let $query := "date:(1601 to 1610)"

It would really help us if you could modify your OP to include a small sample data set, a collection.xconf, and an XQuery with the expected results.

OK, here is a reproducer. The suggestion of using "date:(1601 to 1610)" doesn't really make sense, because that just means "give me matches on any of the three words 1601, to, or 1610". You get two results because 1601 and 1610 both match; "to" does not, and it is not a keyword in this context.

xquery version "3.1";

(:  Query to illustrate that Lucene range query syntax is broken :)
let $col-name := "lucene_range_example"
let $doc-name := "lucene_range_example.xml"
let $col := if (xmldb:collection-available("/db/" || $col-name))
            then "/db/" || $col-name
            else xmldb:create-collection("/db", $col-name)
let $doc-path := $col || "/" || $doc-name
(: Remove any copy left over from a previous run. doc-available() looks in
 : the database; file:exists() would look at the server's file system :)
let $x :=
    if (doc-available($doc-path))
    then xmldb:remove($col, $doc-name)
    else ()

let $xml := <some-dates>
    <date>1599</date>
    <date>1600</date>
    <date>1601</date>
    <date>1602</date>
    <date>1603</date>
    <date>1604</date>
    <date>1605</date>
    <date>1606</date>
    <date>1607</date>
    <date>1608</date>
    <date>1609</date>
    <date>1610</date>
    <date>1611</date>
    <date>1612</date>
</some-dates>

let $doc := doc(xmldb:store($col, $doc-name, $xml))
let $x := ft:remove-index($doc-path)
(: Index every <date> value as a stored Lucene field named "date" :)
let $index := <doc>
    {
        for $year in $doc//date/text()
        return <field name="date" store="yes">{$year}</field>
    }
    </doc>
let $x := ft:index($doc-path, $index)

(: This one works but obviously returns only one value :)
(: let $query := "date:1600" :)

(: This one works (produces 11 results) but requires manual expansion into
 : a complete set of all possible values, which could be prohibitively expensive
 : to produce for large ranges and potentially non-trivial for anything
 : but integer sequences :)
(: let $query := "date:(1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610)" :)

(:  This one uses standard Lucene range syntax but fails silently. No errors
 : are thrown and no results are produced :)
let $query := "date:[1600 TO 1610]"

return ft:search($col, $query)//exist:match


As far as I know, no standard governs the XML Lucene query syntax, but it would be good to make the equivalent of the example above work there too. It might look something like the following, though other syntaxes are possible and I have not thought too deeply about this:


let $query :=
<query>
    <range>
        <term>date</term>
        <begin>1600</begin>
        <end>1610</end>
    </range>
</query>

A partial workaround is a regular-expression query:

date:/160[0-9]|1610/

The same workaround in XML query syntax:

<query>
    <term>date:/160[0-9]|1610/</term>
</query>
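To check which terms the regular expression covers, the pattern can be tried against the reproducer's years in plain XQuery. Lucene regexp queries must match the entire indexed term, whereas fn:matches() also matches substrings, so this sketch wraps the pattern in ^( ... )$ to imitate Lucene's anchoring; that translation is an assumption of the sketch, not something eXist does.

```xquery
xquery version "3.1";

(: Same years as the reproducer, as strings, like the indexed terms :)
let $years := (1599 to 1612) ! string(.)
(: ^( ... )$ imitates Lucene's whole-term regexp matching;
 : only "1600" through "1610" (11 values) pass the filter :)
return $years[matches(., "^(160[0-9]|1610)$")]
```

A regex like this stays short only when the bounds line up with digit boundaries, as 1600 and 1610 do here; an arbitrary range such as 1597 to 1623 needs a longer pattern.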

With #2386, at least queries like date:[1600 TO 1610] are working.

Yes, my original reproducer does now work post 142e9619b5b8fc84a914dfd82a98dc824675dd88. Thanks.

@joewiz I agree that this should be documented. Since TO is a new feature, would anybody involved with this issue and the PR care to make a documentation PR?
