eXist: string comparison on a non-indexed empty text node with a collation specified results in an exception

Created on 1 Jun 2018 · 9 comments · Source: eXist-db/exist

What is the problem

Queries whose filter clause performs a string comparison on EMPTY text nodes that are not indexed fail with the following exception, but only when a collation is in effect, either as the default collation or passed explicitly to the string comparison function:

java.lang.IllegalArgumentException: Illegal argument target.  Argument can not be null or of length 0
    at com.ibm.icu.text.SearchIterator.<init>(SearchIterator.java:636)
    at com.ibm.icu.text.StringSearch.<init>(StringSearch.java:181)
    at com.ibm.icu.text.StringSearch.<init>(StringSearch.java:234)
    at org.exist.util.Collations.contains(Collations.java:338)
    at org.exist.xquery.value.StringValue.contains(StringValue.java:673)
    at org.exist.xquery.GeneralComparison.compareAtomic(GeneralComparison.java:1089)
    at org.exist.xquery.GeneralComparison.nodeSetCompare(GeneralComparison.java:648)
    at org.exist.xquery.GeneralComparison.quickNodeSetCompare(GeneralComparison.java:923)
    at org.exist.xquery.GeneralComparison.eval(GeneralComparison.java:469)
    at org.exist.xquery.modules.range.Lookup.eval(Lookup.java:317)
    at org.exist.xquery.InternalFunctionCall.eval(InternalFunctionCall.java:41)
    at org.exist.xquery.AbstractExpression.eval(AbstractExpression.java:71)
    at org.exist.xquery.PathExpr.eval(PathExpr.java:276)
    at org.exist.xquery.Predicate.selectByNodeSet(Predicate.java:450)
    at org.exist.xquery.Predicate.evalPredicate(Predicate.java:326)
    at org.exist.xquery.LocationStep.processPredicate(LocationStep.java:256)
    at org.exist.xquery.LocationStep.applyPredicate(LocationStep.java:243)
    at org.exist.xquery.LocationStep.eval(LocationStep.java:474)
        ...

What did you expect

I expected string comparison on unindexed empty text nodes to work when a collation is specified, i.e. the second query below should simply return no results.

Describe how to reproduce or add a test

Create a document with the following content in a collection that has no index configured:
<entry> <a>xxx</a> <b></b> </entry>

Run these queries:

collection("/db/data/testqueries")//a[contains(.,'x',"?lang=en-US")]
collection("/db/data/testqueries")//b[contains(.,'x',"?lang=en-US")]

The second query will fail with the exception quoted above.

Defining a range index on <b>, or omitting the collation argument, avoids the exception.

Context information

  • eXist-db version: 4.1
  • Java version: Java 8u171
  • Operating system: Windows 10
  • 64-bit
  • No custom changes
Label: bug

All 9 comments

See #1379 for more fn:contains oddities involving ?

A little update:

I looked into this bug myself. The solution seems very simple, but I'm not a Java developer, so better safe than sorry.

In org.exist.util.Collations.contains (Collations.java:338):

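public static boolean contains(@Nullable final Collator collator, final String s1, final String s2) {
    if (collator == null) {
        // no collation: plain code-point containment
        return s1.contains(s2);
    } else {
        // ICU collation-aware search; the constructor rejects an empty target (s1),
        // which is the IllegalArgumentException in the stack trace
        final SearchIterator searchIterator =
                new StringSearch(s2, new StringCharacterIterator(s1), (RuleBasedCollator) collator);
        return searchIterator.first() >= 0;
    }
}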

It seems that a StringSearch (the SearchIterator) cannot be constructed over an empty string. Just check s1 and s2 for emptiness before using the collator-based StringSearch, and rely on s1.contains(s2) if either string is empty. I'd expect that an empty string never contains anything, no matter how exotic the collation, so this should be a fairly easy fix.
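A minimal sketch of the proposed guard, under stated assumptions: it uses the JDK's java.text.Collator instead of ICU4J, and a brute-force scan over the substrings of s1 stands in for ICU's StringSearch (the JDK has no equivalent). The class name CollatedContainsSketch is hypothetical and this is not eXist's actual patch; it only illustrates falling back to String.contains when either operand is empty, so the collation search is never started on an empty string.

```java
import java.text.Collator;

// Hypothetical illustration only; not eXist's Collations class.
final class CollatedContainsSketch {

    private CollatedContainsSketch() {
    }

    /**
     * Returns true if s2 occurs in s1 under the given collation.
     * A null collator means plain code-point comparison.
     */
    static boolean contains(final Collator collator, final String s1, final String s2) {
        // The guard: with no collator, or with an empty operand, String.contains
        // already gives the expected answer ("" occurs in every string, and no
        // non-empty string occurs in ""), so no collation search is attempted.
        if (collator == null || s1.isEmpty() || s2.isEmpty()) {
            return s1.contains(s2);
        }
        // Brute-force stand-in for a collation-aware search (assumption: ICU's
        // StringSearch is not used). A collation-equal match may differ in
        // length from s2, so every substring of s1 is compared.
        for (int start = 0; start < s1.length(); start++) {
            for (int end = start + 1; end <= s1.length(); end++) {
                if (collator.compare(s1.substring(start, end), s2) == 0) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With Collator.getInstance(Locale.US), contains(collator, "", "x") returns false (the <b></b> case from the report) instead of throwing, and contains(collator, "xxx", "x") returns true. The empty-string fallback agrees with the XPath rules for fn:contains (a zero-length second argument yields true, a zero-length first argument yields false). One caveat: the spec also treats a string consisting only of ignorable collation units as zero-length, which a plain isEmpty() check does not detect.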

(Other collation-enabled string functions may have similar issues.)

@merenyics thanks for investigating. Your suggestion was indeed correct, and combined with your simple test case it made this easy to fix. Thank you.

Thanks for the fix. Please note that the problem is not limited to contains: a few similar comparison functions, such as starts-with, ends-with, and indexOf, are also affected.
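The same guard carries over to prefix and suffix tests. A minimal sketch, again assuming the JDK's java.text.Collator with a brute-force comparison in place of ICU's StringSearch; the class CollatedAffixSketch and its method signatures are hypothetical and not taken from eXist's code.

```java
import java.text.Collator;

// Hypothetical illustration only; not eXist's Collations class.
final class CollatedAffixSketch {

    private CollatedAffixSketch() {
    }

    /** True if some prefix of s1 is collation-equal to s2. */
    static boolean startsWith(final Collator collator, final String s1, final String s2) {
        // Guard: "" is a prefix of everything; nothing non-empty is a prefix of "".
        if (collator == null || s1.isEmpty() || s2.isEmpty()) {
            return s1.startsWith(s2);
        }
        // Prefixes may differ in length from s2 under a collation, so try them all.
        for (int end = 1; end <= s1.length(); end++) {
            if (collator.compare(s1.substring(0, end), s2) == 0) {
                return true;
            }
        }
        return false;
    }

    /** True if some suffix of s1 is collation-equal to s2. */
    static boolean endsWith(final Collator collator, final String s1, final String s2) {
        // Same guard as startsWith, using plain String.endsWith.
        if (collator == null || s1.isEmpty() || s2.isEmpty()) {
            return s1.endsWith(s2);
        }
        // Try every suffix, shortest first.
        for (int start = s1.length() - 1; start >= 0; start--) {
            if (collator.compare(s1.substring(start), s2) == 0) {
                return true;
            }
        }
        return false;
    }
}
```

For the test document, startsWith(collator, "", "x") and endsWith(collator, "", "x") return false for the empty <b> element, matching the %test:assertEmpty expectations in the test cases below, while "xxx" matches both.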

@merenyics hmm, that wasn't clear to me... Once the fix for contains is merged, would you like to send a PR that adds further test cases to exist-core/src/test/xquery/collations.xq to demonstrate the problems?

The problems in question are just a few lines of code away from the 'contains' fix and involve very similar constructs; it would be a shame to miss them ;-) I'll be happy to provide test cases as soon as I can.

@adamretter I'm sorry, but I'm not very familiar with GitHub, so I'd rather not create a pull request (if that is what PR stands for, and I know it's time I learned how to do it...). Here are the test cases you asked for:

declare
    %test:assertEquals("<a>xxx</a>")
function collations:non-empty-string-starts-with() {
    doc("/db/collations-test/test.xml")//a[starts-with(.,'x',"?lang=en-US")]
};

declare
    %test:assertEmpty
function collations:empty-string-starts-with() {
    doc("/db/collations-test/test.xml")//b[starts-with(.,'x',"?lang=en-US")]
};

declare
    %test:assertEquals("<a>xxx</a>")
function collations:non-empty-string-ends-with() {
    doc("/db/collations-test/test.xml")//a[ends-with(.,'x',"?lang=en-US")]
};

declare
    %test:assertEmpty
function collations:empty-string-ends-with() {
    doc("/db/collations-test/test.xml")//b[ends-with(.,'x',"?lang=en-US")]
};

indexOf could also be problematic, but I don't know which XQuery expression ends up calling this particular function, so I cannot come up with a test case for it.
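For indexOf the guard is the same; only the result changes from a boolean to a position. A minimal sketch, assuming a hypothetical signature that returns the index in s1 where the first collation-equal match starts, or -1; eXist's actual Collations.indexOf signature is not shown in this thread, and the JDK's java.text.Collator with a brute-force scan again stands in for ICU. As an unverified guess, functions that need a match position, such as fn:substring-before or fn:substring-after with a collation argument, are plausible callers of this code path.

```java
import java.text.Collator;

// Hypothetical illustration only; not eXist's Collations.indexOf.
final class CollatedIndexOfSketch {

    private CollatedIndexOfSketch() {
    }

    /** Start index in s1 of the first collation-equal match of s2, or -1. */
    static int indexOf(final Collator collator, final String s1, final String s2) {
        // Same guard as for contains: String.indexOf returns 0 for an empty s2
        // and -1 for an empty s1 with a non-empty s2.
        if (collator == null || s1.isEmpty() || s2.isEmpty()) {
            return s1.indexOf(s2);
        }
        // Brute-force stand-in for a collation-aware search (assumption):
        // report the first start position that has any collation-equal substring.
        for (int start = 0; start < s1.length(); start++) {
            for (int end = start + 1; end <= s1.length(); end++) {
                if (collator.compare(s1.substring(start, end), s2) == 0) {
                    return start;
                }
            }
        }
        return -1;
    }
}
```

Returning -1 for an empty s1 lets a caller treat the empty text node as "no match" without ever constructing a search over an empty target.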

Please consider reopening the issue, as the current hotfix covers only 1 out of 4 possible errors.

@merenyics could you open a new issue that references this one, and copy and paste your test cases into it, please?

@adamretter please see #2678 per your request
