In eXist v3.6.1, saving a document to a collection with a full text index configured with <lucene diacritics="no"> works without error. In eXist v4.0.0 (and v4.1.0-SNAPSHOT) this action returns a sequence of errors, including java.lang.ExceptionInInitializerError and java.lang.NoClassDefFoundError. A test demonstrating this is below.
This issue was whittled down from the problem reported to me and @adamretter by @wsalesky, in which running analyzers.xql in eXist v4.0.0 via eXide > Run as test results in a test failure. A test demonstrating this is below too.
I expected to be able to configure diacritic-insensitive Lucene full text indexes in eXist v4.0.0.
Run the following XQuery in eXide and monitor exist.log. (Consider first running without diacritics="no"; the query runs without error. Then restore diacritics="no" to observe the error.)
The query will produce a sequence of errors, including java.lang.ExceptionInInitializerError and java.lang.NoClassDefFoundError.
xquery version "3.1";
let $xconf :=
    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <index>
            <lucene diacritics="no">
            <!-- <lucene> -->
                <text qname="p"/>
            </lucene>
        </index>
    </collection>
let $test-col := xmldb:create-collection("/db", "test")
let $conf-col := xmldb:create-collection("/db/system/config/db", "test")
return
    (
        xmldb:store($conf-col, "collection.xconf", $xconf),
        xmldb:store($test-col, "test.xml",
            <test>
                <p>Hello</p>
            </test>
        ),
        xmldb:remove($test-col),
        xmldb:remove($conf-col)
    )
With diacritics="no", the query produces a sequence of errors in exist.log (see the full logs here).
First run:
2018-02-21 00:29:30,620 [qtp1253271425-51] WARN (TransactionManager.java [close]:186) - Transaction was not committed or aborted, auto aborting!
2018-02-21 00:29:30,621 [qtp1253271425-51] ERROR (XQueryServlet.java [process]:534) - null
java.lang.ExceptionInInitializerError: null
at org.exist.indexing.lucene.analyzers.NoDiacriticsStandardAnalyzer.createComponents(NoDiacriticsStandardAnalyzer.java:133) ~[exist-index-lucene.jar:4.1.0-SNAPSHOT]
...
Caused by: com.ibm.icu.util.ICUUncheckedIOException: java.io.IOException: ICU data file error: Header authentication failed, please check if you have a valid ICU data file; data format 4e726d32, format version 2.0.0.0
at com.ibm.icu.impl.Normalizer2Impl.load(Normalizer2Impl.java:483) ~[icu4j-60.2.jar:60.2]
Second and subsequent runs:
2018-02-21 00:29:35,090 [qtp1253271425-54] ERROR (XQueryServlet.java [process]:534) - Could not initialize class org.apache.lucene.analysis.icu.ICUFoldingFilter
java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.analysis.icu.ICUFoldingFilter
at org.exist.indexing.lucene.analyzers.NoDiacriticsStandardAnalyzer.createComponents(NoDiacriticsStandardAnalyzer.java:133) ~[exist-index-lucene.jar:4.1.0-SNAPSHOT]
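The change from ExceptionInInitializerError on the first run to NoClassDefFoundError on later runs is consistent with standard JVM behaviour when a class's static initializer fails: the first use reports the original failure, and the class is then marked as unusable for the lifetime of its class loader. The following is a minimal, self-contained sketch of that behaviour. StaticInitDemo and BrokenFilter are hypothetical names for illustration only; they are not part of eXist-db, Lucene, or ICU4j, and the simulated failure merely stands in for whatever goes wrong inside ICUFoldingFilter.

```java
// Hypothetical demo; none of these names exist in eXist-db, Lucene, or ICU4j.
class StaticInitDemo {

    // Stands in for a class such as ICUFoldingFilter whose static
    // initializer fails (here the failure is simulated unconditionally).
    static class BrokenFilter {
        static final Object NORMALIZER = load();

        private static Object load() {
            throw new IllegalStateException("simulated data file error");
        }

        static void use() {
            // Never reached: class initialization fails before this runs.
        }
    }

    // Touches BrokenFilter and returns the simple name of the error the JVM raises.
    static String attempt() {
        try {
            BrokenFilter.use();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println("first use:  " + attempt());
        System.out.println("second use: " + attempt());
    }
}
```

Run in a fresh JVM, the first call reports ExceptionInInitializerError and the second reports NoClassDefFoundError, which mirrors the two log excerpts above. In practice this means only the first-run stack trace carries the underlying cause, so that is the one worth reading.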
To reproduce @wsalesky's report with or without eXide, run the following query:
xquery version "3.1";
import module namespace test="http://exist-db.org/xquery/xqsuite" at
    "resource:org/exist/xquery/lib/xqsuite/xqsuite.xql";
test:suite(
    inspect:module-functions(
        xs:anyURI(
            "https://raw.githubusercontent.com/eXist-db/exist/develop/extensions/indexes/lucene/test/src/xquery/lucene/analyzers.xql"
        )
    )
)
Result:
<testsuites>
<testsuite package="http://exist-db.org/xquery/lucene/test/analyzers"
timestamp="2018-02-21T00:35:32.712-05:00" errors="5">Could not initialize class
org.apache.lucene.analysis.icu.ICUFoldingFilter</testsuite>
</testsuites>
Very interesting. After the initial report from @wsalesky I was unable to reproduce this. In fact, I can run analyzers.xql here without problem. However, with this second report, I now suspect this might be down to my own setup. I will investigate this on a clean VM...
@adamretter Thanks! I should also add that I tested eXist under both startup scenarios, java -jar start.jar and bin/startup.sh, and the problem was evident regardless. Also, I tested 3.6.1 and 4.0.0 with the DMG app installer, and I tested 4.1.0-SNAPSHOT from a completely scrubbed clone of the develop branch, to ensure no legacy jars or other build artifacts were left behind to pollute the test environment: ./build.sh clean-all && git clean -xdf && ./build.sh
@joewiz @wsalesky I think it would really help us if we could get @wolfgangmm to help get this PR in - https://github.com/eXist-db/exist/pull/1737 so we can then have analyzers.xql tests executed on our Travis and AppVeyor CIs
@joewiz @wsalesky So the exception of interest is:
Caused by: java.io.IOException: ICU data file error: Header authentication failed, please check if you have a valid ICU data file; data format 4e726d32, format version 2.0.0.0
at com.ibm.icu.impl.ICUBinary.readHeader(ICUBinary.java:605) ~[icu4j-60.2.jar:60.2]
at com.ibm.icu.impl.ICUBinary.readHeaderAndDataVersion(ICUBinary.java:556) ~[icu4j-60.2.jar:60.2]
at com.ibm.icu.impl.Normalizer2Impl.load(Normalizer2Impl.java:431) ~[icu4j-60.2.jar:60.2]
at com.ibm.icu.impl.Norm2AllModes$1.createInstance(Norm2AllModes.java:351) ~[icu4j-60.2.jar:60.2]
at com.ibm.icu.impl.Norm2AllModes$1.createInstance(Norm2AllModes.java:344) ~[icu4j-60.2.jar:60.2]
at com.ibm.icu.impl.SoftCache.getInstance(SoftCache.java:69) ~[icu4j-60.2.jar:60.2]
at com.ibm.icu.impl.Norm2AllModes.getInstance(Norm2AllModes.java:341) ~[icu4j-60.2.jar:60.2]
at com.ibm.icu.text.Normalizer2.getInstance(Normalizer2.java:202) ~[icu4j-60.2.jar:60.2]
at org.apache.lucene.analysis.icu.ICUFoldingFilter.<clinit>(ICUFoldingFilter.java:64) ~[lucene-analyzers-icu-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:59]
This is caused by Lucene's ICUFoldingFilter using ICU4j to load the file jar:file:/tmp/exist/extensions/indexes/lucene/lib/lucene-analyzers-icu-4.10.4.jar!/org/apache/lucene/analysis/icu/utr30.nrm, which is packaged inside the Lucene jar.
We are using a very old version of Lucene in eXist-db (4.10.4). If we look at the dependencies for that version here: https://search.maven.org/remotecontent?filepath=org/apache/lucene/lucene-solr-grandparent/4.10.4/lucene-solr-grandparent-4.10.4.pom, we can see that Lucene 4.10.4 expects ICU4j 53.1.
I think we were most likely just lucky that ICU4j 59.1 worked with Lucene 4.10.4. It seems to me that ICU has changed its data file format in version 60+, and so such an old version of Lucene is not usable with it.
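As a small aid to reading the error message, the "data format 4e726d32" field can be decoded to see which kind of data file ICU4j rejected. The sketch below assumes the field is a four-byte ASCII tag printed as hex, which is how the message appears to present it. IcuDataFormatTag is a hypothetical helper written for this illustration, not part of ICU4j.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of ICU4j or Lucene.
class IcuDataFormatTag {

    // Converts a hex string such as "4e726d32" into the ASCII text it encodes.
    static String decode(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Each pair of hex digits is one byte of the tag.
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(decode("4e726d32")); // prints "Nrm2"
    }
}
```

The tag decodes to "Nrm2", which fits a normalization data file such as utr30.nrm, and the message reports it at format version 2.0.0.0. That is at least consistent with the file being a valid but older-format normalization file rather than a corrupt one.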
We have two options: upgrade Lucene, or downgrade ICU4j.
Unfortunately, because of the way that eXist-db uses Lucene and various Analyzers, upgrading Lucene requires architectural changes in eXist-db. As such, for a quick fix I would suggest downgrading ICU4j to 59.1.
This came about because the analyzers.xql tests are not enabled for execution by default, so the error did not show up when we updated ICU4j to 60.2. https://github.com/eXist-db/exist/pull/1737 should resolve the test issue.
@adamretter Thanks very much for your analysis! Downgrading to ICU4j 59.1 sounds like the best option.
@adamretter How interesting about the test suite. I had wondered how this test started failing without us noticing.
@adamretter Thanks for investigating this. I agree with @joewiz, downgrading to ICU4j 59.1 sounds like the best option.