Exist: NoClassDefFoundError: Could not initialize class org.apache.lucene.analysis.icu.ICUFoldingFilter

Created on 21 Feb 2018 · 7 Comments · Source: eXist-db/exist

What is the problem

In eXist v3.6.1, saving a document to a collection with a full text index configured with <lucene diacritics="no"> works without error. In eXist v4.0.0 (and v4.1.0-SNAPSHOT) this action returns a sequence of errors, including java.lang.ExceptionInInitializerError and java.lang.NoClassDefFoundError. A test demonstrating this is below.

This issue was whittled down from the problem reported to me and @adamretter by @wsalesky, in which running analyzers.xql in eXist v4.0.0 via eXide > Run as test results in a test failure. A test demonstrating this is below too.

What did you expect

I expected to be able to configure diacritic-insensitive Lucene full text indexes in eXist v4.0.0.

Describe how to reproduce or add a test

Run the following XQuery in eXide and monitor exist.log. (Consider first running without diacritics="no"; the query runs without error. Then restore diacritics="no" to observe the error.)
The query will produce a sequence of errors, including java.lang.ExceptionInInitializerError and java.lang.NoClassDefFoundError.

xquery version "3.1";

let $xconf :=
    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <index>
            <lucene diacritics="no">
            <!--  <lucene>  -->
                <text qname="p"/>
            </lucene>
        </index>
    </collection>
let $test-col := xmldb:create-collection("/db", "test")
let $conf-col := xmldb:create-collection("/db/system/config/db", "test")
return 
    (
        xmldb:store($conf-col, "collection.xconf", $xconf),
        xmldb:store($test-col, "test.xml",
            <test>
                <p>Hello</p>
            </test>
        ),
        xmldb:remove($test-col),
        xmldb:remove($conf-col)
    )

With diacritics="no", the query produces a sequence of errors in exist.log; the full logs are linked from the original issue.

First run:

2018-02-21 00:29:30,620 [qtp1253271425-51] WARN  (TransactionManager.java [close]:186) - Transaction was not committed or aborted, auto aborting! 
2018-02-21 00:29:30,621 [qtp1253271425-51] ERROR (XQueryServlet.java [process]:534) - null 
java.lang.ExceptionInInitializerError: null
    at org.exist.indexing.lucene.analyzers.NoDiacriticsStandardAnalyzer.createComponents(NoDiacriticsStandardAnalyzer.java:133) ~[exist-index-lucene.jar:4.1.0-SNAPSHOT]
...
Caused by: com.ibm.icu.util.ICUUncheckedIOException: java.io.IOException: ICU data file error: Header authentication failed, please check if you have a valid ICU data file; data format 4e726d32, format version 2.0.0.0
    at com.ibm.icu.impl.Normalizer2Impl.load(Normalizer2Impl.java:483) ~[icu4j-60.2.jar:60.2]

Second and subsequent runs:

2018-02-21 00:29:35,090 [qtp1253271425-54] ERROR (XQueryServlet.java [process]:534) - Could not initialize class org.apache.lucene.analysis.icu.ICUFoldingFilter 
java.lang.NoClassDefFoundError: Could not initialize class org.apache.lucene.analysis.icu.ICUFoldingFilter
    at org.exist.indexing.lucene.analyzers.NoDiacriticsStandardAnalyzer.createComponents(NoDiacriticsStandardAnalyzer.java:133) ~[exist-index-lucene.jar:4.1.0-SNAPSHOT]
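
The difference between the first and later runs looks like standard JVM behavior rather than two separate bugs. When a class's static initializer throws, the first access reports ExceptionInInitializerError. The JVM then marks the class as unusable, so every later access reports NoClassDefFoundError: Could not initialize class. A minimal sketch of that behavior follows. The Broken class is a hypothetical stand-in for ICUFoldingFilter, not code from eXist-db or Lucene:

```java
public class StaticInitFailureDemo {

    // Hypothetical stand-in for ICUFoldingFilter: its static initializer
    // always fails, like a static field whose data file cannot be loaded.
    static class Broken {
        static final Object DATA = load();

        private static Object load() {
            throw new IllegalStateException("simulated data file error");
        }
    }

    // Touches Broken.DATA and returns the class name of whatever is thrown.
    public static String touch() {
        try {
            return String.valueOf(Broken.DATA);
        } catch (Throwable t) {
            return t.getClass().getName();
        }
    }

    public static void main(String[] args) {
        System.out.println("first use:  " + touch());
        System.out.println("second use: " + touch());
    }
}
```

Run in a fresh JVM, the first use reports java.lang.ExceptionInInitializerError and the second reports java.lang.NoClassDefFoundError, which matches the pattern in the two log excerpts above.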

To reproduce @wsalesky's report with or without eXide, run the following query:

xquery version "3.1";

import module namespace test="http://exist-db.org/xquery/xqsuite" at 
    "resource:org/exist/xquery/lib/xqsuite/xqsuite.xql";

test:suite(
    inspect:module-functions(
        xs:anyURI(
            "https://raw.githubusercontent.com/eXist-db/exist/develop/extensions/indexes/lucene/test/src/xquery/lucene/analyzers.xql"
        )
    )
)

Result:

<testsuites>
    <testsuite package="http://exist-db.org/xquery/lucene/test/analyzers"
        timestamp="2018-02-21T00:35:32.712-05:00" errors="5">Could not initialize class
        org.apache.lucene.analysis.icu.ICUFoldingFilter</testsuite>
</testsuites>

Context information

  • eXist-db version + Git Revision hash: no problem evident in eXist-db v3.6.0 (3be6286); problem evident in v4.0.0 (cc66ebc) and v4.1.0 (23f577317)
  • Java version: 1.8.0_162
  • Operating system: macOS 10.13.3
  • 32 or 64 bit: 64 bit
  • Any custom changes in e.g. conf.xml: no
bug

All 7 comments

Very interesting. After the initial report from @wsalesky I was unable to reproduce this. In fact I can run analyzers.xql here without problem. However, with this second report, I now suspect this might be my own setup, I will investigate this on a clean VM...

@adamretter Thanks! I should also add that I tested eXist under both startup scenarios, java -jar start.jar and bin/startup.sh, and the problem was evident in both. Also, I tested 3.6.1 and 4.0.0 with the DMG app installer, and I tested 4.1.0-SNAPSHOT from a completely scrubbed clone of the develop branch, to ensure no legacy jars or other build artifacts were left behind to pollute the test environment: ./build.sh clean-all && git clean -xdf && ./build.sh

@joewiz @wsalesky I think it would really help us if we could get @wolfgangmm to help get this PR in - https://github.com/eXist-db/exist/pull/1737 so we can then have analyzers.xql tests executed on our Travis and AppVeyor CIs

@joewiz @wsalesky So the exception of interest is:

Caused by: java.io.IOException: ICU data file error: Header authentication failed, please check if you have a valid ICU data file; data format 4e726d32, format version 2.0.0.0
        at com.ibm.icu.impl.ICUBinary.readHeader(ICUBinary.java:605) ~[icu4j-60.2.jar:60.2]
        at com.ibm.icu.impl.ICUBinary.readHeaderAndDataVersion(ICUBinary.java:556) ~[icu4j-60.2.jar:60.2]
        at com.ibm.icu.impl.Normalizer2Impl.load(Normalizer2Impl.java:431) ~[icu4j-60.2.jar:60.2]
        at com.ibm.icu.impl.Norm2AllModes$1.createInstance(Norm2AllModes.java:351) ~[icu4j-60.2.jar:60.2]
        at com.ibm.icu.impl.Norm2AllModes$1.createInstance(Norm2AllModes.java:344) ~[icu4j-60.2.jar:60.2]
        at com.ibm.icu.impl.SoftCache.getInstance(SoftCache.java:69) ~[icu4j-60.2.jar:60.2]
        at com.ibm.icu.impl.Norm2AllModes.getInstance(Norm2AllModes.java:341) ~[icu4j-60.2.jar:60.2]
        at com.ibm.icu.text.Normalizer2.getInstance(Normalizer2.java:202) ~[icu4j-60.2.jar:60.2]
        at org.apache.lucene.analysis.icu.ICUFoldingFilter.<clinit>(ICUFoldingFilter.java:64) ~[lucene-analyzers-icu-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:59]

This is caused by Lucene's ICUFoldingFilter using ICU4j to load the data file jar:file:/tmp/exist/extensions/indexes/lucene/lib/lucene-analyzers-icu-4.10.4.jar!/org/apache/lucene/analysis/icu/utr30.nrm, which ships inside the Lucene jar.
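
As an aside on reading that error message: the "data format 4e726d32" value is a four-byte ASCII tag printed as hex. A small sketch to decode it follows. The class and method names are made up for illustration and are not part of ICU4j:

```java
public class IcuDataFormatTag {

    // Converts a hex string, as printed in the ICU error message,
    // into the ASCII characters it encodes (two hex digits per character).
    public static String decode(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < hex.length(); i += 2) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("4e726d32")); // prints "Nrm2"
    }
}
```

The tag decodes to Nrm2, which suggests the file was recognized as normalizer data, and that the rejection relates to the reported format version (2.0.0.0) rather than to a corrupt or missing file.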

We are using a very old version of Lucene in eXist-db (4.10.4). If we look at its dependencies here: https://search.maven.org/remotecontent?filepath=org/apache/lucene/lucene-solr-grandparent/4.10.4/lucene-solr-grandparent-4.10.4.pom, we can see that Lucene 4.10.4 expects ICU4j 53.1.

I think we were most likely just lucky that ICU4j 59.1 worked with Lucene 4.10.4. It seems to me that ICU has changed its data file format in version 60+, and so such an old version of Lucene is not usable with it.

We have two options:

  1. Update to a newer Lucene
  2. Downgrade the version of ICU4j to 59.1

Unfortunately because of the way that eXist-db uses Lucene and various Analyzers, upgrading Lucene requires architectural changes in eXist-db. As such, for a quick fix I would suggest downgrading ICU4j to 59.1.

Unfortunately this came about because the analyzers.xql tests are not enabled for execution by default, and so the error was not showing when we updated ICU4j to 60.2. https://github.com/eXist-db/exist/pull/1737 should resolve the test issue.
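
Until those tests run in CI, a class-initialization smoke check could catch this kind of dependency mismatch early. The sketch below is only an illustration and not something eXist-db ships. AnalyzerSmokeCheck and tryInitialize are hypothetical names, and the check assumes the Lucene and ICU4j jars are on the classpath when it runs:

```java
public class AnalyzerSmokeCheck {

    // Returns null if the named class loads and its static initializer
    // succeeds; otherwise returns a short description of the failure.
    public static String tryInitialize(String className) {
        try {
            Class.forName(className, true, AnalyzerSmokeCheck.class.getClassLoader());
            return null;
        } catch (ClassNotFoundException | LinkageError e) {
            // ExceptionInInitializerError and NoClassDefFoundError are both LinkageErrors.
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "org.apache.lucene.analysis.icu.ICUFoldingFilter";
        String failure = tryInitialize(name);
        System.out.println(failure == null ? "OK: " + name : "FAILED: " + failure);
    }
}
```

Run without the Lucene jars on the classpath, it reports a ClassNotFoundException instead, so it is only meaningful when run against the same classpath that eXist-db itself uses.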

@adamretter Thanks very much for your analysis! Downgrading to ICU4j 59.1 sounds like the best option.

@adamretter How interesting about the test suite. I had wondered how this test started failing without us noticing.

@adamretter Thanks for investigating this. I agree with @joewiz, downgrading to ICU4j 59.1 sounds like the best option.
