Exist: util:get-fragment-between is not producing correct fragment(s)

Created on 29 Nov 2018 · 11 Comments · Source: eXist-db/exist

What is the problem

The fragmenting process used by util:get-fragment-between doesn't stop at the second node.

Running util:get-fragment-between to get the fragment between the first and second <pb> in the full sample below returns:

            <pb facs="1.jpg"></pb>     
            <p>Aus dem Leben einer Kartoffel.</p>
            <pb facs="2.jpg"></pb>
       </front>
       <body>
            <pb facs="3.jpg"></pb>
            <p>Hubertus Knoll spazierte über das <pb facs="4.jpg"></pb> Feld.</p>
            <pb facs="5.jpg"></pb>
        </body>
    </text>
</TEI>

Maybe the problem was caused here?
https://github.com/eXist-db/exist/blob/eXist-4.4.0/src/org/exist/xquery/functions/util/GetFragmentBetween.java#L182

What did you expect

I expected the following output (as in version 4.3.1):

<pb facs="1.jpg"></pb>
<p>Aus dem Leben einer Kartoffel.</p>

Describe how to reproduce or add a test

Store the following as kartoffelmann.xml:

<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <text>
        <front>
            <pb facs="1.jpg"/>
            <p>Aus dem Leben einer Kartoffel.</p>
            <pb facs="2.jpg"/>
        </front>
        <body>
            <pb facs="3.jpg"/>
            <p>Hubertus Knoll spazierte über das <pb facs="4.jpg"/> Feld.</p>
            <pb facs="5.jpg"/>
        </body>
    </text>
</TEI>

Run the following query:

xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

let $doc := doc("kartoffelmann.xml")
let $pbs := $doc//tei:pb
let $pb1 := $pbs[1]
let $pb2 := $pbs[2]
return
    util:get-fragment-between($pb1, $pb2, false(), true())

Results are as above.
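For readers blocked by this regression, a similar selection can be approximated in plain XQuery. This is a minimal sketch, not a drop-in replacement: local:fragment-between is a hypothetical helper (not part of eXist-db), it returns a sequence of whole nodes rather than the serialized string that util:get-fragment-between produces, and it assumes the sample above is stored as kartoffelmann.xml.

```xquery
xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical helper, not part of eXist-db. Returns $start followed by the
   outermost nodes lying strictly between $start and $end in document order.
   The following/preceding axes exclude the ancestors of each milestone, so
   an element that merely contains $end (e.g. <front>) is never returned. :)
declare function local:fragment-between($start as node(), $end as node()) as node()* {
    let $between := $start/following::node() intersect $end/preceding::node()
    return (
        $start,
        (: drop nodes whose parent is already in $between, so that a <p> is
           returned once rather than alongside its own text child :)
        $between[not(parent::node() intersect $between)]
    )
};

let $pbs := doc("kartoffelmann.xml")//tei:pb
return
    local:fragment-between($pbs[1], $pbs[2])
```

Against the sample document this should return the first <pb> and the <p>, together with the whitespace-only text nodes between them. The design trades fidelity for simplicity: when the two milestones sit at different depths (for example <pb facs="3.jpg"/> and the <pb facs="4.jpg"/> nested inside a <p>), the result is a flat list of nodes and no enclosing start or end tags are reconstructed, which util:get-fragment-between with its third parameter set to true() is meant to do. Intersecting two long axis sequences may also be slow on very large documents.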

Context information

  • eXist-db version: eXist-db >= 4.4.0
  • Java version: jdk1.8.0_191
  • Operating system: Windows 10
  • 64 bit
  • How is eXist-db installed? JAR installer
  • Any custom changes in e.g. conf.xml: no
Labels: bug, regression

All 11 comments

I can confirm @babslgam's report with the current eXist-db develop-4.x.x. A simplified example:

xquery version "3.1";

let $data := 
    <root>
        <x/>
        <y/>
        <z/>
    </root>
let $store := xmldb:store("/db", "test.xml", $data)
let $doc := doc("/db/test.xml")
let $elems := $doc/root/*
let $fragment := util:get-fragment-between($elems[1], $elems[2], true(), true())
return 
    $fragment

unexpectedly returns an additional <z/> and </root>:

<root>
<x></x><y></y><z></z></root></root>

instead of the expected:

<root><x/><y/></root>

With the 3rd parameter as false(), the results also include an unexpected <z/> and </root>:

<x></x><y></y><z></z></root>

instead of the expected:

<x/><y/>

We don't appear to have any tests for this function. Perhaps we can build on this?

xquery version "3.1";

module namespace gfb = "http://exist-db.org/test/util/get-fragment-between";

declare namespace test="http://exist-db.org/xquery/xqsuite";

declare variable $gfb:DATA := 
    <root xmlns="http://exist-db.org/xquery/xqsuite">
        <x/>
        <y/>
        <z/>
    </root>;

declare 
    %test:setUp 
function gfb:setup() {
    xmldb:store("/db", "test.xml", $gfb:DATA)
};

declare 
    %test:tearDown 
function gfb:teardown() {
    xmldb:remove("/db", "test.xml")
};

declare 
    %test:assertEquals("<x></x><y></y>")
function gfb:fragment-no-namespace() {
    let $doc := doc("/db/test.xml")
    let $elems := $doc/test:root/*
    let $fragment := util:get-fragment-between($elems[1], $elems[2], false(), false())
    return 
        $fragment
};

declare 
    %test:assertEquals("<root><x></x><y></y></root>")
function gfb:wrapped-fragment-no-namespace() {
    let $doc := doc("/db/test.xml")
    let $elems := $doc/test:root/*
    let $fragment := util:get-fragment-between($elems[1], $elems[2], true(), false())
    return 
        $fragment => replace("\s", "")
};

declare 
    %test:assertTrue
function gfb:wrapped-fragment-is-parseable() {
    let $doc := doc("/db/test.xml")
    let $elems := $doc/test:root/*
    let $fragment := util:get-fragment-between($elems[1], $elems[2], true(), false())
    let $parsed := try { parse-xml($fragment) } catch * { $err:code } 
    return
        $parsed instance of document-node(element(root))
};

This returns the following output:

<testsuites>
    <testsuite package="http://exist-db.org/test/util/get-fragment-between"
        timestamp="2018-11-29T12:34:49.258-05:00" tests="3" failures="3" errors="0" pending="0"
        time="PT0.007S">
        <testcase name="fragment-no-namespace" class="gfb:fragment-no-namespace">
            <failure message="assertEquals failed." type="failure-error-code-1"
                >&lt;x&gt;&lt;/x&gt;&lt;y&gt;&lt;/y&gt;</failure>
            <output>&lt;x&gt;&lt;/x&gt;&lt;y&gt;&lt;/y&gt;&lt;z&gt;&lt;/z&gt;&lt;/root&gt;</output>
        </testcase>
        <testcase name="wrapped-fragment-is-parseable" class="gfb:wrapped-fragment-is-parseable">
            <failure message="assertTrue failed." type="failure-error-code-1"/>
            <output>false</output>
        </testcase>
        <testcase name="wrapped-fragment-no-namespace" class="gfb:wrapped-fragment-no-namespace">
            <failure message="assertEquals failed." type="failure-error-code-1"
                >&lt;root&gt;&lt;x&gt;&lt;/x&gt;&lt;y&gt;&lt;/y&gt;&lt;/root&gt;</failure>
            <output>&lt;root&gt;&lt;x&gt;&lt;/x&gt;&lt;y&gt;&lt;/y&gt;&lt;z&gt;&lt;/z&gt;&lt;/root&gt;&lt;/root&gt;</output>
        </testcase>
    </testsuite>
</testsuites>
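Building on the module above, one possible additional XQSuite case covers the original TEI report, where the document uses a default namespace and a <pb> sits inside mixed content. This is a sketch under stated assumptions: the module namespace URI, the gfbt prefix, the stored document name and the file name get-fragment-between-tei.xqm are all hypothetical, and the assertion assumes, as the expected 4.3.1 output in the report shows, that the end node itself is excluded from the fragment.

```xquery
xquery version "3.1";

(: Hypothetical extra test module; namespace URI and prefix are assumptions. :)
module namespace gfbt = "http://exist-db.org/test/util/get-fragment-between-tei";

declare namespace test = "http://exist-db.org/xquery/xqsuite";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical document name; assumes /db is writable by the test user. :)
declare variable $gfbt:DOC-NAME := "kartoffelmann.xml";

declare variable $gfbt:DATA :=
    <TEI xmlns="http://www.tei-c.org/ns/1.0">
        <text>
            <front>
                <pb facs="1.jpg"/>
                <p>Aus dem Leben einer Kartoffel.</p>
                <pb facs="2.jpg"/>
            </front>
            <body>
                <pb facs="3.jpg"/>
                <p>Hubertus Knoll spazierte über das <pb facs="4.jpg"/> Feld.</p>
                <pb facs="5.jpg"/>
            </body>
        </text>
    </TEI>;

declare
    %test:setUp
function gfbt:setup() {
    xmldb:store("/db", $gfbt:DOC-NAME, $gfbt:DATA)
};

declare
    %test:tearDown
function gfbt:teardown() {
    (: two-argument form removes a single resource from a collection :)
    xmldb:remove("/db", $gfbt:DOC-NAME)
};

(: Expected (as in 4.3.1): the first <pb> and the <p>, stopping before the
   second <pb>; in the regression the fragment ran on to the end of the document. :)
declare
    %test:assertTrue
function gfbt:fragment-stops-at-end-node() {
    let $pbs := doc("/db/" || $gfbt:DOC-NAME)//tei:pb
    let $fragment := util:get-fragment-between($pbs[1], $pbs[2], false(), true())
    return
        contains($fragment, "1.jpg")
        and contains($fragment, "Kartoffel")
        and not(contains($fragment, "2.jpg"))
        and not(contains($fragment, "Hubertus"))
};
```

Substring checks are used instead of exact equality because whitespace and namespace declarations in the serialized result may vary; a stricter case could compare against the exact expected output quoted in the report.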

I can confirm this still happens on version 5.2.0, using the testcase provided in https://github.com/eXist-db/exist/issues/2316#issue-385761409.

There has been no progress here for a year and a half. The bad thing is that the function still exists and does the wrong thing. May I suggest you just remove it from the eXist-db 5 series? You removed a lot of the built-in functions, so removing this one won't matter much.

Is there any progress here?

This problem is keeping us from upgrading to a newer eXist-db version, and we lack the time to rebuild the function ourselves (for some already finished projects). Is anything planned?

@simar0at We are an Open Source community project, so we rely on contributors.

@FrederikeNeuber @simar0at I think I just fixed it now in - https://github.com/eXist-db/exist/pull/3328. So that will go into the upcoming 5.3.0 release.

@adamretter What you might not know is that @babslgam is a colleague of mine, and we have been watching this ticket for two years now with some disgust and some amusement. I had a theory about the problem that is close to what you found now, and given the provided test it was not hard to start debugging. But frankly, in my opinion, patching around the heart of eXist-db and its node implementations is not something that someone like me, with two hours of experience in the codebase, should do. Keep up the good work!

@simar0at Well the software is free and you get to use it for free. People spend their own free time fixing problems, I am not paid to fix issues. I took my Thursday evening to fix this for you all for free.

Maybe others are paid to fix such things, but I am not. As I said, we hope users will contribute. Those contributions might be code, documentation, answers to other users or even sponsorship. We do the best that we can.

Well, it's finally done. I was motivated because someone other than the people at my institute actually seems to need this function, and I gave it another try today. Unfortunately, with my install of the IntelliJ IDEA IDE, whenever I set a breakpoint in the relevant get-fragment-between code, the result of the Java code changed drastically (an empty string, a NullPointerException, or a single XML element returned). I just tried Eclipse: I cannot even start debugging there. Good that you found the solution faster than me today, with your knowledge of the codebase and tools. It strengthens my belief that you cannot just do such things with Java knowledge from the last decade. I use XQuery day to day, not Java.


Thanks @adamretter - I am looking forward to it and thank you for your hard work in general!


In its debug view, IntelliJ tries to show you the values of various variables by default. Some of the code in GetFragmentBetween.java is invocation-sensitive, so you have to disable the debugger view's automatic toString and inline string options; otherwise the debugger changes the state of the running code.
