Exist: [BUG] replace, tokenize and analyze-string must throw when pattern matches an empty string

Created on 6 Apr 2021 · 8 Comments · Source: eXist-db/exist

Describe the bug

fn:replace, fn:tokenize and fn:analyze-string accept a pattern that matches an empty string. This results in odd behaviour, as in this example:

replace( '12.34' , '^\D*', '')

The provided pattern always matches, because it can match the empty string.

Expected behavior

Error FORX0003 is thrown with location information
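
The XPath functions specification ties FORX0003 to a simple test: the error is raised when fn:matches("", $pattern, $flags) would return true. Below is a minimal sketch of that check, written in Python so it can be run standalone. It is not eXist-db code: `matches_empty`, `strict_replace` and `EmptyMatchError` are hypothetical names for illustration, and Python's `re` dialect differs from XPath regular expressions (for example in replacement syntax), so it only approximates the rule for the simple patterns in this ticket.

```python
import re


class EmptyMatchError(ValueError):
    """Stand-in for err:FORX0003 (hypothetical name, for illustration only)."""


def matches_empty(pattern: str) -> bool:
    # Mirrors the spec's test fn:matches("", $pattern): an unanchored
    # search against the zero-length string.
    return re.search(pattern, "") is not None


def strict_replace(text: str, pattern: str, replacement: str) -> str:
    # Reject patterns that can match the empty string, as the spec requires
    # of fn:replace; otherwise delegate to Python's re.sub.
    if matches_empty(pattern):
        raise EmptyMatchError(
            f"FORX0003: pattern {pattern!r} matches a zero-length string"
        )
    return re.sub(pattern, replacement, text)
```

Under this sketch, both patterns from the report (`^\D*` and `^[^0-9]*`) are rejected before any replacement happens, which is the behaviour the ticket asks for.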

Actual

The first character is swallowed: 2.34

To Reproduce

replace( '12.34' , '^\D*','')  

or

replace( '12.34' , '^[^0-9]*','')

will return 2.34 instead of the desired 12.34

Reference

This behaved differently in earlier versions of eXist-db, where a pattern that matched an empty string would just return the input unchanged (likely related to #3530).

fn:replace specification

XQTS 31 tests

Context (please always complete the following information):

  • OS: Windows 10
  • eXist-db version: 5.3.0-snapshot 55e77cc52f8555dda5ed058ef98f283d69a72a8e
  • Java Version 1.8.0.281
bug xquery

All 8 comments

This might be a change in behaviour _again_.

replace("$11.23", "^[^0-9]*(.*)$", "$1")

is the correct form to write the replacement.

Looking at it again, I am unsure. This looks like a bug. But

  • replace('12.34' , '^\D','')
  • replace('12.34' , '^[^0-9]','')
    Neither of these matches, so neither replaces a character at the beginning.

So replace($may-start-with-currency-symbol, "^\D+", "") might be the easiest solution.
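
The reason `+` avoids the problem is that it requires at least one non-digit, so the pattern can no longer match a zero-length string. A small sketch of that workaround, again in Python rather than XQuery and therefore only an approximation of the XPath regex behaviour; `strip_leading_non_digits` is a hypothetical helper name:

```python
import re


def strip_leading_non_digits(value: str) -> str:
    # "^\D+" needs one or more non-digits at the start, so it cannot
    # match the empty string and leaves digit-leading input untouched.
    return re.sub(r"^\D+", "", value)


# The pattern does not match a zero-length string.
print(re.search(r"^\D+", "") is None)      # True

print(strip_leading_non_digits("$11.23"))  # 11.23
print(strip_leading_non_digits("12.34"))   # 12.34
```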

BaseX 9.5 raises the error "[FORX0003] Pattern matches empty string." for the original pattern "^[^0-9]*".

Saxon 10.0 (HE) also throws the same error.
So all in all this is definitely a bug, because no error is thrown!

@PieterLamers would you like to edit the bug description to reflect the new findings, or may I? Or should I open a separate issue?

Hi @line-o, thanks for the explanations! You are welcome to edit the ticket. I think I will simply replace the * with a + to avoid the error.
