Describe the bug
fn:replace, fn:tokenize and fn:analyze-string accept a pattern that matches the empty string. This results in odd behaviour, as in this example:
replace( '12.34' , '^\D*', '')
The provided pattern always matches, because it can match the empty string.
Expected behavior
Error FORX0003 is thrown with location information
Actual
The first character is swallowed: 2.34
To Reproduce
replace( '12.34' , '^\D*','')
or
replace( '12.34' , '^[^0-9]*','')
will return 2.34 instead of the desired 12.34
Reference
This behaved differently in earlier versions of eXist-db, where a pattern that matched an empty string would just return the input unchanged (likely related to #3530).
Context (please always complete the following information):
This might be a change in behaviour _again_.
replace("$11.23", "^[^0-9]*(.*)$", "$1")
is the correct form to write the replacement.
Looking at it again, I am unsure. This looks like a bug. But
replace('12.34' , '^\D','')
replace('12.34' , '^[^0-9]','')
So replace($may-start-with-currency-symbol, "^\D+", "") might be the easiest solution.
BaseX 9.5 raises the error "[FORX0003] Pattern matches empty string." for the original pattern "^[^0-9]*".
Saxon 10.0 (HE) also throws the same error.
So all in all this is definitely a bug, because no error is thrown!
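For reference, a minimal sketch of how the expected behaviour could be checked from a query. It assumes a processor that raises err:FORX0003 for this pattern, as reported above for BaseX and Saxon; the returned message text is illustrative only and will vary by processor.

```xquery
xquery version "3.1";

(: Bind the standard error namespace explicitly, in case the
   processor does not predeclare the err prefix. :)
declare namespace err = "http://www.w3.org/2005/xqt-errors";

try {
    (: '^\D*' can match the empty string, so a conforming
       processor is expected to raise err:FORX0003 here. :)
    replace('12.34', '^\D*', '')
} catch err:FORX0003 {
    (: $err:description holds the processor-specific message, if any. :)
    'Caught FORX0003: ' || $err:description
}
```

On a processor that does not raise the error, the try branch simply returns whatever replace produces, which makes the difference easy to spot.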
@PieterLamers would you or may I edit the bug description to reflect the new findings? Or should I open a separate one?
Hi @line-o, thanks for the explanations! You are welcome to edit the ticket. I think I will simply replace the * with a + to avoid the error.
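A short sketch of that workaround, using the sample values from this thread. With + the pattern needs at least one non-digit character, so it can no longer match the empty string and FORX0003 does not apply.

```xquery
xquery version "3.1";

(: '^\D+' requires one or more leading non-digits. :)
(
    (: No leading non-digit: nothing matches, input is returned unchanged. :)
    replace('12.34', '^\D+', ''),   (: "12.34" :)

    (: Leading currency symbol is stripped. :)
    replace('$11.23', '^\D+', '')   (: "11.23" :)
)
```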