We need to replace literal backslashes with additional backslashes to solve this issue.
https://stackoverflow.com/questions/4653831/regex-how-to-escape-backslashes-and-special-characters
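A minimal Java sketch of what doubling backslashes would mean, assuming the filter text is passed to java.util.regex as-is. The class and method names are hypothetical and not taken from the OpenRefine code base. Note the trade-off it exposes: blanket escaping makes a trailing backslash harmless, but it also turns escapes such as \d into literal text.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not OpenRefine code.
final class BackslashEscaping {

    // Doubles every backslash so the regex engine reads each one as a literal backslash.
    static String escapeBackslashes(String userInput) {
        return userInput.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String typed = "^tha\\";                    // what the user typed: ^tha followed by one backslash
        String escaped = escapeBackslashes(typed);  // now ends with two backslashes
        // The escaped pattern compiles and matches a value starting with "tha" plus a backslash.
        System.out.println(Pattern.compile(escaped).matcher("tha\\ rest").find());
        // Side effect: an escaped \d means "backslash followed by d", so it no longer matches a digit.
        System.out.println(Pattern.compile(escapeBackslashes("\\d")).matcher("5").find());
    }
}
```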
Reproducible Steps:
Using OpenRefine [trunk]
This can be reproduced with any text cell value, for instance
^tha
^tha\
I see the development version no longer behaves in the same way when the regex box is checked and a bad regular expression is entered. Errors always appear as you type, but the gray spinning wheel no longer shows up. It's certainly related to #1203.
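For reference, the difference between the two expressions above can be reproduced outside OpenRefine with plain java.util.regex. This is only a sketch with a hypothetical class name; the exact exception message depends on the Java version, so it is not shown here.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; not part of OpenRefine.
final class TrailingBackslashRepro {

    // Returns true if java.util.regex accepts the expression, false if it rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("^tha"));    // the first expression
        System.out.println(compiles("^tha\\"));  // Java literal for ^tha followed by one backslash
    }
}
```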

Isn’t this the same as #1203?
@ostephens Yep, but it should not throw an exception... we don't want it to... it's a bad user experience... let me explain so it makes more sense and you can see the alternative way to handle this...
OK, so...
The \ is technically a valid regex character; in other words, let's not treat it as a pattern compile error YET, because users are still typing to put something after it, like \d or \b etc. :) I would say we just need to pretty up an info message in the case of a single trailing backslash (not throw an error). We need to tell the user that their expression is not understood YET and that a backslash has to have additional syntax after it. We probably need to just catch that specific case and override it with a better message to the user. Luckily, the backslash is the only special character that we have to special-case here, btw. I haven't looked at how you did it, but perhaps it could also be approached another way, by just escaping the pattern under the covers? See the Stack Overflow link in my initial comment. Having \ become \\ or \\\\ under the covers might be an option, but I'd prefer that we always show a message for an incomplete \ that is missing an additional character after it, such as \d or \w.
Look at how Python re does it for some inspiration as to the possible message to the user ... https://github.com/python/cpython/blob/master/Lib/re.py#L286
UPDATE: actually Python does have special casing mapping as well
https://github.com/python/cpython/blob/master/Lib/re.py#L251
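One way the special case described above might look in Java, as a rough sketch only. The class name, method names, and message wording are all hypothetical. The trailing-backslash check is applied only after java.util.regex has rejected the pattern, so valid expressions (for example one ending in an escaped backslash) are never flagged.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper for illustration only; not OpenRefine code.
final class RegexFeedback {

    // True when the text ends with an odd number of backslashes,
    // i.e. the last backslash has nothing after it to escape.
    static boolean endsWithDanglingBackslash(String regex) {
        int count = 0;
        for (int i = regex.length() - 1; i >= 0 && regex.charAt(i) == '\\'; i--) {
            count++;
        }
        return count % 2 == 1;
    }

    // Returns an empty string for a valid pattern, otherwise a user-facing message.
    static String feedback(String regex) {
        try {
            Pattern.compile(regex);
            return "";
        } catch (PatternSyntaxException e) {
            if (endsWithDanglingBackslash(regex)) {
                // Assumed wording; a real message would come from a translatable resource.
                return "Incomplete expression: a backslash must be followed by another character, such as \\d or \\w.";
            }
            return "Invalid regular expression: " + e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println(feedback("^tha\\"));  // dangling backslash case
        System.out.println(feedback("^tha["));   // any other syntax error
    }
}
```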
I've updated the initial comment to reflect the current state of things against our trunk.
I agree that the "Unexpected internal error" wording should be changed to something less scary ("invalid regular expression" or something like that). I'm not sure it's worth making special cases for some types of unterminated expressions… I think the user is grown up enough to accept that their expressions may not be valid while they type them.
@wetneb Right. It's really just the backslash that will cause a bit of grief for users. This is a commonly known thing that tools using java.util.regex sometimes pretty up.
I’m happy to have another look at this
Maybe add a longer delay before sending the request to the back end?
@jackyq2015 I thought about that, but it felt like that was just trying to avoid the issue rather than resolving it, and will decrease the overall responsiveness of the filter - which I think would be a negative outcome for the user.
@thadguidry I'm not quite clear on why '\' is a special case - why is getting an error with ^tha\ any different to getting an error with ^tha[ ?
@ostephens Hi Owen, I think as a programmer you're caught up on the error, which is expected, but I want you to approach this as if you were showing an 8th grader a demo of text filtering. How would you make this a better experience for them while also helping them to learn more? Look at how other tools beautify this error. Try it on https://regexr.com/ where you can hover over the error and see a friendly message ... "dangling backslash". Yeah, that means extra work for us, and we also have to have a translatable mapping file, but so what, it brings value to our users everywhere we are using java.util.regex
I create these issues so that even folks outside our community might sometimes tiptoe in and see something interesting they can fix, work on, and improve our users' experience as well. This isn't something that necessarily needs to be worked on this week or next, but by prioritizing it as High we are noting that "this is important to our users". Pattern matching and finding is not a niche area, as noted by Ettore's recent issue created to have a find() operation to help our basic match(). Looking for patterns is key to OpenRefine, and anything we can do to help our users with that should be deemed very important.
A reminder of WHO our users really are. They are not programmers, we are. As design lead for our community, I want to challenge and remind our contributors that "we" need to do the hard stuff to make our users' lives easier and more pleasant. David Huynh did back-flips for me as a user... and all of us benefit from the work he and others did to make our lives easier.
(Incidentally, it's the same philosophy that Schema.org has: when we have to, we make it easier for consumers at the cost of it being harder for publishers.)
I agree that there is no special case. It is just a question of whether the code handles it properly or not, I think.
Thanks @thadguidry, thanks @jackyq2015. I definitely understand that the user needs better feedback on what is happening and how to fix the problem.
What I'm thinking is that I can write something to validate the regular expression at the server end and pass back a meaningful error message in JSON, which would be displayed in place of the current error generated by the Java exception. Does this sound like a good approach?
Looking at the regexr example shared by @thadguidry, it catches and reports the following errors:
groupopen:"Unmatched opening parenthesis.",
groupclose:"Unmatched closing parenthesis.",
quanttarg:"Invalid target for quantifier.",
setopen:"Unmatched opening square bracket.",
esccharopen:"Dangling backslash.",
quantrev:"Quantifier minimum is greater than maximum.",
rangerev:"Range values reversed. Start char is greater than end char.",
lookbehind:"Lookbehind is not supported in JavaScript.",
fwdslash:"Unescaped forward slash.",
esccharbad:"Invalid escape sequence."
We won't need all of these but it looks like a good starting point.
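The server-side validation proposed in this comment could look roughly like the sketch below. It is only an illustration under assumptions: the class name, the JSON field names (status, message, index), and the message wording are hypothetical, and the JSON is built by hand to keep the example free of external libraries.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch; not the actual OpenRefine implementation.
final class RegexValidationResponse {

    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    static String jsonEscape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Validates the expression and returns a small JSON document for the client.
    static String validate(String regex) {
        try {
            Pattern.compile(regex);
            return "{\"status\":\"ok\"}";
        } catch (PatternSyntaxException e) {
            // getDescription() is the engine's own short explanation; getIndex() may be -1 if unknown.
            String message = "Invalid regular expression: " + e.getDescription();
            return "{\"status\":\"error\",\"message\":\"" + jsonEscape(message)
                    + "\",\"index\":" + e.getIndex() + "}";
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("^tha"));
        System.out.println(validate("^tha["));
    }
}
```

The client would then display the message field instead of the raw exception text. Mapping specific failures to friendlier, translatable messages like the list above would sit on top of this.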
@ostephens You might also find a way to decouple and reuse that effort in GREL in our expression dialog for match(), partition(), etc., i.e. all the places that use java.util.regex
Regarding the lookbehind... hmm, the text filter still uses Java here, which does support lookbehind, although it is not blazingly fast: https://www.javaworld.com/article/2077757/core-java/optimizing-regular-expressions-in-java.html Also, lookbehind had bugs back in the Java 4 and 5 days, but those have been fixed. Here's a primer that details important notes about lookbehind: https://www.regular-expressions.info/lookaround.html
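A quick illustration that java.util.regex accepts lookbehind, so the regexr "lookbehind" error would not apply to the text filter. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of OpenRefine.
final class LookbehindDemo {

    // Returns the start index of the first "d" directly preceded by "tha", or -1 if there is none.
    static int firstDAfterTha(String value) {
        Matcher m = Pattern.compile("(?<=tha)d").matcher(value);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstDAfterTha("thad"));  // "d" is preceded by "tha"
        System.out.println(firstDAfterTha("bad"));   // "d" is not preceded by "tha"
    }
}
```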
@ostephens How's work progressing on this Owen ?
No progress as yet, but will be looking at it next week if that's OK
@ostephens Of course it's OK. We appreciate any effort.
@ostephens Are we good on closing this issue out and triaging it into Milestone 2.9 as done? Or does the team think there is more to do?
@thadguidry I think this is done, at least to the extent needed for now, I think it can be closed