Describe the bug
If one of the conditions passed to the GREL boolean function or() returns an error, the function itself also returns that error, even if the other condition is true. This is counterintuitive: or() should return true whenever one condition is true, regardless of the result of the other condition, even if that result is an error.
To Reproduce
Steps to reproduce the behavior:
| group | tags |
|-------|------|
| A | |
| A | |
| B | d,b |
| B | c,a,g,f |
| B | h,f,b,d,a |
| B | f,i |
or(cells["group"].value=="A",sort(split(cells["tags"].value,","))[0]=="a")
Expected behavior
In this small example all of the rows that are in group "A" and all of the rows from group "B" that have an "a" as part of their tag list, should get the export flag "true".
Current Results
For the first two rows, where the group is "A", an error "split expects 2 strings, or 1 string and 1 regex, followed by an optional boolean" is returned. It is caused by the split function, because its input value (the tag list) is null.
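The reported results can be modelled outside OpenRefine. The following is a minimal Python sketch, not GREL's actual implementation: it assumes that or() evaluates all of its arguments before combining them, so an error in any argument replaces the result. The names EvalError, split_tags and eager_or are hypothetical stand-ins introduced only for this illustration.

```python
class EvalError(Exception):
    """Hypothetical stand-in for a GREL error value."""


def split_tags(tags):
    # Mimics split() rejecting a null input, as in the error quoted above.
    if tags is None:
        raise EvalError(
            "split expects 2 strings, or 1 string and 1 regex, "
            "followed by an optional boolean"
        )
    return tags.split(",")


def eager_or(*conditions):
    # Assumption: every argument is evaluated first, so an error raised
    # by any of them propagates even when another argument is True.
    values = [condition() for condition in conditions]
    return any(values)


# The sample table from the report; None models an empty tags cell.
rows = [
    ("A", None),
    ("A", None),
    ("B", "d,b"),
    ("B", "c,a,g,f"),
    ("B", "h,f,b,d,a"),
    ("B", "f,i"),
]


def evaluate(group, tags):
    try:
        return eager_or(
            lambda: group == "A",
            lambda: sorted(split_tags(tags))[0] == "a",
        )
    except EvalError as error:
        return f"Error: {error}"


results = [evaluate(group, tags) for group, tags in rows]
for (group, tags), result in zip(rows, results):
    print(group, tags, result)
```

Under this assumption the two group "A" rows end in an error although their first condition is true, while the group "B" rows evaluate normally.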

Workaround
It is possible to work around this problem by using nested if statements. This, however, makes the code longer and harder to read: if(cells["group"].value=="A",true,if(sort(split(cells["tags"].value,","))[0]=="a",true,false))
Related Bug
I know this is a minor issue and probably low priority, especially since it is easily preventable with the workaround, but I wanted to document it here anyways, as this has caused me to spend quite some time debugging my function (one that was more complex than this reduced example) until I realized that it is the OR function that is causing it.
Any expressions that result in an error can always be set to blank or store error. This is expected and necessary behavior that we introduced long ago to allow the user (and not our code) to control if they want to ignore the error and set to blank or not.
Why did you choose to store the error? By default, expressions are set to blank on error.

I guess the problem here is the expectation that if it's an 'OR' that one condition being satisfied is enough to give TRUE - so that the other condition errors is not relevant.
So setting to Blank isn't the right outcome - it should result in true
What is the value of the 2nd condition from OR() ?
If I recall we purposely were not strict on this so that users got the bubbled up error and we allowed them to see the error and handle it as they wished, or undo and try again. :)
Remember our user base...typically non-programmers.
I would say that the output of split() is ERROR because no array could be built for sort(). It's important that we tell the user that we couldn't produce Boolean b in the OR() function. And we did, quite nicely.
We have been down this path before and 9 times out of 10 users really want to control how to handle this scenario as @DavidFichtmueller did when he saw some outliers and resolved them in his own way, GREL, replace(), whatever. The reason he saw some outliers is that we told him that an inner function failed. That's useful to him since he got feedback.
This sounds like an enhancement request (since this is not a bug) for having OR() function to have a force boolean on an error. But I don't see a need for that and can be handled on a 2nd pass after user identifies his outliers and then transforms in any fashion he desires. For me, I often store the error...then come back and Facet on those error rows and handle in different ways depending on my row/cells value with the power of OpenRefine at my beck and call not forcing me into a corner.
@thadguidry, as @ostephens has pointed out, the expectation is that the second part of the OR clause shouldn't matter if the first part is true. I don't want to see the error (or get a blank result [I selected the store error option in the description above to highlight the problem a bit more]) if it doesn't affect the outcome of the function. I completely agree with and appreciate that OpenRefine gives users control over how to handle error cases. If the first parameter of the OR function is false and the error occurs while evaluating the second part, then I would absolutely expect an error telling me what went wrong, since the result of the function is now unknowable. In this case, however, the function was carefully written so that the error would never occur if the OR function behaved as expected.
I encountered this bug in a much more complex query. For this ticket I created a much simpler example that still highlights the problem. I spent a lot more time than I wanted to on debugging my function, only to realize that the cause wasn't the combination of multiple cross functions that I used, but the simple OR that broke my transaction. This function was written to be copied and applied to other datasets in the future, so the goal is to have as few manual steps (like faceting and error handling) involved as possible.
The same reasoning also applies to the AND function if one of the parameters is false. There is no need to check the second one or pass exceptions, instead the function should just return false. But this has already been discussed over on #1145 which is why I linked it as related.
I hope I was able to share a bit more about my reasoning, motivation and use case for this issue. I stand by my position that this is a bug, as the function behaves differently than one would expect, especially in comparison to the many comparable functions in other programming languages.
Yes this is a valid bug, as summarised by @ostephens. Sorry for the communication issues on this thread.
@wetneb Please respect this discussion. There are no communication issues and this is NOT A BUG. I have stated the reasons why we made these decisions in the past and I am highlighting them for you. Those reasons concern undesired side effects where the user does not realize what is happening. I.e., the complete opposite of @DavidFichtmueller's use case would be my use case of keeping our Eager Boolean functions. It is fine if the community wants more than Eager operators. I'm just saying let's then make additional Short-circuit operators for control purposes, as the OP is requesting. (I.e., let's not forget our programming history and the reasons for Eager-ness ... or not) https://en.wikipedia.org/wiki/Short-circuit_evaluation
@DavidFichtmueller
OK, so I've heard you well and clear and it seems the community also wants some short-circuited Boolean functions.
My Strong Opinion
(because I've bitten the side effect bullet more times in my lifetime than I can count :-) and am trying to give a friendly alternative):
We can introduce more friendly Short-Circuit Boolean functions, since GREL is friendly...which will provide the extra control structures that @DavidFichtmueller and others are asking for and not disturb our existing Eager Boolean functions where I care about side effects and handling them:
## Eager functions
or()
and()
## NEW Short-Circuit functions
orelse()
andalso()
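To make the proposed semantics concrete, here is a minimal Python sketch of how short-circuit variants could behave. It is an illustration under stated assumptions, not OpenRefine code: or_else and and_also are hypothetical Python stand-ins for the proposed orelse() and andalso(), and each condition is passed as a zero-argument callable so that it is only evaluated when needed.

```python
def or_else(*conditions):
    # Stop at the first condition that is true; later ones never run.
    for condition in conditions:
        if condition():
            return True
    return False


def and_also(*conditions):
    # Stop at the first condition that is false; later ones never run.
    for condition in conditions:
        if not condition():
            return False
    return True


def failing():
    # Models an inner function that errors, such as split() on a null cell.
    raise ValueError("inner function failed")


# The failing condition is never reached, so no error surfaces.
first_true = or_else(lambda: True, failing)
first_false = and_also(lambda: False, failing)

# When the earlier condition does not decide the result, the error
# still surfaces, so the user keeps the feedback.
try:
    or_else(lambda: False, failing)
    error_seen = False
except ValueError:
    error_seen = True

print(first_true, first_false, error_seen)
```

In this sketch an error is hidden only when it cannot change the outcome. That matches the expectation described earlier in the thread, and it leaves the eager or() and and() untouched for users who rely on seeing every error.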