If I understand correctly, this would offer a fail-safe alternative, but errors will most likely be printed to the console instead of captured in the exception object.
@kevinushey: Rcpp moved away from R_TopLevelExec() a year ago (https://github.com/RcppCore/Rcpp/issues/311). What would be other drawbacks of using R_tryEval() instead of Rcpp_eval() if performance is important (https://github.com/RcppCore/Rcpp/issues/578)? Thanks!
The main issue is that R_tryEval()
uses R_TopLevelExec()
under the hood, which implies that any active R condition handlers won't actually be in effect while the associated code is executed. This means that code like:
tryCatch(
topLevelExec(),
error = function(e) print("Caught error")
)
won't actually work, as that code ends up evaluating in an entirely separate context (without active handlers visible). The original change in Rcpp ended up causing loads of problems in e.g. Shiny since many Rcpp functions were evaluated within condition handlers in this way.
If you plan on evaluating arbitrary user code, then R_TopLevelExec()
is likely the wrong way to go.
Thanks. But I can detect errors raised by R_tryEval()
, and call Rcpp::stop()
in this case. Warnings are and other conditions are not caught. (See example at the end.)
The use case is the evaluation of arbitrary R code that doesn't need to interact with condition handlers defined before entering C++, such as aggregation functions in a summarise()
or mutate()
call:
aggregator <- function(x) { ... }
iris %>% group_by(Species) %>% summarize(agg = aggregator(Petal.Width))
They are called for each group, therefore performance is important. Also, they can (in theory) throw an error; I just want to exit safely in this case, and don't care much about the condition object (because the error is printed anyway).
Would that be a valid use case for R_tryEval()
?
R_tryEval()
exampletry_eval_ <-
Rcpp::cppFunction(
'
SEXP try_eval(SEXP call, SEXP env) {
int error = 0;
SEXP ret = R_tryEval(call, env, &error);
if (error) Rcpp::stop("R_tryEval() raised error");
return ret;
}
')
try_eval <- function(call, env = parent.frame()) try_eval_(substitute(call), env)
a <- 5
try_eval(1 + a)
## works fine
try_eval(stop("oops"))
## Rcpp::stop() called
tryCatch(try_eval(log(-1)), warning = function(e) print("warning caught"))
## warning handler not run
If you're okay with condition handlers not being caught through this mechanism (ie you don't plan on running user code which needs to catch messages or warnings) then it should be okay. Ie, you explicitly want to disallow something like:
withCallingHandlers(
iris %>% group_by(Species) %>% summarize(agg = aggregator(Petal.Width)),
message = function(m) print(m)
)
where aggregator
is some function that might emit an R message that someone wants to catch.
The other thing you might need to double check: how do interrupt handlers work here? E.g. one can do something like:
tryCatch(
Sys.sleep(100),
interrupt = function(e) print("Caught interrupt")
)
# press C-c to interrupt
Just in case this is something that may be relevant.
Awesome, thanks. I tested:
try_eval(Sys.sleep(100))
An interrupt just got me to my Rcpp::stop()
call, which is not ideal, but does the job.
@hadley: I'm thinking about using R_tryEval()
instead of Rcpp::Rcpp_eval()
for evaluating expressions in dplyr verbs. I have reason to expect substantial performance benefits, especially for grouped mutate/summary with regular evaluation for many groups. However, there are some drawbacks:
Thoughts?
try_eval_ <-
Rcpp::cppFunction(
'
SEXP try_eval(SEXP call, SEXP env) {
int error = 0;
SEXP ret = R_tryEval(call, env, &error);
if (error) Rcpp::stop("R_tryEval() raised error");
return ret;
}
')
try_eval <- function(call, env = parent.frame()) try_eval_(substitute(call), env)
tryCatch(
try_eval({
message("message")
warning("warning")
stop("stop")
}),
error = function(e) cat("error handled: ", conditionMessage(e), "\n", sep = ""),
warning = function(e) print("warning handled"),
message = function(e) print("message handled")
)
## message
## Error: stop
## In addition: Warning message:
## warning
## error handled: R_tryEval() raised error
I think that seems like a reasonable trade-off. Might also be worth checking to see what data.table does?
data.table seems to do plain eval(), this is safe in C because they don't have destructors that need to be called when unwinding the stack after a longjmp...
Here is a package where they wrap a dplyr pipeline in a try block: https://github.com/business-science/tidyquant/blob/9fb6b93f07b4e1f940fec3c85e0bf46d5980f448/R/tq_stock_list.R#L178
There is also R_tryCatchError
now to catch possible error in user code. I guess that it could replace the usage of R_TopLevelExec
.
PS: Oh, it has been discussed in RcppCore/Rcpp#578 and it was mentioned that the speed of R_tryCatchError
might be low.
There's new hope with https://github.com/RcppCore/Rcpp/pull/789 and its successors, we still can fallback to R_tryEval()
for R < 3.5.0.
R 3.5 also has R_tryCatch()
, but according to the documentation it also uses R-level tryCatch()
, so not much to gain here.
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Condition-handling-and-cleanup-code
I don't think we should fall back to R_tryEval()
for R < 3.5.0 by the way. We need consistent semantics and it is ok if dplyr is a little less performant with old R versions.
This is now irrelevant with @lionel- work on Rcpp 0.12.18
Do we need to define RCPP_USE_UNWIND_PROTECT
to enable it?
Yes but perhaps better test-run it for a while? This is all very new, both in R and in Rcpp. We could set it on Travis to start with.
I like this idea. How do we proceed? It might be easiest if Travis would set an environment variable DPLYR_COMPILER_FLAGS
which is then picked up by the build. Do we want to sneak in a ${DPLYR_COMPILER_FLAGS}
into our Makevars
?
Oh hmm I thought we would need configure
to make it work (cf suggestion on Slack) but that's much neater this way, good idea.
We should do the same on Appveyor, this jumping/throwing business is platform-sensitive.
I'll do it.
Travis build: https://travis-ci.org/tidyverse/dplyr/builds/428982506
AppVeyor build: https://ci.appveyor.com/project/tidyverse/dplyr/build/1.0.1722
Not opening a pull request to avoid checking the branch and the PR.
Interesting: https://travis-ci.org/tidyverse/dplyr/jobs/428993952#L1307
Error: segfault from C stack overflow
Let's watch out for these.
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/
Most helpful comment
data.table seems to do plain eval(), this is safe in C because they don't have destructors that need to be called when unwinding the stack after a longjmp...