Dplyr: Consider using R_tryEval() instead of Rcpp_eval()

Created on 7 Dec 2016 · 22Comments · Source: tidyverse/dplyr

If I understand correctly, this would offer a fail-safe alternative, but errors will most likely be printed to the console instead of captured in the exception object.

@kevinushey: Rcpp moved away from R_TopLevelExec() a year ago (https://github.com/RcppCore/Rcpp/issues/311). What would be other drawbacks of using R_tryEval() instead of Rcpp_eval() if performance is important (https://github.com/RcppCore/Rcpp/issues/578)? Thanks!

feature performance

Source

krlmlr

Most helpful comment

data.table seems to do plain eval(), this is safe in C because they don't have destructors that need to be called when unwinding the stack after a longjmp...

krlmlr on 7 Dec 2016

👍2

All 22 comments

The main issue is that R_tryEval() uses R_TopLevelExec() under the hood, which implies that any active R condition handlers won't actually be in effect while the associated code is executed. This means that code like:

tryCatch(
  topLevelExec(),
  error = function(e) print("Caught error")
)

won't actually work, as that code ends up evaluating in an entirely separate context (without active handlers visible). The original change in Rcpp ended up causing loads of problems in e.g. Shiny since many Rcpp functions were evaluated within condition handlers in this way.

If you plan on evaluating arbitrary user code, then R_TopLevelExec() is likely the wrong way to go.

kevinushey on 7 Dec 2016

Thanks. But I can detect errors raised by R_tryEval(), and call Rcpp::stop() in this case. Warnings are and other conditions are not caught. (See example at the end.)

The use case is the evaluation of arbitrary R code that doesn't need to interact with condition handlers defined before entering C++, such as aggregation functions in a summarise() or mutate() call:

aggregator <- function(x) { ... }
iris %>% group_by(Species) %>% summarize(agg = aggregator(Petal.Width))

They are called for each group, therefore performance is important. Also, they can (in theory) throw an error; I just want to exit safely in this case, and don't care much about the condition object (because the error is printed anyway).

Would that be a valid use case for R_tryEval() ?

`R_tryEval()` example

try_eval_ <-
  Rcpp::cppFunction(
    '
SEXP try_eval(SEXP call, SEXP env) {
  int error = 0;
  SEXP ret = R_tryEval(call, env, &error);
  if (error) Rcpp::stop("R_tryEval() raised error");
  return ret;
}
    ')

try_eval <- function(call, env = parent.frame()) try_eval_(substitute(call), env)

a <- 5
try_eval(1 + a)
## works fine

try_eval(stop("oops"))
## Rcpp::stop() called

tryCatch(try_eval(log(-1)), warning = function(e) print("warning caught"))
## warning handler not run

krlmlr on 7 Dec 2016

If you're okay with condition handlers not being caught through this mechanism (ie you don't plan on running user code which needs to catch messages or warnings) then it should be okay. Ie, you explicitly want to disallow something like:

withCallingHandlers(
    iris %>% group_by(Species) %>% summarize(agg = aggregator(Petal.Width)),
    message = function(m) print(m)
)

where aggregator is some function that might emit an R message that someone wants to catch.

The other thing you might need to double check: how do interrupt handlers work here? E.g. one can do something like:

tryCatch(
    Sys.sleep(100),
    interrupt = function(e) print("Caught interrupt")
)

# press C-c to interrupt

Just in case this is something that may be relevant.

kevinushey on 7 Dec 2016

Awesome, thanks. I tested:

try_eval(Sys.sleep(100))

An interrupt just got me to my Rcpp::stop() call, which is not ideal, but does the job.

krlmlr on 7 Dec 2016

@hadley: I'm thinking about using R_tryEval() instead of Rcpp::Rcpp_eval() for evaluating expressions in dplyr verbs. I have reason to expect substantial performance benefits, especially for grouped mutate/summary with regular evaluation for many groups. However, there are some drawbacks:

Errors can be propagated, but the error message is only printed to the console and cannot be propagated
Warnings are not propagated, but they are preserved
Messages cannot be captured or suppressed
Interrupts cannot be distinguished from errors (except by missing output on the console)

Thoughts?

try_eval_ <-
  Rcpp::cppFunction(
    '
SEXP try_eval(SEXP call, SEXP env) {
  int error = 0;
  SEXP ret = R_tryEval(call, env, &error);
  if (error) Rcpp::stop("R_tryEval() raised error");
  return ret;
}
    ')

try_eval <- function(call, env = parent.frame()) try_eval_(substitute(call), env)

tryCatch(
  try_eval({
    message("message")
    warning("warning")
    stop("stop")
  }),
  error = function(e) cat("error handled: ", conditionMessage(e), "\n", sep = ""),
  warning = function(e) print("warning handled"),
  message = function(e) print("message handled")
)
## message
## Error: stop
## In addition: Warning message:
## warning 
## error handled: R_tryEval() raised error

krlmlr on 7 Dec 2016

I think that seems like a reasonable trade-off. Might also be worth checking to see what data.table does?

hadley on 7 Dec 2016

👍1

data.table seems to do plain eval(), this is safe in C because they don't have destructors that need to be called when unwinding the stack after a longjmp...

krlmlr on 7 Dec 2016

👍2

Here is a package where they wrap a dplyr pipeline in a try block: https://github.com/business-science/tidyquant/blob/9fb6b93f07b4e1f940fec3c85e0bf46d5980f448/R/tq_stock_list.R#L178

lionel- on 8 Apr 2017

There is also R_tryCatchError now to catch possible error in user code. I guess that it could replace the usage of R_TopLevelExec.

PS: Oh, it has been discussed in RcppCore/Rcpp#578 and it was mentioned that the speed of R_tryCatchError might be low.

randy3k on 5 Nov 2017

There's new hope with https://github.com/RcppCore/Rcpp/pull/789 and its successors, we still can fallback to R_tryEval() for R < 3.5.0.

krlmlr on 17 Jan 2018

R 3.5 also has R_tryCatch(), but according to the documentation it also uses R-level tryCatch(), so not much to gain here.

https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Condition-handling-and-cleanup-code

krlmlr on 3 Jul 2018

I don't think we should fall back to R_tryEval() for R < 3.5.0 by the way. We need consistent semantics and it is ok if dplyr is a little less performant with old R versions.

lionel- on 3 Jul 2018

This is now irrelevant with @lionel- work on Rcpp 0.12.18

romainfrancois on 14 Sep 2018

Do we need to define RCPP_USE_UNWIND_PROTECT to enable it?

krlmlr on 15 Sep 2018

Yes but perhaps better test-run it for a while? This is all very new, both in R and in Rcpp. We could set it on Travis to start with.

lionel- on 15 Sep 2018

I like this idea. How do we proceed? It might be easiest if Travis would set an environment variable DPLYR_COMPILER_FLAGS which is then picked up by the build. Do we want to sneak in a ${DPLYR_COMPILER_FLAGS} into our Makevars ?

krlmlr on 15 Sep 2018

Oh hmm I thought we would need configure to make it work (cf suggestion on Slack) but that's much neater this way, good idea.

lionel- on 15 Sep 2018

We should do the same on Appveyor, this jumping/throwing business is platform-sensitive.

lionel- on 15 Sep 2018

I'll do it.

krlmlr on 15 Sep 2018

Travis build: https://travis-ci.org/tidyverse/dplyr/builds/428982506

AppVeyor build: https://ci.appveyor.com/project/tidyverse/dplyr/build/1.0.1722

Not opening a pull request to avoid checking the branch and the PR.

krlmlr on 15 Sep 2018

Interesting: https://travis-ci.org/tidyverse/dplyr/jobs/428993952#L1307

Error: segfault from C stack overflow

Let's watch out for these.

krlmlr on 15 Sep 2018

👍1

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/