Dplyr: Can't use 'recover' with dplyr 1.0.0. How to interactively debug?

Created on 8 Jun 2020  路  1Comment  路  Source: tidyverse/dplyr

I've noticed that recover = TRUE is now seemingly unusable with dplyr.

For example this code

library(tidyverse)

mtc <- mtcars

set.seed(1111)
rand_na <- function(vec) {
  index_na  <- rdunif(15, a = 1, b = length(vec))
  vec[index_na] <- NA
  vec
}

mtc$cyl <- rand_na(mtcars$cyl)

mtc %>%
  group_by(carb) %>%
  nest() %>%
  mutate(model = map(data, ~lm(mpg ~ cyl + hp + wt,
                               data = .x)))

Produces this stack of frames:

Enter a frame number, or 0 to exit   

 1: mtc %>% group_by(carb) %>% nest() %>% mutate(model = map(data, ~lm(mpg ~ cy
 2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
 3: eval(quote(`_fseq`(`_lhs`)), env, env)
 4: eval(quote(`_fseq`(`_lhs`)), env, env)
 5: `_fseq`(`_lhs`)
 6: freduce(value, `_function_list`)
 7: withVisible(function_list[[k]](value))
 8: function_list[[k]](value)
 9: mutate(., model = map(data, ~lm(mpg ~ cyl + hp + wt, data = .x)))
10: mutate.data.frame(., model = map(data, ~lm(mpg ~ cyl + hp + wt, data = .x))
11: mutate_cols(.data, ...)
12: tryCatch({
    for (i in seq_along(dots)) {
        not_named <- (is.null(d
13: tryCatchList(expr, classes, parentenv, handlers)
14: tryCatchOne(expr, names, parentenv, handlers[[1]])
15: value[[3]](cond)
16: stop_dplyr(i, dots, fn = "mutate", problem = conditionMessage(e), parent = 
17: abort(bullets, class = "dplyr_error", error_name = error_name, error_expres
18: signal_abort(cnd)

And I haven't been able to actually find the call to lm burried in there, since I seem to hit some c code in mutate_cols that breaks the chain.

Just shy of a year ago I made a video of myself debugging this error with recover. So we can look at what the stack of frames was then:

image

And we can see, I had access to lm and lm.fit, where the error is occurring.

IMHO this is a real shame as recover was a really efficient technique for jumping to the source of an error and tinkering with offending code to understand why it was breaking. In the latest dplyr we do get a handy error message:

Error: Problem with `mutate()` input `model`.
x 0 (non-NA) cases
i Input `model` is `map(data, ~lm(mpg ~ cyl + hp + wt, data = .x))`.
i The error occured in group 6: carb = 8.
Run `rlang::last_error()` to see where the error occurred.

Which points me to the group that caused the problem. But I still have to write code to go inspect that group to figure out what is going on, e.g. mtc %>% filter(carb == 8). In this case it might be enough, but in more complex cases, like the second error I debug in the video, and the case I just had today it won't be.

I'm looking at writing a lot of code, overloading other people's functions, and messing around with environments to get to a situation where I can debug an error right at the point it fails.

I had a look at rlang::last_error and it seems like it almost has all the pieces I'd need to re-run the offending function with debugOnce. So that got me thinking that maybe rlang could bring this facility back with something like rlang::debug_last_error. That might be jumping the gun though. Maybe I am just stuck in the old way of doing things.

Is there another pathway to get to interactive debugging with dplyr, like what I was doing with recover?

bug wip

Most helpful comment

We should rethrow errors from a calling handler rather than an exiting handler. This way the stack is still around when it reaches the global handlers like options(error = ).

>All comments

We should rethrow errors from a calling handler rather than an exiting handler. This way the stack is still around when it reaches the global handlers like options(error = ).

Was this page helpful?
0 / 5 - 0 ratings