I am trying to use data.table::fifelse to implement a recursive function to calculate greatest common divisors mentioned in this StackOverflow. However, I found that in this case, data.table::fifelse cannot be a replacement for base::ifelse.
# use base::ifelse
gcd_base <- function(x,y) {
r <- x%%y;
return(ifelse(r, gcd_base(y, r), y))
}
gcd_dt <- function(x,y) {
r <- x%%y;
return(data.table::fifelse(r, gcd_dt(y, r), y))
}
gcd_base(10, 1)
# [1] 1
gcd_dt(10, 1)
# Error: C stack usage 7971780 is too close to the limit
Any insights? Thanks!
Thanks for the MRE! Not sure why it's recursing so much in the first place...
# injecting a print statement to check how deep in recursion we are
n_base = 0
gcd_base <- function(x,y) {
cat('Recursion level:', n_base <<- n_base + 1, '\n')
r <- x%%y;
return(ifelse(r, gcd_base(y, r), y))
}
n_dt = 0
gcd_dt <- function(x,y) {
cat('Recursion level:', n_dt <<- n_dt + 1, '\n')
r <- x%%y;
return(data.table::fifelse(r, gcd_dt(y, r), y))
}
gcd_base(10, 1)
# Recursion level: 1
# [1] 1
gcd_dt(10, 1)
# Recursion level: 1
# Recursion level: 2
# Recursion level: 3
# ...
# Recursion level: 865
# Recursion level: 866
# Recursion level: 867
# Error: C stack usage 7971728 is too close to the limit
Base ifelse() evaluates yes and no only if either condition is applicable. This is just a snippet of ifelse which is applicable if the length of test is greater than 1:
ans <- test
len <- length(ans)
ypos <- which(test)
npos <- which(!test)
if (length(ypos) > 0L)
ans[ypos] <- rep(yes, length.out = len)[ypos]
if (length(npos) > 0L)
ans[npos] <- rep(no, length.out = len)[npos]
For fifelse(), the yes or no appear to be evaluated during the call to the C API - I recompiled the fifelse with Rprintf() at the top of the function and it never printed anything. So in this case, we have a perpetual loop because gcd_dt() will keep calling itself.
We could address it by incorporating some of ifelse() into the fifelse() wrapper when the length of test is 1:
fifelse2 = function(test, yes, no, na) {
if (is.atomic(test)) {
if (typeof(test) != "logical")
storage.mode(test) = "logical"
if (length(test) == 1L) {
if (is.na(test)) {
if (missing(na))
return (NA)
else
return(na)
}
if (test)
return(yes)
else
return(no)
} else {
.Call(data.table:::CfifelseR, test, yes, no, na)
}
} else {
stop("Argument 'test' must be logical")
}
}
n_dt2 = 0
gcd_dt2 <- function(x,y) {
cat('Recursion level:', n_dt <<- n_dt + 1, '\n')
r <- x%%y
return(fifelse2(r, gcd_dt(y, r), y))
}
gcd_dt2(10, 1)
##Recursion level: 1
##[1] 1
Note, if this were implemented, this would not check the type of yes, no, and na, so to allow for a recursive function we would likely miss out on that restriction. Also, for recursive functions that take a vector, we would also get similar errors.
Alternatively we would have to substitute input, pass together with parent.frame to C, unevaluated, which complicates a lot. I don't like to use lazy evaluation when it is not really necessary.
dplyr recently had two similar issues:
https://github.com/tidyverse/dplyr/issues/5341
https://github.com/tidyverse/dplyr/issues/5321
Both issues were closed due to concerns for type stability (e.g., both yes and no need evaluated to make sure they are both integer results). I checked, dplyr::if_else also errors out using this. I did not report up to dplyr due to those recent issues being closed.
Regardless, I believe we have some data.table issues which mention optimizing to automatically use fifelse. This use case would not work if ifelse were optimized within data.table.
Most helpful comment
Thanks for the MRE! Not sure why it's recursing so much in the first place...