Data.table: fifelse did not work in recursive function

Created on 15 Jun 2020  路  4Comments  路  Source: Rdatatable/data.table

I am trying to use data.table::fifelse to implement a recursive function to calculate greatest common divisors mentioned in this StackOverflow. However, I found that in this case, data.table::fifelse cannot be a replacement for base::ifelse.

Minimal reproducible

# use base::ifelse
gcd_base <- function(x,y) {
  r <- x%%y;
  return(ifelse(r, gcd_base(y, r), y))
}

gcd_dt <- function(x,y) {
  r <- x%%y;
  return(data.table::fifelse(r, gcd_dt(y, r), y))
}

gcd_base(10, 1)
# [1] 1
gcd_dt(10, 1)
# Error: C stack usage  7971780 is too close to the limit

Any insights? Thanks!

Most helpful comment

Thanks for the MRE! Not sure why it's recursing so much in the first place...

# injecting a print statement to check how deep in recursion we are
n_base = 0 
gcd_base <- function(x,y) {
  cat('Recursion level:', n_base <<- n_base + 1, '\n')
  r <- x%%y;
  return(ifelse(r, gcd_base(y, r), y))
}

n_dt = 0 
gcd_dt <- function(x,y) {
  cat('Recursion level:', n_dt <<- n_dt + 1, '\n')
  r <- x%%y;
  return(data.table::fifelse(r, gcd_dt(y, r), y))
}
gcd_base(10, 1)
# Recursion level: 1 
# [1] 1
gcd_dt(10, 1)
# Recursion level: 1 
# Recursion level: 2 
# Recursion level: 3 
#  ...
# Recursion level: 865 
# Recursion level: 866 
# Recursion level: 867 
# Error: C stack usage  7971728 is too close to the limit

All 4 comments

Thanks for the MRE! Not sure why it's recursing so much in the first place...

# injecting a print statement to check how deep in recursion we are
n_base = 0 
gcd_base <- function(x,y) {
  cat('Recursion level:', n_base <<- n_base + 1, '\n')
  r <- x%%y;
  return(ifelse(r, gcd_base(y, r), y))
}

n_dt = 0 
gcd_dt <- function(x,y) {
  cat('Recursion level:', n_dt <<- n_dt + 1, '\n')
  r <- x%%y;
  return(data.table::fifelse(r, gcd_dt(y, r), y))
}
gcd_base(10, 1)
# Recursion level: 1 
# [1] 1
gcd_dt(10, 1)
# Recursion level: 1 
# Recursion level: 2 
# Recursion level: 3 
#  ...
# Recursion level: 865 
# Recursion level: 866 
# Recursion level: 867 
# Error: C stack usage  7971728 is too close to the limit

Base ifelse() evaluates yes and no only if either condition is applicable. This is just a snippet of ifelse which is applicable if the length of test is greater than 1:

    ans <- test
    len <- length(ans)
    ypos <- which(test)
    npos <- which(!test)
    if (length(ypos) > 0L) 
        ans[ypos] <- rep(yes, length.out = len)[ypos]
    if (length(npos) > 0L) 
        ans[npos] <- rep(no, length.out = len)[npos]

For fifelse(), the yes or no appear to be evaluated during the call to the C API - I recompiled the fifelse with Rprintf() at the top of the function and it never printed anything. So in this case, we have a perpetual loop because gcd_dt() will keep calling itself.

We could address it by incorporating some of ifelse() into the fifelse() wrapper when the length of test is 1:

fifelse2 = function(test, yes, no, na) {
  if (is.atomic(test)) {
    if (typeof(test) != "logical")
      storage.mode(test) = "logical"
    if (length(test) == 1L) {
      if (is.na(test)) {
        if (missing(na)) 
          return (NA)
        else 
          return(na)
      }
      if (test) 
        return(yes)
      else 
        return(no)
    } else {
      .Call(data.table:::CfifelseR, test, yes, no, na)
    } 
  } else {
    stop("Argument 'test' must be logical")
  }
}

n_dt2 = 0 
gcd_dt2 <- function(x,y) {
  cat('Recursion level:', n_dt <<- n_dt + 1, '\n')
  r <- x%%y
  return(fifelse2(r, gcd_dt(y, r), y))
}

gcd_dt2(10, 1)

##Recursion level: 1
##[1] 1

Note, if this were implemented, this would not check the type of yes, no, and na, so to allow for a recursive function we would likely miss out on that restriction. Also, for recursive functions that take a vector, we would also get similar errors.

Alternatively we would have to substitute input, pass together with parent.frame to C, unevaluated, which complicates a lot. I don't like to use lazy evaluation when it is not really necessary.

dplyr recently had two similar issues:

https://github.com/tidyverse/dplyr/issues/5341
https://github.com/tidyverse/dplyr/issues/5321

Both issues were closed due to concerns for type stability (e.g., both yes and no need evaluated to make sure they are both integer results). I checked, dplyr::if_else also errors out using this. I did not report up to dplyr due to those recent issues being closed.

Regardless, I believe we have some data.table issues which mention optimizing to automatically use fifelse. This use case would not work if ifelse were optimized within data.table.

Was this page helpful?
0 / 5 - 0 ratings