dplyr_group_indices crashes R with malformed arguments

Created on 29 Sep 2020  ·  7 Comments  ·  Source: tidyverse/dplyr

I'm sure there is a smaller reprex possible, but this is a problem reduced from tidyverts/tsibble#194

The code below crashes R:

gr <- structure(list(1:14566, 14567:30980, 30981:48498, 48499:66037),
                ptype = integer(0), 
                class = c("vctrs_list_of","vctrs_vctr", "list"))
nr <- 0L
.Call(dplyr:::dplyr_group_indices, gr, nr)

All 7 comments

Can you create a reprex with the public interface please? We don't support calling the unexported interface with malformed arguments.

@lionel-

Sure. Is the following better? The call to group_indices sometimes has to be repeated once or twice before the crash happens.

res <-
  structure(list(
    Sensor = character(0), Date_Time = structure(numeric(0), tzone = "Australia/Melbourne", class = c(
      "POSIXct",
      "POSIXt"
    )), Date = structure(numeric(0), class = "Date"), Time = integer(0),
    Count = integer(0)
  ), row.names = integer(0), class = c(
    "grouped_ts",
    "grouped_df", "tbl_ts", "tbl_df", "tbl", "data.frame"
  ), key = structure(list(
    Sensor = c(
      "Birrarung Marr", "Bourke Street Mall (North)",
      "QV Market-Elizabeth St (West)", "Southern Cross Station"
    ), .rows = structure(list(
      1:14566, 14567:30980, 30981:48498,
      48499:66037
    ), ptype = integer(0), class = c(
      "vctrs_list_of",
      "vctrs_vctr", "list"
    ))
  ), row.names = c(NA, 4L), class = c(
    "tbl_df",
    "tbl", "data.frame"
  ), .drop = TRUE), index = structure("Date_Time", ordered = TRUE), index2 = "Date_Time", interval = structure(list(
    year = 0, quarter = 0, month = 0, week = 0, day = 0, hour = 1,
    minute = 0, second = 0, millisecond = 0, microsecond = 0,
    nanosecond = 0, unit = 0
  ), .regular = TRUE, class = c(
    "interval",
    "vctrs_rcrd", "vctrs_vctr"
  )), groups = structure(list(Sensor = c(
    "Birrarung Marr",
    "Bourke Street Mall (North)", "QV Market-Elizabeth St (West)",
    "Southern Cross Station"
  ), .rows = structure(list(
    1:14566, 14567:30980,
    30981:48498, 48499:66037
  ), ptype = integer(0), class = c(
    "vctrs_list_of",
    "vctrs_vctr", "list"
  ))), row.names = c(NA, 4L), class = c(
    "tbl_df",
    "tbl", "data.frame"
  ), .drop = TRUE))


dplyr::group_indices(res)
dplyr::group_indices(res)

Thank you. You are still creating data structures without using the public interface. Can you use constructors please?

After trying to simplify it, I indeed found what I believe to be the root cause. tsibble:::as_tibble.grouped_ts was not using the grouped_df constructor to create a grouped_df.

https://github.com/tidyverts/tsibble/blob/c6e343d63eda12732cb3a6fc0205e4a0c85b871f/R/as-tsibble.R#L515-L518

This is a contrived (non-)example, but it shows essentially what caused the error, namely constructors being used inappropriately:

df <-
  tibble::new_tibble(
    data.frame(x=integer(), y = integer()),
    groups = data.frame(x=0, .rows = vctrs::list_of(1:1000)),
    nrow = 0,
    class = "grouped_df")
dplyr::group_indices(df)

Thanks a lot for debugging this @TylerGrantSmith!

@earowang Could you use the exported constructor to build a grouped-df please? Only the public constructor can guarantee consistent data. Please let me know if you run into other issues.

I think the issue is that new_grouped_df(), the public constructor, doesn't check that the data and the groups are consistent with each other. (Edit: this inconsistency is perhaps needed for filter(.preserve = TRUE).)

You need to run the following snippet a couple of times to get the error.

df <- dplyr::new_grouped_df(
  data.frame(x=integer(), y = integer()),
  groups = data.frame(x=0, .rows = vctrs::list_of(1:1000))
)
dplyr::group_indices(df)
#> integer(0)

Created on 2020-10-01 by the reprex package (v0.3.0)

This is what's happening:

  SEXP indices = PROTECT(Rf_allocVector(INTSXP, nr));
  int* p_indices = INTEGER(indices);

This segfaults when nr is 0.
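
One plausible reading of the crash, assuming the function goes on to write a group number at every row position listed in .rows (that loop is not shown in the snippet above): with nr equal to 0 the buffer has no elements, so every such write lands out of bounds. The plain-C sketch below uses no R API, and fill_group_indices and its parameters are hypothetical names for illustration, not dplyr's code. It checks every position against nr before writing anything.

```c
#include <stddef.h>

/* Hypothetical helper, not dplyr's implementation.
 * rows[g] holds lens[g] 1-based row positions belonging to group g.
 * On success, out[0..nr-1] receives the 1-based group number of each
 * listed row and the function returns 0.
 * Returns -1, leaving out untouched, if any position is outside 1..nr. */
int fill_group_indices(const int *const *rows, const size_t *lens,
                       size_t ng, size_t nr, int *out)
{
    /* First pass: validate, so a failure never writes to out. */
    for (size_t g = 0; g < ng; g++) {
        for (size_t j = 0; j < lens[g]; j++) {
            int r = rows[g][j];
            if (r < 1 || (size_t)r > nr) {
                return -1;
            }
        }
    }

    /* Second pass: every position is now known to be in bounds. */
    for (size_t g = 0; g < ng; g++) {
        for (size_t j = 0; j < lens[g]; j++) {
            out[rows[g][j] - 1] = (int)(g + 1);
        }
    }
    return 0;
}
```

With the inputs from the reports above (nr of 0 and groups pointing at rows 1 to 66037, or 1 to 1000), this sketch would return -1 instead of writing past the end of an empty buffer. Validating in a separate pass costs a second walk over the groups but keeps the output untouched on failure.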
