Data.table: Subsetting data.table with lubridate::interval columns produces "Error in dimnames(x) <- dn"

Created on 23 Mar 2020 · 10 Comments · Source: Rdatatable/data.table

Hi,

I've just started trying out data.table for a problem and have come across a strange error. When subsetting a data.table with columns of lubridate::interval type, I get the following error:

Error in dimnames(x) <- dn : 
   length of 'dimnames' [1] not equal to array extent

Please see below for a reproducible example:

Create a data.frame with POSIXct columns and a column with the interval.

library(data.table)
library(lubridate)

df <- data.frame( start = as.POSIXct(c('2016-05-01 19:00:00', '2016-06-01 14:00:00'), tz='UTC'),
                  end = as.POSIXct(c('2016-05-03 19:00:00', '2016-06-01 22:00:00'), tz='UTC'))
df[,'intv'] <- lubridate::interval(df[,'start'], df[,'end'])

All looks okay.

df
                start                 end                                             intv
1 2016-05-01 19:00:00 2016-05-03 19:00:00 2016-05-01 19:00:00 UTC--2016-05-03 19:00:00 UTC
2 2016-06-01 14:00:00 2016-06-01 22:00:00 2016-06-01 14:00:00 UTC--2016-06-01 22:00:00 UTC

Subsetting the data.frame works as expected.

df[1,]
                start                 end                                             intv
1 2016-05-01 19:00:00 2016-05-03 19:00:00 2016-05-01 19:00:00 UTC--2016-05-03 19:00:00 UTC

Convert data.frame to data.table by reference.

setDT(df)
df
                 start                 end                                             intv
1: 2016-05-01 19:00:00 2016-05-03 19:00:00 2016-05-01 19:00:00 UTC--2016-05-03 19:00:00 UTC
2: 2016-06-01 14:00:00 2016-06-01 22:00:00 2016-06-01 14:00:00 UTC--2016-06-01 22:00:00 UTC

Selecting a single row produces an unexpected error.

df[1,]
Error in dimnames(x) <- dn : 
   length of 'dimnames' [1] not equal to array extent

However, selecting the columns that do not contain the interval class works.

df[1,1:2]
                 start                 end
1: 2016-05-01 19:00:00 2016-05-03 19:00:00

Selecting the column with only the interval class produces the same error.

df[1,3]
Error in dimnames(x) <- dn : 
   length of 'dimnames' [1] not equal to array extent
df[1,'intv']
Error in dimnames(x) <- dn : 
   length of 'dimnames' [1] not equal to array extent

Someone else found a similar (related?) error and presented a workaround.

Error with durations created from a data.table using lubridate & dplyr

(durations[, duration := interval(min.date, max.date)])
# Error in `rownames<-`(`*tmp*`, value = paste(format(rn, right = TRUE),  : 
#   length of 'dimnames' [1] not equal to array extent
# In addition: Warning messages:
# 1: In unclass(e1) + unclass(e2) :
#   longer object length is not a multiple of shorter object length
# 2: In cbind(player = c("", "Aaron Brooks", "Aaron Gray", "Acie Law",  :
#   number of rows of result is not a multiple of vector length (arg 1)

Best How To :

You can try by converting the interval output to either character class (as the interval output is not a vector) or wrap with as.duration (from @Jake Fisher)

durations <- lakers.dt %>%
mutate(better.date = ymd(date)) %>%
group_by(player) %>%
summarize(min.date = min(better.date), max.date = max(better.date)) %>%
mutate(duration= as.duration(interval(min.date, max.date))
)

Or use as.vector which will coerce it to numeric class.
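As a minimal sketch of the numeric route, assuming the interval is only needed for its length, the span can be stored as plain seconds with lubridate's int_length() so that no S4 object ever enters the data.table. The column name secs is made up for this illustration.

```r
library(data.table)
library(lubridate)

dt <- data.table(
  start = as.POSIXct(c("2016-05-01 19:00:00", "2016-06-01 14:00:00"), tz = "UTC"),
  end   = as.POSIXct(c("2016-05-03 19:00:00", "2016-06-01 22:00:00"), tz = "UTC")
)

# Keep only the interval length, in seconds, as an ordinary numeric column.
# 'secs' is a hypothetical column name chosen for this sketch.
dt[, secs := int_length(interval(start, end))]

# Row subsetting now only touches atomic columns.
first_row <- dt[1]
```

The start point stays available in the start column, so the interval can be rebuilt with interval(start, end) whenever it is actually needed.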

Session information:

sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Gentoo/Linux

Matrix products: default
BLAS: /usr/lib64/blas/openblas/libblas.so.3
LAPACK: /usr/lib64/libopenblas_haswellp-r0.3.7.so

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C               LC_TIME=en_AU.UTF-8        LC_COLLATE=C               LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.12.8 lubridate_1.7.4  

loaded via a namespace (and not attached):
[1] compiler_3.5.3 magrittr_1.5   tools_3.5.3    Rcpp_1.0.4     stringi_1.4.6  stringr_1.4.0

I've also reproduced the same error on a Windows 10 machine with R v3.6.1.

Am I subsetting the data.table in the correct way? Or is this a bug?

bug non-atomic column segfault

JustGitting

Most helpful comment

Yeah my bad. Drop the subsetting and you're good. Also, no need to refer to the data.table when you're inside [:

dt[, intv := Reduce(function(x, y) c(x, y), intv)]

@jangorecki the segfault is the same issue as #4166. The erroneous subsetting (which would've been fine if superfluous on the second argument...) leads to the reduce returning a two element vector.

tlapak on 7 Apr 2020

👍2

All 10 comments

You are subsetting the data.table the correct way. Thank you for reporting. I am not seeing anything obvious here right now; it might be related to some S4 internals. It is also reproducible on the latest devel and on 1.11.8.

jangorecki on 23 Mar 2020

👍1

data.table has a hard time with S4 objects it seems. In this case, it messes up (at least) the subsetting. The actual error happens later in print.data.table:

dt <- data.table(e = c(ymd('2015-01-01') %--% ymd('2017-02-12'),
                       ymd('2016-01-01') %--% ymd('2021-04-02')))

dt[1]
# Error in dimnames(x) <- dn : 
#   length of 'dimnames' [1] not equal to array extent

dt[1, e]
# 2015-01-01 UTC--2017-02-12 UTC 2016-01-01 UTC--2018-02-12 UTC

Notice that both intervals have the same length now.

I'm not very familiar with the S4 system, but what happens is roughly as follows: Interval is an object with three slots. The first one contains the length of the interval, the second slot the start and the third slot the time zone. A vector of these objects bundles together the entries of the first and second slot, respectively. (NB: It seems that it can only handle one time zone.) [.data.table only really 'sees' the first slot while the second slot looks like one of the attributes and is left alone.
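The slot layout described above can be inspected directly. This is a small sketch using only lubridate, with no data.table involved, to show that an Interval vector keeps its spans and start points in separate slots of equal length, and that lubridate's own `[` method subsets both together.

```r
library(lubridate)

x <- c(ymd("2015-01-01") %--% ymd("2017-02-12"),
       ymd("2016-01-01") %--% ymd("2021-04-02"))

spans  <- x@.Data   # interval lengths in seconds, one per element
starts <- x@start   # POSIXct start points, one per element
zone   <- x@tzone   # a single time zone shared by the whole vector

# lubridate's `[` method subsets .Data and start together,
# so the two slots stay in step.
first <- x[1]
```

Code that subsets only the numeric data part and leaves the start slot alone would produce an object whose two slots disagree in length.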

The error then crops up when print.data.table is called where format inside of format.data.table recognizes the mangled interval object as two items and returns a two row matrix. When row.names == TRUE print attempts to assign row names of length one to the two row matrix. Hence the error.

Wrapping these things in list columns would be a fix to the issue.
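A minimal sketch of the list-column idea, assuming each row holds exactly one Interval: every list element is then a complete, self-contained S4 object, so row subsetting only has to pick list elements and never has to look inside them.

```r
library(data.table)
library(lubridate)

a <- ymd("2015-01-01") %--% ymd("2017-02-12")
b <- ymd("2016-01-01") %--% ymd("2021-04-02")

# Passing a list to data.table() creates a list column with one Interval per row.
dt <- data.table(id = c(1, 2), intv = list(a, b))

row1 <- dt[1]
elem <- row1$intv[[1]]   # still a full Interval with matching slots
```

The cost is that vectorised lubridate functions no longer apply to the column directly; each element has to be pulled out of the list first.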

tlapak on 28 Mar 2020

👍2

Thanks for investigating and for the workaround. Until we have a fix for that, we could probably add an extra check in print.data.table to provide a meaningful error.

jangorecki on 28 Mar 2020

Thanks @tlapak

I've been trying to use the workaround, but have come across a snag. I'm happy to post this as another issue, but as it relates to the current bug I've added it here. Mods, please let me know if I should move it :)

I have two goals:
1) Filter the data.table using a second data.table with matching keys, based on this approach

2) Filter the filtered data.table again, based on overlapping intervals, using the lubridate::int_overlaps() function.

For goal 1) I've found another related bug to using columns of type interval. For goal 2) there is a complication using the workaround described by @tlapak .

For 1), the example below subsets a data.table with an Interval column, which should work if the bug were not present. It illustrates @tlapak's point that multi-slot S4 objects are not handled well by data.table: all other columns are subset correctly, but the Interval object is not, because the elements in its "start" slot are left unmodified.

To achieve goal 2), I've used the list workaround by wrapping the Interval column in a list, but I'm having trouble extracting the Interval object within the list so the int_overlap() function can compare it to another interval object.

Create the data.table with the interval class column.

library(data.table)
library(lubridate)

dt <- data.table( id = c(1, 2, 3, 4),
                  intv = c(ymd('2015-01-01') %--% ymd('2017-02-12'),
                           ymd('2016-01-01') %--% ymd('2021-04-02'),
                           ymd('2017-01-01') %--% ymd('2018-04-02'),
                           ymd('2009-01-01') %--% ymd('2010-04-02')))

We can see the structure of dt: the "Interval" class has 3 slots, and the .Data and start slots both have the same length.

str(dt)
# Classes ‘data.table’ and 'data.frame':    4 obs. of  2 variables:
#   $ id  : num  1 2 3 4
# $ intv:Formal class 'Interval' [package "lubridate"] with 3 slots
# .. ..@ .Data: num  6.68e+07 1.66e+08 3.94e+07 3.94e+07
# .. ..@ start: POSIXct, format: "2015-01-01" "2016-01-01" "2017-01-01" "2009-01-01"
# .. ..@ tzone: chr "UTC"
# - attr(*, ".internal.selfref")=<externalptr>

We'll add keys to dt and create another data.table for subsetting dt. We'll only select the rows with ids = 1, 2.

setkey(dt, id)
# subset with another dt.
dt_ids <- data.table( id = c(1, 2) )
setkey(dt_ids, id)

dt_filtered <- dt[dt_ids]

No complaints so far, but if we look at the structure of dt_filtered, we can see that the .Data: and start: have inconsistent lengths.

str(dt_filtered)
# Classes ‘data.table’ and 'data.frame':    2 obs. of  2 variables:
#   $ id  : num  1 2
# $ intv:Formal class 'Interval' [package "lubridate"] with 3 slots
# .. ..@ .Data: num  6.68e+07 1.66e+08
# .. ..@ start: POSIXct, format: "2015-01-01" "2016-01-01" "2017-01-01" "2009-01-01"
# .. ..@ tzone: chr "UTC"
# - attr(*, "sorted")= chr "id"
# - attr(*, ".internal.selfref")=<externalptr>

This becomes a problem when we try to filter dt_filtered using the int_overlaps() function resulting in the error " invalid class “Interval” object: Inconsistent lengths: spans = 2, start dates = 4" as illustrated below:

my_intv <- ymd('2016-01-01') %--% ymd('2021-04-02')

# Subset dt_filtered by selecting overlapping intervals.
dt_filtered[ ,lubridate::int_overlaps( ..my_intv, intv)]
# Error in validObject(.Object) : 
#   invalid class “Interval” object: Inconsistent lengths: spans = 2, start dates = 4
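A hedged sketch of how such a mismatch could be detected before it reaches lubridate. The helper slots_consistent() is hypothetical and not part of either package; the broken object is built by hand here to mimic the "spans = 2, start dates = 4" state rather than by going through data.table.

```r
library(lubridate)

# Hypothetical helper: TRUE when spans and start points line up.
slots_consistent <- function(x) length(x@.Data) == length(x@start)

good <- c(ymd("2015-01-01") %--% ymd("2017-02-12"),
          ymd("2016-01-01") %--% ymd("2021-04-02"))

# Hand-built mismatch: 2 spans but 4 start dates.
bad <- good
bad@start <- c(good@start, good@start)

# validObject() runs the class's validity check; capture the message
# instead of letting the error propagate.
msg <- tryCatch({ validObject(bad); "valid" },
                error = function(e) conditionMessage(e))
```

A check like slots_consistent(dt_filtered$intv) after a join would flag the problem at the point where it is introduced rather than in a later call.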

Let's try again with the as.list() workaround.

dt <- data.table( id = c(1, 2, 3, 4),
                  intv = c(ymd('2015-01-01') %--% ymd('2017-02-12'),
                           ymd('2016-01-01') %--% ymd('2021-04-02'),
                           ymd('2017-01-01') %--% ymd('2018-04-02'),
                           ymd('2009-01-01') %--% ymd('2010-04-02')))

dt[ , intv := as.list(intv)] # cast intv as a list.

dt
# id                           intv
# 1:  1 2015-01-01 UTC--2017-02-12 UTC
# 2:  2 2016-01-01 UTC--2021-04-02 UTC
# 3:  3 2017-01-01 UTC--2018-04-02 UTC
# 4:  4 2009-01-01 UTC--2010-04-02 UTC

The 'intv' column is a list of 4 elements (each element being an Interval object).

str(dt)
# Classes ‘data.table’ and 'data.frame':    4 obs. of  2 variables:
#   $ id  : num  1 2 3 4
# $ intv:List of 4
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 66787200
# .. .. ..@ start: POSIXct, format: "2015-01-01"
# .. .. ..@ tzone: chr "UTC"
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 1.66e+08
# .. .. ..@ start: POSIXct, format: "2016-01-01"
# .. .. ..@ tzone: chr "UTC"
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 39398400
# .. .. ..@ start: POSIXct, format: "2017-01-01"
# .. .. ..@ tzone: chr "UTC"
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 39398400
# .. .. ..@ start: POSIXct, format: "2009-01-01"
# .. .. ..@ tzone: chr "UTC"
# - attr(*, ".internal.selfref")=<externalptr>

Let's subset.

setkey(dt, id)
# subset with another dt.
dt_ids <- data.table( id = c(1, 2) )
setkey(dt_ids, id)

# Filtering dt using another data.table.
dt_filtered <- dt[dt_ids]
str(dt_filtered)
# Classes ‘data.table’ and 'data.frame':    2 obs. of  2 variables:
#   $ id  : num  1 2
# $ intv:List of 2
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 66787200
# .. .. ..@ start: POSIXct, format: "2015-01-01"
# .. .. ..@ tzone: chr "UTC"
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 1.66e+08
# .. .. ..@ start: POSIXct, format: "2016-01-01"
# .. .. ..@ tzone: chr "UTC"
# - attr(*, "sorted")= chr "id"
# - attr(*, ".internal.selfref")=<externalptr>

As seen above, dt has been filtered correctly. However, each Interval object is now encapsulated in a list, so the int_overlaps() function reports that one of its arguments is not an Interval.

my_intv <- ymd('2016-01-01') %--% ymd('2021-04-02')

# Subset dt_filtered by selecting overlapping intervals.
dt_filtered[ ,lubridate::int_overlaps( ..my_intv, intv)]
# Error in lubridate::int_overlaps(..my_intv, intv) : 
#   c(is.interval(int1), is.interval(int2)) are not all TRUE

The above error is expected, so I've tried to extract the list element using '[[]]' notation, but it strips the class information. For example I'll add a new column 'intv_new':

dt_filtered[, intv_new := mapply('[[', intv, 1, SIMPLIFY = T)]
str(dt_filtered)
# Classes ‘data.table’ and 'data.frame':    2 obs. of  3 variables:
#   $ id      : num  1 2
# $ intv    :List of 2
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 66787200
# .. .. ..@ start: POSIXct, format: "2015-01-01"
# .. .. ..@ tzone: chr "UTC"
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 1.66e+08
# .. .. ..@ start: POSIXct, format: "2016-01-01"
# .. .. ..@ tzone: chr "UTC"
# $ intv_new: num  6.68e+07 1.66e+08
# - attr(*, "sorted")= chr "id"
# - attr(*, ".internal.selfref")=<externalptr>

The intv_new is of type numeric, not Interval.

Here is another approach (basically the same...but worth a shot).

dt_filtered[, intv_new := sapply(intv, function(x) x[[1]], simplify = T)]
str(dt_filtered)

# Classes ‘data.table’ and 'data.frame':    2 obs. of  3 variables:
#   $ id      : num  1 2
# $ intv    :List of 2
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 66787200
# .. .. ..@ start: POSIXct, format: "2015-01-01"
# .. .. ..@ tzone: chr "UTC"
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 1.66e+08
# .. .. ..@ start: POSIXct, format: "2016-01-01"
# .. .. ..@ tzone: chr "UTC"
# $ intv_new: num  6.68e+07 1.66e+08
# - attr(*, "sorted")= chr "id"
# - attr(*, ".internal.selfref")=<externalptr>

By contrast, creating a plain list of intervals and then extracting an element is trivial.

intv_list <- list(ymd('2015-01-01') %--% ymd('2017-02-12'),
  ymd('2016-01-01') %--% ymd('2021-04-02'),
  ymd('2017-01-01') %--% ymd('2018-04-02'),
  ymd('2009-01-01') %--% ymd('2010-04-02'))

intv_list[[1]]
# [1] 2015-01-01 UTC--2017-02-12 UTC
class(intv_list[[1]])
# [1] "Interval"
# attr(,"package")
# [1] "lubridate"

I'm guessing my data.table syntax is wrong somewhere.

How can an element from a list column be extracted and passed to another function without losing the class information/structure of the original object?
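One possible answer, sketched under the assumption that each list element holds a single Interval: extract elements with `[[` on the column itself, and apply int_overlaps() element by element with vapply(), which returns a plain logical vector and so never has to simplify S4 objects. The column name overlaps is made up for this example.

```r
library(data.table)
library(lubridate)

a <- ymd("2015-01-01") %--% ymd("2017-02-12")
b <- ymd("2009-01-01") %--% ymd("2010-04-02")
dt <- data.table(id = c(1, 2), intv = list(a, b))

my_intv <- ymd("2016-01-01") %--% ymd("2021-04-02")

# `[[` on the list column returns the stored object unchanged.
first <- dt$intv[[1]]

# Each x is a complete Interval; only the logical result is collected.
dt[, overlaps := vapply(intv, function(x) int_overlaps(my_intv, x), logical(1))]
```

Because the result of each call is a single logical, the class of the intervals never needs to survive a simplification step.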

I'm guessing the solution could be something like this format:

dt_filtered[ ,lubridate::int_overlaps( ..my_intv, sapply(intv, function(x) x[1]))]
Error in lubridate::int_overlaps(..my_intv, sapply(intv, function(x) x[1])) : 
  c(is.interval(int1), is.interval(int2)) are not all TRUE

...but does not work as hoped.

Again thanks for your guidance.

JustGitting on 6 Apr 2020

@JustGitting thank you for providing your workarounds. Just note that using the variable name ..my_intv is a little risky. We are planning to provide a dot-dot prefix (..var) to allow looking up a symbol from a character variable in the parent scope. Once that is done, it would invalidate your code. See for example #3199
fyi @mattdowle
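A small sketch of one way to avoid the prefix altogether, assuming the outer variable's name does not clash with any column name: names in j that are not columns are looked up in the calling scope, so the plain name can be used. The names threshold and flags are hypothetical.

```r
library(data.table)

dt <- data.table(id = 1:4, x = c(10, 20, 30, 40))

# 'threshold' is not a column of dt, so j finds it in the calling scope.
threshold <- 25
flags <- dt[, x > threshold]
```

Choosing a name that cannot be mistaken for a column keeps the lookup unambiguous without relying on the ..var syntax.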

jangorecki on 6 Apr 2020

Don't use simplify = TRUE and you're good. Setting that argument to TRUE calls simplify2array, which eats attributes for breakfast. And slots 2+ of S4 objects are just that, attributes. Btw. sapply defaults to simplify = TRUE. So just use lapply and drop the argument.
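The difference can be seen on a plain list, without data.table. This is a sketch using the same extraction function as in the earlier attempts; only the choice between sapply and lapply changes.

```r
library(lubridate)

intv_list <- list(ymd("2015-01-01") %--% ymd("2017-02-12"),
                  ymd("2016-01-01") %--% ymd("2021-04-02"))

# sapply simplifies by default: the result is a bare numeric vector of spans.
simplified <- sapply(intv_list, function(x) x[[1]])

# lapply never simplifies: every element is still an Interval.
kept <- lapply(intv_list, function(x) x[[1]])
```

The lapply result is a list again, so it answers "how do I keep the class" but not "how do I get one Interval vector back".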

If you ever need to get the intervals out of the list back into a "vector" you can use

Reduce(function(x, y) c(x[[1]], y), dt[, intv])

You're also a bit lucky that lubridate implements an as.list method for its interval class. You can see from its definition what you'd need to do in the general case:

as.list.Interval <- function(x, ...) {
  lapply(seq_along(x), function(i) x[[i]])
}

Edit: in your case I'd just leave the start and end points as separate columns and only create the interval object when you need them. They don't really give you anything
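A minimal sketch of that suggestion, assuming the start and end points are kept as ordinary Date columns and the Interval is built only inside the call that needs it. The overlaps column name is invented for this illustration, and ids 1 and 4 are chosen so the two rows give different answers.

```r
library(data.table)
library(lubridate)

dt <- data.table(
  id    = c(1, 2, 3, 4),
  start = ymd(c("2015-01-01", "2016-01-01", "2017-01-01", "2009-01-01")),
  end   = ymd(c("2017-02-12", "2021-04-02", "2018-04-02", "2010-04-02"))
)

# Filter by id with a join; only atomic columns are involved.
dt_ids <- data.table(id = c(1, 4))
dt_filtered <- dt[dt_ids, on = "id"]

my_intv <- ymd("2016-01-01") %--% ymd("2021-04-02")

# The Interval exists only for the duration of this call.
dt_filtered[, overlaps := int_overlaps(my_intv, interval(start, end))]
```

Since the Interval is created after subsetting, its slots are always built from columns of matching length.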

tlapak on 6 Apr 2020

👍2

Thanks @tlapak again!

Don't use simplify = TRUE and you're good. Setting that argument to TRUE calls simplify2array, which eats attributes for breakfast. And slots 2+ of S4 objects are just that, attributes. Btw. sapply defaults to simplify = TRUE. So just use lapply and drop the argument.

I've tried lapply, mapply and sapply with simplify = F, but each only returns a list of Intervals, not an Interval with multiple entries. I'm not sure what I'm doing wrong here...

dt_filtered[, intv_new := lapply(intv, function(x) x[[1]])]
str(dt_filtered)
# Classes ‘data.table’ and 'data.frame':    2 obs. of  3 variables:
#   $ id      : num  1 2
# $ intv    :List of 2
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 66787200
# .. .. ..@ start: POSIXct, format: "2015-01-01"
# .. .. ..@ tzone: chr "UTC"
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 1.66e+08
# .. .. ..@ start: POSIXct, format: "2016-01-01"
# .. .. ..@ tzone: chr "UTC"
# $ intv_new:List of 2
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 66787200
# .. .. ..@ start: POSIXct, format: "2015-01-01"
# .. .. ..@ tzone: chr "UTC"
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 1.66e+08
# .. .. ..@ start: POSIXct, format: "2016-01-01"
# .. .. ..@ tzone: chr "UTC"
# - attr(*, "sorted")= chr "id"
# - attr(*, ".internal.selfref")=<externalptr>

dt_filtered[, intv_new := mapply('[[', intv, 1, SIMPLIFY = F)]
str(dt_filtered)
# Classes ‘data.table’ and 'data.frame':    2 obs. of  3 variables:
#   $ id      : num  1 2
# $ intv    :List of 2
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 66787200
# .. .. ..@ start: POSIXct, format: "2015-01-01"
# .. .. ..@ tzone: chr "UTC"
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 1.66e+08
# .. .. ..@ start: POSIXct, format: "2016-01-01"
# .. .. ..@ tzone: chr "UTC"
# $ intv_new:List of 2
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 66787200
# .. .. ..@ start: POSIXct, format: "2015-01-01"
# .. .. ..@ tzone: chr "UTC"
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 1.66e+08
# .. .. ..@ start: POSIXct, format: "2016-01-01"
# .. .. ..@ tzone: chr "UTC"
# - attr(*, "sorted")= chr "id"
# - attr(*, ".internal.selfref")=<externalptr>

dt_filtered[, intv_new := sapply(intv, function(x) x[[1]], simplify = F)]
str(dt_filtered)
# Classes ‘data.table’ and 'data.frame':    2 obs. of  3 variables:
#   $ id      : num  1 2
# $ intv    :List of 2
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 66787200
# .. .. ..@ start: POSIXct, format: "2015-01-01"
# .. .. ..@ tzone: chr "UTC"
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 1.66e+08
# .. .. ..@ start: POSIXct, format: "2016-01-01"
# .. .. ..@ tzone: chr "UTC"
# $ intv_new:List of 2
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 66787200
# .. .. ..@ start: POSIXct, format: "2015-01-01"
# .. .. ..@ tzone: chr "UTC"
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 1.66e+08
# .. .. ..@ start: POSIXct, format: "2016-01-01"
# .. .. ..@ tzone: chr "UTC"
# - attr(*, "sorted")= chr "id"
# - attr(*, ".internal.selfref")=<externalptr>

Reduce(function(x, y) c(x[[1]], y), dt[, intv])

Your suggestion does the trick:

dt_filtered[, intv_new := Reduce(function(x, y) c(x[[1]], y), dt[, intv]) ]
str(dt_filtered)
# Classes ‘data.table’ and 'data.frame':    2 obs. of  3 variables:
#   $ id      : num  1 2
# $ intv    :List of 2
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 66787200
# .. .. ..@ start: POSIXct, format: "2015-01-01"
# .. .. ..@ tzone: chr "UTC"
# ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
# .. .. ..@ .Data: num 1.66e+08
# .. .. ..@ start: POSIXct, format: "2016-01-01"
# .. .. ..@ tzone: chr "UTC"
# $ intv_new:Formal class 'Interval' [package "lubridate"] with 3 slots
# .. ..@ .Data: num  66787200 39398400
# .. ..@ start: POSIXct, format: "2015-01-01" "2009-01-01"
# .. ..@ tzone: chr "UTC"
# - attr(*, "sorted")= chr "id"
# - attr(*, ".internal.selfref")=<externalptr>

Edit: in your case I'd just leave the start and end points as separate columns and only create the interval object when you need them. They don't really give you anything

Yes, I had originally used start/end datetime columns, but wanted to use the interval object as it is a single column. This simplifies my function as I don't need to specify which columns are the start/end.
I could split the interval into start/end datetime columns inside the function I'm writing, until the bug is fixed, but that's more complication.

JustGitting on 7 Apr 2020

Unfortunately I keep getting segfaults using the Reduce() approach if updating an existing column of a data.table.

library(data.table)
library(lubridate)

dt <- data.table( id = c(1, 2, 3, 4),
                  intv = c(ymd('2015-01-01') %--% ymd('2017-02-12'),
                           ymd('2016-01-01') %--% ymd('2021-04-02'),
                           ymd('2017-01-01') %--% ymd('2018-04-02'),
                           ymd('2009-01-01') %--% ymd('2010-04-02')))

dt[ , intv := as.list(intv)]
str(dt) 
Classes ‘data.table’ and 'data.frame':  4 obs. of  2 variables:
 $ id  : num  1 2 3 4
 $ intv:List of 4
  ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
  .. .. ..@ .Data: num 66787200
  .. .. ..@ start: POSIXct, format: "2015-01-01"
  .. .. ..@ tzone: chr "UTC"
  ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
  .. .. ..@ .Data: num 1.66e+08
  .. .. ..@ start: POSIXct, format: "2016-01-01"
  .. .. ..@ tzone: chr "UTC"
  ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
  .. .. ..@ .Data: num 39398400
  .. .. ..@ start: POSIXct, format: "2017-01-01"
  .. .. ..@ tzone: chr "UTC"
  ..$ :Formal class 'Interval' [package "lubridate"] with 3 slots
  .. .. ..@ .Data: num 39398400
  .. .. ..@ start: POSIXct, format: "2009-01-01"
  .. .. ..@ tzone: chr "UTC"
 - attr(*, ".internal.selfref")=<externalptr>

WARNING: this will kill your R session.

The segfault does not consistently occur at the same position. It can happen either during the Reduce operation or when printing the data.table's contents.


# Convert intv back to a lubridate::interval
dt[, intv := Reduce(function(x, y) c(x[[1]], y), dt[, intv]) ] # R session aborted (sometimes).
dt # R session aborted.

The segfault does not happen if you create or update a new column, which is why it took me a while to figure out what was happening. :(

JustGitting on 7 Apr 2020

Thanks, I am able to reproduce the segfault on recent devel.

jangorecki on 7 Apr 2020

Yeah my bad. Drop the subsetting and you're good. Also, no need to refer to the data.table when you're inside [:

dt[, intv := Reduce(function(x, y) c(x, y), intv)]
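The effect of the stray x[[1]] can be checked on a plain list, outside any data.table. This sketch uses three intervals so that the difference in length between the two variants is visible.

```r
library(lubridate)

intv_list <- list(ymd("2015-01-01") %--% ymd("2017-02-12"),
                  ymd("2016-01-01") %--% ymd("2021-04-02"),
                  ymd("2009-01-01") %--% ymd("2010-04-02"))

# With x[[1]], the accumulator is cut back to its first element at every
# step, so only the first and last intervals survive.
with_subset <- Reduce(function(x, y) c(x[[1]], y), intv_list)

# Without it, every interval is appended to the accumulator.
without_subset <- Reduce(function(x, y) c(x, y), intv_list)
```

Assigning a result of the wrong length back into a column with := is what the earlier comment ties to the crash, so checking length() of the combined vector against the number of rows first is a cheap safeguard.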

tlapak on 7 Apr 2020

👍2
