Data.table: internal error with set

Created on 4 Aug 2014 · 12 Comments · Source: Rdatatable/data.table

See code and error below.

tmp = structure(list(A = c(7.699, 7.725, 7.621, 7.647, 7.621, 
                                 7.664, 7.629, 7.559, 7.551, 7.341), 
                     B = c(7.835, 7.873, 7.812, 7.862, 7.866, 
                                 7.9, 7.85, 7.804, 7.8, 7.626), 
                     C = c(7.831, 7.875, 7.815, 7.854, 7.858, 
                                  7.872, 7.833, 7.783, 7.794, 7.675
                     )), .Names = c("A", "B", "C"), 
                class = c("data.table", "data.frame"), 
                row.names = c(NA, -10L))
for (i in c("A", "B", "C"))
  set(tmp, j=paste0(i, ".chg"), value=c(NA, diff(tmp[[i]]) / 100))
# Error in set(tmp, j = paste0(i, ".chg"), value = c(NA, diff(tmp[[i]])/100)) : 
#  Internal error, please report (including result of sessionInfo()) to datatable-help: oldtncol (0) < oldncol (3) but tl of class is marked.

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] doParallel_1.0.8 iterators_1.0.7  foreach_1.4.2    ggplot2_1.0.0    reshape2_1.4     data.table_1.9.2

loaded via a namespace (and not attached):
 [1] codetools_0.2-8  colorspace_1.2-4 digest_0.6.4     grid_3.1.1       gtable_0.1.2     MASS_7.3-33      munsell_0.4.2    plyr_1.8.1      
 [9] proto_0.3-10     Rcpp_0.11.2      scales_0.2.4     stringr_0.6.2    tools_3.1.1

bug

jrowen

All 12 comments

@jrowen, thanks. Just to be clear, you created the data.table in the same manner you've shown, i.e., using structure?

arunsrinivasan on 4 Aug 2014

Correct, if I run the lines above, I receive the noted error each time. This is a subset of the original dataset but still generates the error.

jrowen on 4 Aug 2014

Okay, good. Why do you use structure? Why not data.table(.)? structure(.) does not _over-allocate_ columns by default, unlike data.table(.). Since there's no _over-allocation_, set fails.

If you use :=, it detects this automatically and tries to recover by over-allocating again, with a warning. But set doesn't implement this, because it's designed to have as little overhead as possible for looping scenarios.

arunsrinivasan on 4 Aug 2014

👍2

Sorry, I wasn't clear in my earlier post. I used dput to output a subset of the original data.table that was generating the error (hence the structure call). I used data.table(read.csv(.)) to create the original object (the input file is poorly formatted, so fread didn't work). The for loop noted above actually crashes my R session when called on the original object. Let me know if you have any additional questions.

jrowen on 4 Aug 2014

I see. That's a bit tricky then. I think we'd require the CSV file (minimal data on which you're able to reproduce this is sufficient) and the _exact_ commands that got you this error.

Using structure() _will_ result in this error, and that's not surprising. Whether this behaviour should be fixed in set() or not is a separate issue. But if the error occurs, as you say, without using structure, then the issue lies somewhere else, which is hard to track down without the actual set of commands.

Thanks again.

arunsrinivasan on 4 Aug 2014
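
The over-allocation difference mentioned above can be inspected with truelength(), which data.table exports. The following is only a minimal sketch: the object names dt_manual and dt_alloc are made up for illustration, and the exact number of slots that data.table() reserves may differ between versions and option settings.

```r
library(data.table)

# Built manually, the way dput() output is reconstructed:
# structure() only sets attributes, so no spare column slots are reserved.
dt_manual <- structure(list(A = c(7.699, 7.725, 7.621)),
                       class = c("data.table", "data.frame"),
                       row.names = c(NA, -3L))

# Built with the constructor: data.table() over-allocates column slots.
dt_alloc <- data.table(A = c(7.699, 7.725, 7.621))

# Compare the number of columns with the number of allocated slots.
length(dt_manual); truelength(dt_manual)
length(dt_alloc);  truelength(dt_alloc)
```

If the allocated slots do not exceed the number of columns, there is no room to add a column by reference.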

Would an rds file work?

jrowen on 5 Aug 2014

That depends. How do you obtain tmp from your original data? Could you paste the code here?

For example:

tmp = read.csv(...) # (1)
# or
tmp = readRDS(..) # (2)

If you did (1), then we'll need the .csv file, and if you did (2), then we'll need the .rds file. We need the file in the exact format you load to get that error.

If you loaded it from an .RData or .rds file, then it's more or less a known issue.

arunsrinivasan on 5 Aug 2014
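
To see why loading from disk matters, a round trip through saveRDS()/readRDS() can be compared against the original object. This is a hedged sketch, not the reporter's data: the table contents and the temporary file path are placeholders chosen for illustration.

```r
library(data.table)

dt <- data.table(A = c(7.699, 7.725, 7.621))

# Round-trip through a temporary .rds file (placeholder path).
path <- tempfile(fileext = ".rds")
saveRDS(dt, path)
reloaded <- readRDS(path)

truelength(dt)        # allocated column slots before saving
truelength(reloaded)  # allocated column slots after reloading
unlink(path)
```

The reloaded object is a freshly allocated list, so the spare column slots that data.table() reserved are not carried over.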

Here is the exact code that is crashing my R session:

library(data.table)
tmp = readRDS("c:/temp/tmp.rds")
for (i in c("A", "B", "C"))
  set(tmp, j=paste0(i, ".chg"), value=c(NA, diff(tmp[[i]]) / 100))

Let me know where to send or how to attach the rds files.

jrowen on 5 Aug 2014

A Dropbox link? In any case, this happens for the same reason explained in my second reply: set() doesn't check for or over-allocate columns the way := does. But I'll keep this open. Maybe it's not costly to implement this within set().

Thanks again for the report and follow-ups.

In the meantime you can do:

tmp[, paste0(names(tmp), ".chg") := lapply(.SD, function(x) c(NA, diff(x))/ 100)]
#        A     B     C    A.chg    B.chg    C.chg
#  1: 7.699 7.835 7.831       NA       NA       NA
#  2: 7.725 7.873 7.875  0.00026  0.00038  0.00044
#  3: 7.621 7.812 7.815 -0.00104 -0.00061 -0.00060
#  4: 7.647 7.862 7.854  0.00026  0.00050  0.00039
#  5: 7.621 7.866 7.858 -0.00026  0.00004  0.00004
#  6: 7.664 7.900 7.872  0.00043  0.00034  0.00014
#  7: 7.629 7.850 7.833 -0.00035 -0.00050 -0.00039
#  8: 7.559 7.804 7.783 -0.00070 -0.00046 -0.00050
#  9: 7.551 7.800 7.794 -0.00008 -0.00004  0.00011
# 10: 7.341 7.626 7.675 -0.00210 -0.00174 -0.00119

arunsrinivasan on 5 Aug 2014
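
The workaround above applies the function to every column, since it uses names(tmp). If the table also held columns that should not be differenced, the same idea could be restricted with .SDcols. A small sketch under that assumption follows; the id column and the cols vector are hypothetical and not part of the reported data.

```r
library(data.table)

tmp <- data.table(id = 1:3,                      # hypothetical extra column
                  A  = c(7.699, 7.725, 7.621),
                  B  = c(7.835, 7.873, 7.812))

cols <- c("A", "B")  # only these columns get a ".chg" companion

# := adds the new columns by reference; .SDcols limits .SD to `cols`.
tmp[, paste0(cols, ".chg") := lapply(.SD, function(x) c(NA, diff(x)) / 100),
    .SDcols = cols]

names(tmp)
```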

Marking it as a bug for now.

arunsrinivasan on 5 Aug 2014

Here's some test data

Thanks for the suggestion. I found that it works without error.

jrowen on 5 Aug 2014

This is now nicely explained in the error message, and it does not crash the session.

library(data.table)
# data.table 1.12.3 IN DEVELOPMENT built 2019-05-16 13:45:56 UTC; jan using 2 threads (see ?getDTthreads).  Latest news: r-datatable.com
tmp = readRDS("tmp.rds")
for (i in c("A", "B", "C"))
  set(tmp, j=paste0(i, ".chg"), value=c(NA, diff(tmp[[i]]) / 100))
#Error in set(tmp, j = paste0(i, ".chg"), value = c(NA, diff(tmp[[i]])/100)) : 
#  This data.table has either been loaded from disk (e.g. using readRDS()/load()) or constructed manually (e.g. using structure()). Please run setDT() or alloc.col() on it first (to pre-allocate space for new columns) before assigning by reference to it.

If it still happens to crash the session, please re-open.
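
Following the advice in that error message, one way to make the original loop work is to call setDT() before assigning by reference. The sketch below is a shortened, illustrative version of the reported example (three rows, two columns) rather than the original data.

```r
library(data.table)

# Manually constructed table, as produced by dput(): no over-allocation.
tmp <- structure(list(A = c(7.699, 7.725, 7.621),
                      B = c(7.835, 7.873, 7.812)),
                 class = c("data.table", "data.frame"),
                 row.names = c(NA, -3L))

# Re-allocate column slots so columns can be added by reference.
setDT(tmp)

for (i in c("A", "B"))
  set(tmp, j = paste0(i, ".chg"), value = c(NA, diff(tmp[[i]]) / 100))

names(tmp)
```

The same setDT() call applies to an object returned by readRDS() or restored with load().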