See code and error below.
tmp = structure(list(A = c(7.699, 7.725, 7.621, 7.647, 7.621,
7.664, 7.629, 7.559, 7.551, 7.341),
B = c(7.835, 7.873, 7.812, 7.862, 7.866,
7.9, 7.85, 7.804, 7.8, 7.626),
C = c(7.831, 7.875, 7.815, 7.854, 7.858,
7.872, 7.833, 7.783, 7.794, 7.675
)), .Names = c("A", "B", "C"),
class = c("data.table", "data.frame"),
row.names = c(NA, -10L))
for (i in c("A", "B", "C"))
set(tmp, j=paste0(i, ".chg"), value=c(NA, diff(tmp[[i]]) / 100))
# Error in set(tmp, j = paste0(i, ".chg"), value = c(NA, diff(tmp[[i]])/100)) :
# Internal error, please report (including result of sessionInfo()) to datatable-help: oldtncol (0) < oldncol (3) but tl of class is marked.
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] doParallel_1.0.8 iterators_1.0.7 foreach_1.4.2 ggplot2_1.0.0 reshape2_1.4 data.table_1.9.2
loaded via a namespace (and not attached):
[1] codetools_0.2-8 colorspace_1.2-4 digest_0.6.4 grid_3.1.1 gtable_0.1.2 MASS_7.3-33 munsell_0.4.2 plyr_1.8.1
[9] proto_0.3-10 Rcpp_0.11.2 scales_0.2.4 stringr_0.6.2 tools_3.1.1
@jrowen, thanks. Just to be clear: did you create the data.table in the same manner you've shown, i.e., using structure()?
Correct. If I run the lines above, I receive the noted error each time. This is a subset of the original dataset, but it still generates the error.
Okay, good. Why do you use structure()? Why not data.table(.)? structure(.) does not _over-allocate_ columns by default, unlike data.table(.). Since there's no _over-allocation_, set() fails.
If you use :=, it detects this automatically and tries to recover by over-allocating again, with a warning. But set() doesn't implement that check because it's designed to have as little overhead as possible in looping scenarios.
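A minimal sketch of the difference, assuming a recent data.table release in which truelength() and setDT() are available. The three-row table is illustrative data, not the reporter's file. It compares the allocated column slots of a table built with data.table() against one built by hand with structure(), then re-allocates the latter before calling set().

```r
library(data.table)

# Built with data.table(): column slots are over-allocated.
dt1 <- data.table(A = c(7.699, 7.725, 7.621))

# Built by hand (as dput() output would be): no over-allocation.
dt2 <- structure(list(A = c(7.699, 7.725, 7.621)),
                 class = c("data.table", "data.frame"),
                 row.names = c(NA, -3L))

length(dt1)      # number of columns in use
truelength(dt1)  # number of allocated column slots, larger than length(dt1)

# Re-allocate before assigning by reference with set().
setDT(dt2)
set(dt2, j = "A.chg", value = c(NA, diff(dt2[["A"]]) / 100))
```

Calling setDT() once up front keeps the loop body free of checks, which matches the stated reason set() skips them.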
Sorry, I wasn't clear in my earlier post. I used dput to output a subset of the original data.table that was generating the error (hence the structure call). I used data.table(read.csv(.)) to create the original object (the input file is poorly formatted, so fread didn't work). The for loop noted above actually crashes my R session when called on the original object. Let me know if you have any additional questions.
I see. That's a bit tricky then. I think we'd need the CSV file (a minimal dataset on which you can reproduce this is sufficient) and the _exact_ commands that produced this error.
Using structure() _will_ result in this error, which is not surprising; whether that behaviour should be fixed in set() is a separate issue. But if, as you say, it happens without using structure(), then the problem lies somewhere else and is hard to track down without the actual set of commands.
Thanks again.
Would an rds file work?
That depends. How do you obtain tmp from your original data? Could you paste the code here?
For example:
tmp = read.csv(...) # (1)
# or
tmp = readRDS(..) # (2)
If you did (1), then we'll need the .csv file, and if you did (2), then we'll need the .rds file. We need the file in the exact format you load to get that error.
If you loaded it from an .RData or .rds file, then it's more or less a known issue.
Here is the exact code that is crashing my R session:
library(data.table)
tmp = readRDS("c:/temp/tmp.rds")
for (i in c("A", "B", "C"))
set(tmp, j=paste0(i, ".chg"), value=c(NA, diff(tmp[[i]]) / 100))
Let me know where to send or how to attach the rds files.
A Dropbox link? In any case, this happens for the same reason explained in my second reply: set() doesn't check or over-allocate columns the way := does. But I'll keep this open. Maybe it's not costly to implement this within set().
Thanks again for the report and follow-ups.
In the meantime, you can do:
tmp[, paste0(names(tmp), ".chg") := lapply(.SD, function(x) c(NA, diff(x))/ 100)]
#         A     B     C    A.chg    B.chg    C.chg
#  1: 7.699 7.835 7.831       NA       NA       NA
#  2: 7.725 7.873 7.875  0.00026  0.00038  0.00044
#  3: 7.621 7.812 7.815 -0.00104 -0.00061 -0.00060
#  4: 7.647 7.862 7.854  0.00026  0.00050  0.00039
#  5: 7.621 7.866 7.858 -0.00026  0.00004  0.00004
#  6: 7.664 7.900 7.872  0.00043  0.00034  0.00014
#  7: 7.629 7.850 7.833 -0.00035 -0.00050 -0.00039
#  8: 7.559 7.804 7.783 -0.00070 -0.00046 -0.00050
#  9: 7.551 7.800 7.794 -0.00008 -0.00004  0.00011
# 10: 7.341 7.626 7.675 -0.00210 -0.00174 -0.00119
Marking it as a bug for now.
Here's some test data
Thanks for the suggestion. I found that it works without error.
This is now explained clearly in the error message, and it no longer crashes the session.
library(data.table)
data.table 1.12.3 IN DEVELOPMENT built 2019-05-16 13:45:56 UTC; jan using 2 threads (see ?getDTthreads). Latest news: r-datatable.com
tmp = readRDS("tmp.rds")
for (i in c("A", "B", "C"))
set(tmp, j=paste0(i, ".chg"), value=c(NA, diff(tmp[[i]]) / 100))
#Error in set(tmp, j = paste0(i, ".chg"), value = c(NA, diff(tmp[[i]])/100)) :
# This data.table has either been loaded from disk (e.g. using readRDS()/load()) or constructed manually (e.g. using structure()). Please run setDT() or alloc.col() on it first (to pre-allocate space for new columns) before assigning by reference to it.
If it still crashes the session, please re-open.
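The fix suggested by the error message can be sketched as follows, assuming a recent data.table release. The temporary file and the small two-column table are stand-ins for the reporter's tmp.rds, which is not reproduced here. The table is written to disk, read back (which drops the over-allocation), and passed through setDT() before the set() loop.

```r
library(data.table)

# Illustrative data; the temporary file stands in for the reporter's tmp.rds.
path <- tempfile(fileext = ".rds")
saveRDS(data.table(A = c(7.699, 7.725, 7.621),
                   B = c(7.835, 7.873, 7.812)), path)

tmp <- readRDS(path)  # over-allocation is lost on reload
setDT(tmp)            # restore it before assigning by reference

for (i in c("A", "B"))
  set(tmp, j = paste0(i, ".chg"), value = c(NA, diff(tmp[[i]]) / 100))

names(tmp)
```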