Given this sample data:
dT<-structure(list(A = c("a1", "a2", "a1", "a1", "a2", "a1", "a1",
"a2", "a1"), B = c("b2", "b2", "b2", "b1", "b2", "b2", "b1",
"b2", "b1"), ID = c("3", "4", "3", "1", "4", "3", "1", "4", "1"
), E = c(0.621142094943352, 0.742109450696123, 0.39439152996948,
0.40694392882818, 0.779607277916503, 0.550579323666347, 0.352622183880119,
0.690660491345867, 0.23378944873769)), class = c("data.table",
"data.frame"), row.names = c(NA, -9L))
This code creates several variables from the variable E, as expected:
library(data.table)
dcast(dT, A + B + ID ~ paste0("E", rowid(ID)))
# A B ID E1 E2 E3
#1 a1 b1 1 0.4069439 0.3526222 0.2337894
#2 a1 b2 3 0.6211421 0.3943915 0.5505793
#3 a2 b2 4 0.7421095 0.7796073 0.6906605
However, when I apply the same code to a larger dataset (available here), which is the actual data I want to reshape, data.table does not give the expected output, as illustrated here:
library(readr)
mydata <- read_csv("mydata.csv")
library(data.table)
myDT<-dcast(mydata, A + B + ID ~ paste0("E", rowid(ID)))
View(myDT)
Thanks in advance.
You said "as illustrated here" but the link doesn't illustrate what's wrong. What's wrong with the output?
Without a reproducible example, this sounds like a user error -- are you sure the larger data is as regular as you think it is?
Is the problem that mydata is not a data.table?
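A quick way to answer that question is to inspect the object's class before calling dcast. The snippet below is only a sketch: `mydata_df` is a hypothetical stand-in built by hand, not the poster's actual file.

```r
library(data.table)

# Hypothetical stand-in for an object loaded with a non-data.table reader
mydata_df <- data.frame(ID = c("1", "1", "3"), E = c(0.4, 0.35, 0.62))

class(mydata_df)          # "data.frame"
is.data.table(mydata_df)  # FALSE, so dcast is not being given a data.table
```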
You have loaded the data using readr, which returns a data.frame rather than a data.table. When dcast is called on a data.frame, it runs reshape2::dcast. The issue has nothing to do with data.table AFAICT. Comparing the result of reshape2::dcast with that of data.table's dcast, after reading the same file as a data.table, I get identical results:
require(readr)
# if reshape2 exists and input to dcast is data.frame, data.table automatically calls reshape2::dcast
require(data.table)
df <- read_csv("mydata.csv")
dt <- fread("mydata.csv")
df_ans <- dcast(df, A + B + ID ~ paste0("E", rowid(ID)))
dt_ans <- dcast(dt, A + B + ID ~ paste0("E", rowid(ID)))
setkey(dt_ans, NULL)
all.equal(dt_ans, as.data.table(df_ans))
# [1] TRUE
Please provide a clear minimal reproducible example where your data is a data.table, and point out what the expected output is and what the difference is with the output you obtain.
Hi, @arunsrinivasan. You are right. I loaded the data using readr, which, as you say, loads it as a data.frame. Importantly, I think the point in your code comment ("if reshape2 exists and input to dcast is data.frame, data.table automatically calls reshape2::dcast") was what was causing the problem. Taking that into account, I have been able to solve the problem. Thanks a lot for this tremendous help.
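The thread does not show the final code, so the following is only a sketch of one way to apply the fix: convert the data.frame to a data.table before reshaping, so that data.table's own dcast method is used. The data here is a hand-made, hypothetical stand-in for the CSV, and `value.var = "E"` is an addition that avoids relying on the guessed value column.

```r
library(data.table)

# Hypothetical stand-in for the CSV contents, held as a plain data.frame
df <- data.frame(
  A  = c("a1", "a2", "a1", "a1", "a2", "a1"),
  B  = c("b2", "b2", "b2", "b1", "b2", "b1"),
  ID = c("3", "4", "3", "1", "4", "1"),
  E  = c(0.62, 0.74, 0.39, 0.41, 0.78, 0.35)
)

# Convert first so that dcast receives a data.table
dt <- as.data.table(df)

# One row per A/B/ID group, one column per occurrence of that ID
wide <- dcast(dt, A + B + ID ~ paste0("E", rowid(ID)), value.var = "E")
print(wide)
```

`setDT(df)` would convert the object in place instead of making a copy, and reading the file with `fread` (as in the comparison above) yields a data.table from the start.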