Hello there,
Its me again. Sorry for posting all these requests but I am super excited about DT and I feel just a few extra features are missing to make DT absolutely great.
Something that I find quite annoyting when using := is that it operates in place by default. This is super annoying because during data work I create new variables all the time (for testing/exploring the data). This does not mean I want these variables to be added to my dataframe!
I know I can do something like
annoyingfunc <- function(DT){
DT <- copy(DT)
DT[, mytempvar := 1][]
}
but this is really not efficient at all. Could it be possible to add some inplace = FALSE like in Pandas so that the output of
DT[, newvar := 'hello', inplace = FALSE][]
prints BUT my DT is not modified? What do you think?
Thanks!!
Actually you could do that quite easily like:
iris = setDT(copy(iris))
iris[TRUE][ , temp_var := 1]
names(iris)
# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
I understand -- I believe you could also do iris[NULL] or iris[1:.N] instead.
Alternatively, you could add the new column in j without :=:
iris[ , c(.SD, temp_var = 1)][1]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species temp_var
# 1: 5.1 3.5 1.4 0.2 setosa 1
ah! crazy trick! This should added to the main DT page !
thanks!
Sure thing! The above are all a bit hack-ish (besides the c(.SD, ...) approach), putting it in the FAQ/intro would mean committing to supporting & maintaining this API going forward.
So let's leave this issue open as the place to decide how we want to go about supporting affirmatively what's mentioned here.
ah ok but do you mean the safest way to do so is with the c(.SD) trick?
Safest IMO is c(.SD, list(var=val)), I recommend you to understand why the extra 'list'
thanks! I actually dont undersdant why I need the extra list here. That was working without it?!
because you're concatenating a list (.SD) with a length one element here, if you has a length 2 element you would add 2 elements to your list, which might not be what you want. If you wrap it in list you're safe
Just that that is very canonical way to select all-existing-columns-plus-one-new-column & the return is a copy of the original (not updating the original)
Just saw Jan's comment with which I agree, I was just trying to be concise but personally I also would do the c(.SD, list(var=val)) approach
oh I see. then if I have multiple columns I can just type c(.SD, list(var=val, michael = 'chirico')), right?
yep, as well as defining nontrivial columns by dropping in a full-length vector
IMO, a shallow copy elegant solves this... but can anybody tell me the reason that shallow() is not ready to be exported?
so to summarize, the best way to work with a copy without the need for the function I defined above (which is actually mentioned in the FAQ) is to use the powerful chirico[TRUE]chirico[do.some.stuff.without.consequences]
Thanks!!
[ @shrektan ] IMO, a shallow copy elegant solves this... but can anybody tell me the reason that shallow() is not ready to be exported?
Fyi, https://github.com/Rdatatable/data.table/issues/2323 links to SO where Matt's answer and comments might explain. It looks like he was thinking the more elegant way would be to change copy to act like shallow (so the latter can be removed and the user doesn't have to think about the distinction between deep and shallow copies).
Not sure but one of the reasons might be lack of reference counting mechanism in base R. Actually it is there, implemented, ready, but not enabled.
According to comments in this thread there is no need for new inplace argument, using c(.SD, ...) or (shallow) copy is a way to go, closing.
Most helpful comment
Actually you could do that quite easily like:
I understand -- I believe you could also do
iris[NULL]oriris[1:.N]instead.Alternatively, you could add the new column in
jwithout:=: