Data.table: feature request: proper inplace argument (like in Pandas)

Created on 5 Jul 2019 · 14Comments · Source: Rdatatable/data.table

Hello there,

Its me again. Sorry for posting all these requests but I am super excited about DT and I feel just a few extra features are missing to make DT absolutely great.

Something that I find quite annoyting when using := is that it operates in place by default. This is super annoying because during data work I create new variables all the time (for testing/exploring the data). This does not mean I want these variables to be added to my dataframe!

I know I can do something like

annoyingfunc <- function(DT){
  DT <- copy(DT)
  DT[, mytempvar := 1][]
}

but this is really not efficient at all. Could it be possible to add some inplace = FALSE like in Pandas so that the output of
DT[, newvar := 'hello', inplace = FALSE][]

prints BUT my DT is not modified? What do you think?
Thanks!!

Source

randomgambit

Most helpful comment

Actually you could do that quite easily like:

iris = setDT(copy(iris))

iris[TRUE][ , temp_var := 1]
names(iris)
# [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

I understand -- I believe you could also do iris[NULL] or iris[1:.N] instead.

Alternatively, you could add the new column in j without :=:

iris[ , c(.SD, temp_var = 1)][1]
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species temp_var
# 1:          5.1         3.5          1.4         0.2  setosa        1

MichaelChirico on 5 Jul 2019

👍3

All 14 comments

Actually you could do that quite easily like:

iris = setDT(copy(iris))

iris[TRUE][ , temp_var := 1]
names(iris)
# [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

I understand -- I believe you could also do iris[NULL] or iris[1:.N] instead.

Alternatively, you could add the new column in j without :=:

iris[ , c(.SD, temp_var = 1)][1]
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species temp_var
# 1:          5.1         3.5          1.4         0.2  setosa        1

MichaelChirico on 5 Jul 2019

👍3

ah! crazy trick! This should added to the main DT page !
thanks!

randomgambit on 5 Jul 2019

Sure thing! The above are all a bit hack-ish (besides the c(.SD, ...) approach), putting it in the FAQ/intro would mean committing to supporting & maintaining this API going forward.

So let's leave this issue open as the place to decide how we want to go about supporting affirmatively what's mentioned here.

MichaelChirico on 5 Jul 2019

ah ok but do you mean the safest way to do so is with the c(.SD) trick?

randomgambit on 5 Jul 2019

Safest IMO is c(.SD, list(var=val)), I recommend you to understand why the extra 'list'

jangorecki on 5 Jul 2019

thanks! I actually dont undersdant why I need the extra list here. That was working without it?!

randomgambit on 5 Jul 2019

because you're concatenating a list (.SD) with a length one element here, if you has a length 2 element you would add 2 elements to your list, which might not be what you want. If you wrap it in list you're safe

moodymudskipper on 5 Jul 2019

👍1

Just that that is very canonical way to select all-existing-columns-plus-one-new-column & the return is a copy of the original (not updating the original)

Just saw Jan's comment with which I agree, I was just trying to be concise but personally I also would do the c(.SD, list(var=val)) approach

MichaelChirico on 5 Jul 2019

oh I see. then if I have multiple columns I can just type c(.SD, list(var=val, michael = 'chirico')), right?

randomgambit on 5 Jul 2019

yep, as well as defining nontrivial columns by dropping in a full-length vector

MichaelChirico on 5 Jul 2019

IMO, a shallow copy elegant solves this... but can anybody tell me the reason that shallow() is not ready to be exported?

shrektan on 6 Jul 2019

so to summarize, the best way to work with a copy without the need for the function I defined above (which is actually mentioned in the FAQ) is to use the powerful chirico[TRUE]chirico[do.some.stuff.without.consequences]

Thanks!!

randomgambit on 8 Jul 2019

[ @shrektan ] IMO, a shallow copy elegant solves this... but can anybody tell me the reason that shallow() is not ready to be exported?

Fyi, https://github.com/Rdatatable/data.table/issues/2323 links to SO where Matt's answer and comments might explain. It looks like he was thinking the more elegant way would be to change copy to act like shallow (so the latter can be removed and the user doesn't have to think about the distinction between deep and shallow copies).

franknarf1 on 9 Jul 2019

👍1

Not sure but one of the reasons might be lack of reference counting mechanism in base R. Actually it is there, implemented, ready, but not enabled.
According to comments in this thread there is no need for new inplace argument, using c(.SD, ...) or (shallow) copy is a way to go, closing.

jangorecki on 30 Jul 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Regression in unique.data.table() as of data.table 1.12.0

jameslamb · 3Comments

FR: assign names to CJ() like data.table() does

franknarf1 · 3Comments

fread Unable to handle mis-quoted field if it is out-of-sample

st-pasha · 3Comments

grouping sets

jangorecki · 3Comments

Joining a keyed table on a non-keyed table is not working sometimes

symbalex · 3Comments