Data.table: Add an alias for functional version of `:=`

Created on 28 Aug 2019  ·  15 Comments  ·  Source: Rdatatable/data.table

Right now when adding multiple columns to a data.table one of the options is to do:

dt[, `:=`(col1 = stuff, col2 = moar_stuff)]

I think that this syntax can be a bit confusing (too many equals signs), but mainly it is hard to type. To get the grave accent, on my keyboard I have to press AltGr + } and then space. It's kind of annoying.
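For reference, here is a minimal runnable version of the functional form on toy data (the table and column names are made up for illustration). It also shows that the function name can be written as a quoted string, which avoids the backtick:

```r
library(data.table)

# Toy data; the column names are illustrative only
dt <- data.table(x = c(1, 2, 3), y = c(4, 5, 6))

# Functional form of `:=`: adds both columns by reference in one call
dt[, `:=`(double_x = x * 2, x_plus_y = x + y)]

# Same form with the function name quoted instead of backticked
dt[, ":="(x_minus_y = x - y)]

names(dt)
```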

So my modest proposal is to just have an alias for :=() that is a run-of-the-mill common name with no special characters. Maybe (ab)using the set() function? It would end up something like this:

dt[, set(col1 = stuff, col2 = moar_stuff)]

I think it not only reads better, but it's also 87% easier to type.

All 15 comments

Relatedly it would also help solve this issue:

https://github.com/Rdatatable/data.table/issues/1543#issuecomment-483134996

(I suggested the name list_set there but the idea is the same)

I think the issue is that there's a reason why := is not defined as a proper function (limiting the API to prevent it from being used outside `[`)... not sure how challenging it is to build out the NSE hacks to work for a new function in addition to :=.

> on my keyboard I have to press AltGr + } and then space. It's kind of annoying.

Sounds very much so! Are you working in RStudio? There must be a way to define some add-in keyboard shortcuts...

> Sounds very much so! Are you working in RStudio? There must be a way to define some add-in keyboard shortcuts...

That was going to be my second proposal. I've just made one and was wondering whether it would make sense to add it to data.table proper.

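# RStudio addin: insert `:=`() and leave the cursor between the parentheses
add_dot_equal <- function() {
  rstudioapi::insertText(text = "`:=`()")
  # After the insert the cursor sits past the closing parenthesis,
  # so move it back by one column
  pos <- rstudioapi::getActiveDocumentContext()$selection[[1]]$range
  pos$start[2] <- pos$start[2] - 1
  pos$end[2] <- pos$end[2] - 1
  rstudioapi::setCursorPosition(pos)
}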

I've just realised that this would also make it possible to add columns by reference without explicitly adding names. That is, if list_fun() is a function that returns a named list then you could do something like

dt[, set(list_fun(columns))] 
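Until something like that exists, a rough workaround is to compute the named list first and reuse its names on the left-hand side of `:=`. This is only a sketch: `list_fun()` below is a hypothetical stand-in for any function that returns a named list, and the column names are made up.

```r
library(data.table)

dt <- data.table(x = c(1, 2, 3), y = c(4, 5, 6))

# Hypothetical helper standing in for list_fun(): returns a named list
list_fun <- function(v) list(v_min = min(v), v_max = max(v))

# Compute the list once, then assign by reference using its own names
res <- list_fun(dt$x)
new_cols <- names(res)
dt[, (new_cols) := res]

names(dt)
```

The length-1 results are recycled to every row, so this particular helper yields constant columns.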

I think it would be cool to stick to the naming convention of starting with set to indicate that it is changing something by reference. set_columns()? (too long).

As for the implementation, I have no idea xD

Note that there is an alternative API: ":="(c("col1","col2"), list(col1, col2)).
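A small sketch of that alternative on toy data (column names are illustrative). The functional call with two unnamed arguments parses to the same expression as the infix `LHS := RHS` form:

```r
library(data.table)

dt <- data.table(x = c(1, 2, 3), y = c(4, 5, 6))

# Infix form: character vector of names on the left, list of values on the right
dt[, c("double_x", "x_plus_y") := list(x * 2, x + y)]

# The same kind of call written functionally
dt[, ":="(c("half_x", "x_times_y"), list(x / 2, x * y))]

names(dt)
```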

I would prefer to avoid name collision, even if not evaluated. Maybe setj?

As for the named list in :=, isn't it better to just handle ":="(list(col1=col1, col2=col2))?

> As for the named list in :=, isn't it better to just handle ":="(list(col1=col1, col2=col2))?

Yes, I guess ultimately it's the same change.

setj() could work, but the "j" there is kind of esoteric and I don't know how many users (especially beginners) appreciate that they are using an argument named "j".

> setj() could work, but the "j" there is kind of esoteric and I don't know how many users (especially beginners) appreciate that they are using an argument named "j".

Once they figure that out, by looking at ?"[" for example, they will appreciate it. These are really basic things that anyone who wants to "know R" should learn at some point.


My suggestion to address this issue is an alias setj which will be substituted for := in [.data.table.

I don't know if many users will ever look at ?"[" (I know I never have 😜️). Still, I generally like more descriptive arguments. "j" is basically meaningless (merely a convention from algebra notation), whereas "column" actually describes what's going on.
How about setcol()?

I think most (if not all) set* functions take a data.table as input (e.g., setkey, setorder, setcolorder). The most natural syntax to me would be:

DT[, .(col1 := new_value, col2 := something)]

This way no new symbol is introduced; we simply allow for mixing the already well-known .() (~the result consists of multiple columns) and := (~do the assignment in place) symbols.

In my opinion, that would be more confusing because then the behaviour of .() would not be consistent. Sometimes it would create a new data.table and other times it would modify by reference. It could also run into problems if someone were to write DT[, .(col1 := new_value, col2 = something)], and the functional version would still be needed for the ":="(list(col1=col1, col2=col2)) case.

I do think it's a good point that set* functions take a data.table as argument, though. :thinking:

setattr works for other types

Yeah, but I think the point is that set functions modify (by reference) the object in the x argument. In this case set would be modifying the data.table in which it is called. It's a small departure from the other functions' behaviour, maybe.
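To make the convention concrete, here is a short sketch on toy data (names are illustrative): each existing set* function receives the object to modify as its first argument and changes it in place.

```r
library(data.table)

dt <- data.table(x = c(1, 2, 3), y = c(4, 5, 6))

# set() takes the table as its first argument (x) and modifies it in place
set(dt, j = "double_x", value = dt$x * 2)

# Other set* functions follow the same pattern
setnames(dt, "y", "why")
setcolorder(dt, c("double_x", "x", "why"))

# setattr() also accepts objects that are not data.tables
v <- c(1, 2, 3)
setattr(v, "note", "modified by reference")

names(dt)
attr(v, "note")
```

A set() alias used inside `[` would have no such first argument, which is the departure being discussed.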

Maybe simply use a different word other than set? How about let?

That's fine by me. It's not bad.

As far as I can tell it's only about changing == ":=" to %in% c(":=", "let") in the code.
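To illustrate the kind of check being described, here is a base R sketch. It is not the actual data.table source: `is_assign_call()` is a hypothetical helper that tests whether the head of a captured `j` expression is `:=` or one of its aliases.

```r
# Hypothetical helper, not part of data.table: does the call's head
# match `:=` or one of the accepted aliases?
is_assign_call <- function(expr, aliases = c(":=", "let")) {
  is.call(expr) &&
    is.name(expr[[1L]]) &&
    as.character(expr[[1L]]) %in% aliases
}

is_assign_call(quote(`:=`(a = 1)))  # functional form
is_assign_call(quote(let(a = 1)))   # proposed alias
is_assign_call(quote(.(a = 1)))     # ordinary j expression
```

Switching from `==` to `%in%` is what lets a single test accept several names.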

I wrote this evil hack for a Reddit user who wanted this feature:

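set_colon_equal_alias <- function(alias){
  requireNamespace("data.table")
  # Grab the current `[` method for data.table
  temp <- data.table:::`[.data.table`
  # Rewrite its body so every `== ":="` test becomes `%in% c(":=", alias)`
  body(temp)[-1] <- parse(text = gsub(
    '== ":="',
    paste("%in%", deparse(c(":=", alias))) ,
    body(temp)[-1],
    fixed = TRUE))
  # Overwrite the method inside the data.table namespace
  assignInNamespace("[.data.table", temp, "data.table")
  invisible()
}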

library(data.table)
dt <- data.table(x = c(1,2,3), y = c(4,5,6))
set_colon_equal_alias("let")

dt[, let(double_x = x * 2, x_plus_y = x + y)][]
#>    x y double_x x_plus_y
#> 1: 1 4        2        5
#> 2: 2 5        4        7
#> 3: 3 6        6        9

Might I suggest "mut()"? Seems appropriate for mutating a mutable object :)
