I believe it is a reasonable change that := no longer recycles length>1 RHS vectors (#3310).
However, the following common usages may now be broken:
For example, suppose we have a data.table of quarterly prices for each symbol in each year. Now create a column holding the fourth-quarter price of each year:
dt[, last_quarter_price := price[quarter == 4L], by = .(symbol, year)]
For stocks that were delisted for some reason, there might be no data for the fourth quarter of the last year, so price[quarter == 4L] may return a zero-length numeric vector. With the newer (stricter) recycling behavior, this ends in an error.
To handle it, I have to change the code to the following:
dt[, last_quarter_price := price[quarter == 4L][1L], by = .(symbol, year)]
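As a minimal base-R sketch of why the `[1L]` suffix helps: indexing a zero-length vector at position 1 returns a single typed NA, which := can then recycle over the group like any other length-one value. The `price` and `quarter` vectors here are made-up illustration data, not taken from a real table.

```r
# One group with no fourth-quarter row (illustrative values only).
price   <- c(12, 13)
quarter <- c(1L, 2L)

# A filter that matches nothing yields a zero-length vector.
q4 <- price[quarter == 4L]
length(q4)        # 0

# Indexing past the end pads with NA, so the result has length one.
q4_or_na <- q4[1L]
length(q4_or_na)  # 1
is.na(q4_or_na)   # TRUE
```

Note that `[1L]` also silently keeps only the first element when more than one row matches, so it is only safe when at most one match is expected.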
A similar use case is as follows:
dt[, first_price := first(price[volume > 0]), by = symbol]
For some reason, price[volume > 0] may be a zero-length numeric vector, and first(<zero-length vector>) also returns a zero-length vector. In this case first_price should be NA, so I have to change the code to the following:
dt[, first_price := price[volume > 0][1L], by = symbol]
In both cases, the length of the resulting vector must be zero or one. Length-one results are recycled as before, but the zero-length cases are now broken. I'm not sure whether it makes sense for := <zero-length vector> to automatically assign a missing value; otherwise, I need to rework all such cases so that they get NA like before.
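If reworking is needed, one option is to make the "zero or one" assumption explicit. The sketch below uses a hypothetical helper, `one_or_na()`, which is not part of data.table; the small table is made-up illustration data. It returns the single value, a typed NA for an empty input, and stops if more than one element is supplied.

```r
library(data.table)

# Hypothetical helper (not a data.table function): expects length 0 or 1.
# x[1L] returns x itself for length one and a typed NA for length zero.
one_or_na <- function(x) {
  stopifnot(length(x) <= 1L)
  x[1L]
}

# Illustrative data: A2 has no fourth-quarter row in 2018.
dt <- data.table(
  symbol  = c("A1", "A1", "A2"),
  year    = 2018L,
  quarter = c(3L, 4L, 3L),
  price   = c(12, 11, 9)
)

dt[, q4_price := one_or_na(price[quarter == 4L]), by = .(symbol, year)]
dt$q4_price  # 11 11 NA
```

Compared with a bare `[1L]`, this fails loudly when a group unexpectedly contains several matching rows instead of quietly taking the first one.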
There is no reproducible example, so I cannot fully test this, but it might have been resolved by https://github.com/Rdatatable/data.table/pull/3393. @renkun-ken, please confirm and close.
@jangorecki It's quite easy to make some reproducible examples.
library(data.table)
quarterly_prices_csv <- "
symbol,year,quarter,price
A1,2017,1,10.0
A1,2017,2,11.0
A1,2017,3,12.0
A1,2017,4,11.0
A1,2018,1,12.0
A1,2018,2,13.0
A2,2017,1,10.0
A2,2017,2,11.0
A2,2017,3,12.0
A2,2017,4,11.0
A2,2018,1,12.0
"
quarterly_prices_dt <- fread(quarterly_prices_csv)
quarterly_prices_dt[, fourth_quarter_price := price[quarter == 4L], by = .(symbol, year)]
#> Error in `[.data.table`(quarterly_prices_dt, , `:=`(fourth_quarter_price, : Supplied 0 items to be assigned to group 2 of size 2 in column 'fourth_quarter_price'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
# workaround: [1L] turns a zero-length result into a single NA
quarterly_prices_dt[, fourth_quarter_price := price[quarter == 4L][1L], by = .(symbol, year)]
quarterly_prices_dt
#> symbol year quarter price fourth_quarter_price
#> 1: A1 2017 1 10 11
#> 2: A1 2017 2 11 11
#> 3: A1 2017 3 12 11
#> 4: A1 2017 4 11 11
#> 5: A1 2018 1 12 NA
#> 6: A1 2018 2 13 NA
#> 7: A2 2017 1 10 11
#> 8: A2 2017 2 11 11
#> 9: A2 2017 3 12 11
#> 10: A2 2017 4 11 11
#> 11: A2 2018 1 12 NA
price_volume_csv <- "
symbol,date,price,volume
A1,20180102,10.0,0
A1,20180103,10.0,0
A1,20180104,11.0,100
A2,20180102,5.0,0
A2,20180103,5.0,0
A2,20180104,5.0,0
"
price_volume_dt <- fread(price_volume_csv)
price_volume_dt[, first_price := first(price[volume > 0]), by = symbol]
#> Error in `[.data.table`(price_volume_dt, , `:=`(first_price, first(price[volume > : Supplied 0 items to be assigned to group 2 of size 3 in column 'first_price'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
# workaround: [1L] turns a zero-length result into a single NA
price_volume_dt[, first_price := price[volume > 0][1L], by = symbol]
price_volume_dt
#> symbol date price volume first_price
#> 1: A1 20180102 10 0 11
#> 2: A1 20180103 10 0 11
#> 3: A1 20180104 11 100 11
#> 4: A2 20180102 5 0 NA
#> 5: A2 20180103 5 0 NA
#> 6: A2 20180104 5 0 NA
Thanks @renkun-ken! Now fixed in dev and your tests added verbatim.
Thanks, @mattdowle!