nafill and setnafill only support numeric types.
library(data.table)
x = letters[1:10]
x[c(1:2, 5:6, 9:10)] = NA
nafill(x, "locf")
# Error in nafill(x, "locf") :
# 'x' argument must be numeric type, or list/data.table of numeric types
setnafill(x, "locf")
# Error in setnafill(x, "locf") :
# 'x' argument is atomic vector, in-place update is supported only for list/data.table
Both zoo::na.locf and tidyr::fill support all data types.
zoo::na.locf(x, na.rm = FALSE)
# [1] NA NA "c" "d" "d" "d" "g" "h" "h" "h"
tidyr::fill(tibble::as_tibble(x), value)
# A tibble: 10 x 1
# value
# <chr>
# 1 NA
# 2 NA
# 3 c
# 4 d
# 5 d
# 6 d
# 7 g
# 8 h
# 9 h
# 10 h
Feature request: support character, factor and other types. This is useful in many cases when merging and cleaning data.tables/data.frames. Unlike matrices, data.frames/data.tables usually contain columns of arbitrary types, and it's very common to need NA values filled in columns of arbitrary types, especially character. Right now I have to import functions from other packages just for that functionality, and those functions are much slower than the ones provided by data.table.
Thank you for creating this superior data.table package.
Maybe as a temp workaround:
DT <- data.table(X=c("A",NA,NA,"B",NA,"C"))
DT[, X := X[nafill(replace(.I, is.na(X), NA), "locf")]]
and using @mik3y64's example:
x = letters[1:10]
x[c(1:2, 5:6, 9:10)] = NA
x[nafill(replace(seq_along(x), is.na(x), NA_integer_), "locf")]
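Both snippets above rely on the same idea: instead of filling the values, fill their integer positions (which nafill does support) and then subset the original vector with the filled positions. A minimal sketch of that idea wrapped in a function follows; the name locf_any is hypothetical and not part of data.table.

```r
library(data.table)

# Hypothetical helper (not part of data.table): LOCF for any atomic vector,
# done by filling integer positions and then subsetting with them.
locf_any <- function(x) {
  idx <- seq_along(x)
  idx[is.na(x)] <- NA_integer_     # blank out the positions of missing values
  x[nafill(idx, type = "locf")]    # each blank takes the last non-NA position
}

x <- letters[1:10]
x[c(1:2, 5:6, 9:10)] <- NA
locf_any(x)
# [1] NA  NA  "c" "d" "d" "d" "g" "h" "h" "h"
```

Leading NAs stay NA because there is no earlier position to carry forward, which matches the zoo::na.locf(x, na.rm = FALSE) output shown earlier.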
Thanks @mik3y64 for the request. It is on the to-do list; it just was not part of the initial functionality. We first have to finish https://github.com/Rdatatable/data.table/pull/3765, and then the logic here can reuse the changes from that PR.
Thanks @chinsoon12 for the interesting workaround.
Great!! I am enjoying the performance of data.table. I totally understand the hard work that goes into writing fast code, unlike many other packages out there that are mostly just some sort of "wrappers". Thank you very much, data.table team, for the awesome work. Thanks @chinsoon12 for the very interesting workaround. I would've never thought of doing it that way.
I agree that character support would be useful. Right now here is a workaround:
library(data.table)
DT=data.table(x.chr=c("foo", NA, "bar"))
DT[, x.fac := factor(x.chr)]
DT[, x.int := as.integer(x.fac)]
setnafill(DT, type="locf", cols="x.int")
levs <- levels(DT$x.fac)
DT[, x.fac2 := factor(x.int, seq_along(levs), levs)]
DT[, x.chr2 := as.character(x.fac2)]  # as.character() keeps NA as NA; paste() would turn it into the string "NA"
Extending @tdhock's approach to many character columns:
char_cols = c(...)
DT[ , (char_cols) := lapply(.SD, factor), .SDcols = char_cols]
lev = sapply(char_cols, function(x) levels(DT[[x]]), simplify = FALSE)  # always a named list, even if all columns have the same number of levels
DT[ , (char_cols) := lapply(.SD, as.integer), .SDcols = char_cols]
DT[ , (char_cols) := lapply(.SD, nafill, 'locf'), .SDcols = char_cols]  # optionally add by = ... to fill within groups
for (col in char_cols) set(DT, NULL, col, lev[[col]][DT[[col]]])
@jangorecki we don't need #4491 to do nafill for factor, right? We just need to copy the attributes?
That would take out a lot of the work of the above...
yes, #4491 should handle factor levels.
I think handling character in nafill could be done by simply coercing to factor in R and then coercing back to character after processing. This way we don't have to add extra escape branches for parallel processing, which cannot be applied to character columns.
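To illustrate the coerce-and-restore idea at user level, here is a minimal sketch. It is not how data.table implements or plans to implement this; the function name nafill_chr is hypothetical.

```r
library(data.table)

# Hypothetical user-level wrapper (not part of data.table):
# character -> factor -> integer codes -> nafill -> back to character.
nafill_chr <- function(x, type = "locf") {
  stopifnot(is.character(x))
  f <- factor(x)                                # NA is not a level by default
  codes <- nafill(as.integer(f), type = type)   # fill the integer codes
  levels(f)[codes]                              # map codes back to strings; NA stays NA
}

nafill_chr(c("foo", NA, "bar", NA))
# [1] "foo" "foo" "bar" "bar"
nafill_chr(c("foo", NA, "bar", NA), type = "nocb")
# [1] "foo" "bar" "bar" NA
```

Only the integer codes are passed to nafill, so the character data itself never has to go through the numeric fill routine.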
just ran into this on logical type using 1.13.1:
nafill(c(T,NA), type="locf")
would be great if this could be handled as well
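Until logical is supported, one possible workaround is a round trip through integer, which nafill does accept. This is a sketch under that assumption, not an official recommendation:

```r
library(data.table)

x <- c(TRUE, NA, FALSE, NA)
# Convert to integer (TRUE -> 1L, FALSE -> 0L, NA stays NA), fill, convert back.
filled <- as.logical(nafill(as.integer(x), type = "locf"))
filled
# [1]  TRUE  TRUE FALSE FALSE
```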