Data.table: setDT should check if any column is POSIXlt

Created on 4 Nov 2020 · 8 Comments · Source: Rdatatable/data.table

I called setDT() on a data.frame (or tibble) that has a POSIXlt column. This raised no error or warning, and str() on the resulting object showed nothing unusual. Errors only started to appear when I tried to manipulate the object, or simply called head() on it.

Having found the cause, I would like to suggest that setDT() either raise an error or at least emit a warning saying that POSIXlt columns are present and are not supported. Alternatively, these columns could be removed or coerced to POSIXct (or another format better suited to data.table), with the warning saying so.
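Until setDT() handles this itself, one possible workaround is to coerce POSIXlt columns to POSIXct before calling it. The following is only a minimal sketch: `coerce_posixlt()` is a hypothetical helper name, not part of data.table, and it assumes the input is a plain data.frame.

```r
library(data.table)

# Hypothetical helper (not part of data.table): replace every POSIXlt
# column of a data.frame with its POSIXct equivalent.
coerce_posixlt <- function(df) {
  is_lt <- vapply(df, inherits, logical(1), what = "POSIXlt")
  for (nm in names(df)[is_lt]) {
    df[[nm]] <- as.POSIXct(df[[nm]])
  }
  df
}

# `$<-` keeps the column as POSIXlt, unlike data.frame(), which converts it.
x <- data.frame(id = 1:3)
x$now <- as.POSIXlt(Sys.time() + 0:2)

x <- coerce_posixlt(x)  # the column is POSIXct from here on
setDT(x)
head(x)
```

The conversion is done on the data.frame first, so setDT() only ever sees POSIXct columns.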

Update: Minimal example with session info:

> library(tibble)
> library(data.table)
> now <- as.POSIXlt(Sys.time())
> x <- as.data.frame(tibble(now))
> mdt <- data.table(id=1:3, d=strptime(c("06:02:36", "06:02:48", "07:03:12"), "%H:%M:%S"))
Warning message:
In as.data.table.list(x, keep.rownames = keep.rownames, check.names = check.names,  :
  POSIXlt column type detected and converted to POSIXct. We do not recommend use of POSIXlt at all because it uses 40 bytes to store one date.
> # previous is fine; but next...
> setDT(x)
> head(x)
Error in dimnames(x) <- dn : 
  length of 'dimnames' [1] not equal to array extent
> str(x)
Classes ‘data.table’ and 'data.frame':  11 obs. of  1 variable:
 $ now: POSIXlt, format: "2020-11-04 10:24:47"
 - attr(*, ".internal.selfref")=<externalptr> 
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS

Matrix products: default
BLAS:   /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8    
 [5] LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C             
 [9] LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.13.0 tibble_3.0.3     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5          rstudioapi_0.11     magrittr_1.5.0.9000 tidyselect_1.1.0    munsell_0.5.0      
 [6] colorspace_1.4-1    R6_2.4.1            rlang_0.4.7         fansi_0.4.1         plyr_1.8.6         
[11] dplyr_1.0.2         tools_4.0.3         grid_4.0.3          gtable_0.3.0        cli_2.0.2          
[16] ellipsis_0.3.1      assertthat_0.2.1    lifecycle_0.2.0     crayon_1.3.4        purrr_0.3.4        
[21] ggplot2_3.3.2       vctrs_0.3.4         glue_1.4.2          compiler_4.0.3      pillar_1.4.6       
[26] generics_0.0.2      scales_1.1.1        pkgconfig_2.0.3    

In any case, thank you for this great package.

All 8 comments

@iago-pssjd Thank you for your report. It would be even better to include a minimal example (so we can just copy and paste it to observe the issue you are describing) and your session info (to check whether you might be using an older version of DT in which the issue has not yet been fixed).

Thank you @jangorecki. I updated the issue and removed the reference to data.table(), which I had taken from the SO question right after looking at it, without checking it myself. Now I see that data.table() already does what I asked for, but setDT() does not yet.

Thank you.
We can define an action point for this issue: add a check for POSIXlt inside setDT().

To me this looks like a problem in print.data.table() rather than in setDT(). print.data.table() should work with a data.table object that contains POSIXlt vectors.

In my opinion, setDT() should generally not change the values of an object's columns unless there are very strong reasons to do so.

@shrektan it is not only a problem of print.data.table(): other operations on the object also give an error. (I would have to recall exactly which operations, but I believe I was trying to create a variable that had nothing to do with the time variable; I am also not sure whether I was using by = ...)

I can't understand this, because I think POSIXlt is nothing special and should not affect other usage. So an example would be welcome.

If we change POSIXlt to POSIXct, it won't be by reference anymore, so I think the best option is to raise an error. Other operations will also warn on POSIXlt input.

library(data.table)
x <- data.table(g=Sys.time(), a=1)
x[["g"]] <- as.POSIXlt(Sys.time())
str(x)
#> Classes 'data.table' and 'data.frame':   11 obs. of  2 variables:
#>  $ g: POSIXlt, format: "2020-11-12 02:01:56"
#>  $ a: num 1
#>  - attr(*, ".internal.selfref")=<externalptr>
x[, sum(a), g]
#> Error in `[.data.table`(x, , sum(a), g): column or expression 1 of 'by' or 'keyby' is type list. Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))]
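The error-raising approach suggested above could look roughly like the sketch below. It is written as a separate pre-check rather than a change to setDT() itself; `assert_no_posixlt()` is a hypothetical name, and this is not how data.table implements (or will implement) the check.

```r
library(data.table)

# Hypothetical pre-check (not data.table's own code): stop with an
# informative message if any column is POSIXlt, otherwise return x invisibly.
assert_no_posixlt <- function(x) {
  is_lt <- vapply(x, inherits, logical(1), what = "POSIXlt")
  if (any(is_lt)) {
    stop("POSIXlt column(s) found: ",
         paste(names(x)[is_lt], collapse = ", "),
         ". Convert them with as.POSIXct() before calling setDT().",
         call. = FALSE)
  }
  invisible(x)
}

# A data.frame with a POSIXlt column is rejected before setDT() is reached.
bad <- data.frame(id = 1:3)
bad$now <- as.POSIXlt(Sys.time() + 0:2)
res <- tryCatch(assert_no_posixlt(bad), error = function(e) conditionMessage(e))
print(res)

# A data.frame with only POSIXct passes the check and is converted in place.
ok <- data.frame(id = 1:3, now = Sys.time() + 0:2)
assert_no_posixlt(ok)
setDT(ok)
```

Keeping the check outside setDT() means nothing is copied or coerced, so the by-reference behaviour of setDT() is untouched.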

@jangorecki The example is not suitable, as the warning is thrown from data.table(). However, I did some tests (and edited your example) and realized that your conclusion is correct: POSIXlt is indeed special. It is a named list, so it is not expected to work well with other data.table operations.

@iago-pssjd you're right, POSIXlt is indeed different and we should take action in setDT().
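The "named list" point can be checked in base R alone. The short sketch below compares the storage of a POSIXlt value with its POSIXct equivalent; the timestamp and the `tz = "UTC"` setting are arbitrary choices for illustration.

```r
lt <- as.POSIXlt("2020-11-04 10:24:47", tz = "UTC")
ct <- as.POSIXct(lt)

typeof(lt)           # "list": one component per date-time field
typeof(ct)           # "double": seconds since the epoch
names(unclass(lt))   # sec, min, hour, mday, mon, year, ...
length(unclass(lt))  # number of components (typically 9 to 11, depending on R version and platform)
length(lt)           # 1: the POSIXlt method reports the number of date-times
```

This is likely why str() reports "11 obs." in the outputs above even though only one timestamp was stored: the underlying list has one element per component, not one per row.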
