Dplyr: bug with nanotime format

Created on 22 May 2019 · 6 Comments · Source: tidyverse/dplyr

Hello there!
Consider this wonderful reprex:

library(tibble)
#> Warning: package 'tibble' was built under R version 3.4.4
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.4.4
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(nanotime)
#> Warning: package 'nanotime' was built under R version 3.4.4

df <- tibble(
  mytimestamp = c(nanotime('2011-12-05 08:30:00.000', format = "%Y-%m-%d %H:%M:%E9S", tz = "GMT"),
                  nanotime('2011-12-05 08:30:00.100', format = "%Y-%m-%d %H:%M:%E9S", tz = "GMT"),
                  nanotime('2011-12-05 08:30:00.825', format = "%Y-%m-%d %H:%M:%E9S", tz = "GMT")),
  var = c(1, 1, 2)
)

df
#> # A tibble: 3 x 2
#>   mytimestamp                           var
#>   <nanotime>                          <dbl>
#> 1 2011-12-05T08:30:00.000000000+00:00     1
#> 2 2011-12-05T08:30:00.100000000+00:00     1
#> 3 2011-12-05T08:30:00.825000000+00:00     2

df %>% group_by(var) %>% summarize_all(., last)
#> # A tibble: 2 x 2
#>     var mytimestamp        
#>   <dbl> <integr64>         
#> 1     1 1323073800100000000
#> 2     2 1323073800825000000

Created on 2019-05-21 by the reprex package (v0.2.1)

As you can see, a simple grouped aggregation on a nanotime column strips the nanotime class from the column: the result is a plain integer64 column that has lost its date formatting.

Is this related to how tibble manages S4 classes? Is this a bug?

Thanks!!

All 6 comments

I realize this might be related to dplyr instead. Posting there. Apologies for cross-posting if this is not relevant for tibble as well.

Thanks!

@romainfrancois this is just incredible. As I was about to post on dplyr, you transferred the post here. Just an amazing coincidence.

@romainfrancois interestingly, different types of aggregation lead to different outputs. Please let me know if I can help or test in any way, as the nanotime format is vital for many data processing tasks where the timestamp has to be extremely accurate. I do not want to move my workflow to data.table.

Thanks!!!

> df %>% mutate_all(., max)
# A tibble: 3 x 2
  mytimestamp                           var
  <S4: nanotime>                      <dbl>
1 2011-12-05T08:30:00.825000000+00:00     2
2 2011-12-05T08:30:00.825000000+00:00     2
3 2011-12-05T08:30:00.825000000+00:00     2

> df %>% mutate_all(., mean)
# A tibble: 3 x 2
  mytimestamp                      var
  <S3: integer64>                <dbl>
1 1323073800308333333   1.333333333333
2 1323073800308333333   1.333333333333
3 1323073800308333333   1.333333333333

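The problem is that there is no [[ method for nanotime objects, so extracting a single element with [[ drops the nanotime class and returns a bare integer64, while [ keeps it: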

library(nanotime)

times <- c(nanotime('2011-12-05 08:30:00.000',format ="%Y-%m-%d %H:%M:%E9S",  tz ="GMT"),
  nanotime('2011-12-05 08:30:00.100',format ="%Y-%m-%d %H:%M:%E9S",  tz ="GMT"),
  nanotime('2011-12-05 08:30:00.825',format ="%Y-%m-%d %H:%M:%E9S",  tz ="GMT"))

times[1]
#> [1] "2011-12-05T08:30:00.000000000+00:00"
times[[1]]
#> integer64
#> [1] 1323073800000000000

I filed an issue in the nanotime repo: https://github.com/eddelbuettel/nanotime/issues/44
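
Until [[ is handled on the nanotime side, a possible stopgap is sketched below. It is only a sketch under two assumptions that are not stated in this thread: that single-bracket indexing is an acceptable substitute for taking one element, and that nanotime() accepts an integer64 count of nanoseconds since the epoch and re-wraps it. The names fmt, last_keep, raw and restored are illustrative only.

```r
library(nanotime)

# Same timestamps as in the comment above; `fmt` is just a local helper name.
fmt <- "%Y-%m-%d %H:%M:%E9S"
times <- c(nanotime("2011-12-05 08:30:00.000", format = fmt, tz = "GMT"),
           nanotime("2011-12-05 08:30:00.100", format = fmt, tz = "GMT"),
           nanotime("2011-12-05 08:30:00.825", format = fmt, tz = "GMT"))

# `[` keeps the nanotime class (as shown by times[1] above), so it can
# stand in for `[[` when only the last element is needed.
last_keep <- times[length(times)]
class(last_keep)

# If a value has already been reduced to integer64 (for example by `[[`),
# re-wrapping it with nanotime() should restore the class (assumption:
# nanotime() treats integer64 input as nanoseconds since the epoch).
raw <- times[[1]]
restored <- nanotime(raw)
class(restored)
```

The same re-wrapping idea could in principle be applied to the integer64 column returned by summarize_all(), but that has not been verified here against the dplyr version used in the reprex.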

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/
