Hello,
I have identified a possible bug when using the unnest function after mutate.
I thought it was the bug reported in #483, but then I realized that they are different. The bug only appears when I run the mutate function before unnest.
With bug: mutate function before unnest:
library(magrittr)
df_a <- tibble::tribble(~x,  ~z,
                        "a", 10,
                        "b", 20)
df_b <- df_a %>%
  dplyr::filter(!z %in% c(10, 20)) %>%
  dplyr::mutate(z = stringr::str_extract_all(x, pattern = "\\d{2}")) %>%
  tidyr::unnest(z) %>%
  dplyr::select(z, x)
#> Error in .f(.x[[i]], ...): objeto 'z' não encontrado (object 'z' not found)
Created on 2019-06-22 by the reprex package (v0.3.0)
No bug: mutate call commented out:
library(magrittr)
df_a <- tibble::tribble(~x,  ~z,
                        "a", 10,
                        "b", 20)
df_b <- df_a %>%
  dplyr::filter(!z %in% c(10, 20)) %>%
  # dplyr::mutate(z = stringr::str_extract_all(x, pattern = "\\d{2}")) %>%
  tidyr::unnest(z) %>%
  dplyr::select(z, x)
Created on 2019-06-22 by the reprex package (v0.3.0)
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
Matrix products: default
locale:
[1] LC_COLLATE=Portuguese_Brazil.1252
[2] LC_CTYPE=Portuguese_Brazil.1252
[3] LC_MONETARY=Portuguese_Brazil.1252
[4] LC_NUMERIC=C
[5] LC_TIME=Portuguese_Brazil.1252
attached base packages:
[1] stats graphics grDevices utils
[5] datasets methods base
other attached packages:
[1] magrittr_1.5
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 tidyr_0.8.3
[3] packrat_0.5.0 crayon_1.3.4
[5] dplyr_0.8.1 assertthat_0.2.1
[7] R6_2.4.0 pillar_1.4.1
[9] stringi_1.4.3 rlang_0.3.4
[11] rstudioapi_0.10 tools_3.6.0
[13] stringr_1.4.0 glue_1.3.1
[15] purrr_0.3.2 compiler_3.6.0
[17] pkgconfig_2.0.2 tidyselect_0.2.5
[19] tibble_2.1.3
I don't think this has anything to do with mutate directly. Using str_extract_all changes the type of the column from a double vector to a list, so you go from an empty z double vector to an empty list. That is why you see the issue only after using mutate.
library(magrittr)
df_a <- tibble::tribble(~x,  ~z,
                        "a", 10,
                        "b", 20)
df_a %>%
  dplyr::filter(!z %in% c(10, 20)) %>%
  dplyr::glimpse()
#> Observations: 0
#> Variables: 2
#> $ x <chr>
#> $ z <dbl>
df_a %>%
  dplyr::filter(!z %in% c(10, 20)) %>%
  dplyr::mutate(z = stringr::str_extract_all(x, pattern = "\\d{2}")) %>%
  dplyr::glimpse()
#> Observations: 0
#> Variables: 2
#> $ x <chr>
#> $ z <list> []
Created on 2019-06-23 by the reprex package (v0.3.0.9000)
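The type change can also be checked in isolation, without a data frame. The following is a minimal sketch, assuming the stringr package is installed; the variable names are only illustrative, and the zero-length character vector stands in for column x after filter() has dropped every row.

```r
# Assumes the stringr package is installed.
# A zero-length character vector, like column x once every row is filtered out.
x <- character(0)

# str_extract_all() returns one list element per input string,
# so a zero-length input gives a zero-length list rather than a double vector.
z <- stringr::str_extract_all(x, pattern = "\\d{2}")

is.list(z)
length(z)
```

So the column handed to unnest() is an empty list, which is the case that triggers the problem.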
I think this is related to #483: with the latest version the issue still seems to be present, as column a is not kept when the table is empty.
library(tidyr)
df <- tibble::tibble(a = list(1), y = 1)
df %>% .[0, ] %>% unnest(a) %>% str()
#> Classes 'tbl_df', 'tbl' and 'data.frame': 0 obs. of 1 variable:
#> $ y: num
df %>% unnest(a) %>% str()
#> Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of 2 variables:
#> $ a: num 1
#> $ y: num 1
Latest version of tidyr:
sessioninfo::package_info('tidyr')
#>  package    * version     date       lib source
#>  assertthat   0.2.1       2019-03-21 [1] CRAN (R 3.5.3)
#>  backports    1.1.4       2019-04-10 [1] CRAN (R 3.5.3)
#>  BH           1.69.0-1    2019-01-07 [1] CRAN (R 3.5.2)
#>  cli          1.1.0       2019-03-19 [1] CRAN (R 3.5.3)
#>  crayon       1.3.4       2017-09-16 [1] CRAN (R 3.5.3)
#>  digest       0.6.19      2019-05-20 [1] CRAN (R 3.5.3)
#>  dplyr        0.8.1.9000  2019-06-23 [1] Github (tidyverse/dplyr@3471814)
#>  ellipsis     0.1.0.9000  2019-06-12 [1] Github (r-lib/ellipsis@d8bf8a3)
#>  fansi        0.4.0       2018-10-05 [1] CRAN (R 3.5.3)
#>  glue         1.3.1.9000  2019-05-24 [1] Github (tidyverse/glue@ea0edcb)
#>  magrittr     1.5.0.9000  2019-01-06 [1] Github (tidyverse/magrittr@4104d6b)
#>  pillar       1.4.1.9000  2019-06-12 [1] Github (r-lib/pillar@c017f20)
#>  pkgconfig    2.0.2       2018-08-16 [1] CRAN (R 3.5.3)
#>  plogr        0.2.0       2018-03-25 [1] CRAN (R 3.5.3)
#>  purrr        0.3.2.9000  2019-06-12 [1] Github (tidyverse/purrr@e4d5539)
#>  R6           2.4.0       2019-02-14 [1] CRAN (R 3.5.3)
#>  Rcpp         1.0.1.3     2019-05-25 [1] Github (RcppCore/Rcpp@6062d56)
#>  rlang        0.3.99.9003 2019-06-23 [1] Github (r-lib/rlang@96a69a2)
#>  stringi      1.4.3       2019-03-12 [1] CRAN (R 3.5.3)
#>  tibble       2.1.3.9000  2019-06-23 [1] Github (tidyverse/tibble@abc5390)
#>  tidyr      * 0.8.3.9000  2019-06-13 [1] Github (tidyverse/tidyr@7a2b843)
#>  tidyselect   0.2.5       2018-10-11 [1] CRAN (R 3.5.3)
#>  utf8         1.1.4       2018-05-24 [1] CRAN (R 3.5.3)
#>  vctrs        0.1.0.9004  2019-06-23 [1] Github (r-lib/vctrs@0abd575)
#>  zeallot      0.1.0       2018-01-28 [1] CRAN (R 3.5.3)
#>
#> [1] C:/Users/chris/Documents/R-dev
#> [2] C:/Users/chris/Documents/R/win-library/3.5
#> [3] C:/Program Files/R/R-3.5.3/library
Created on 2019-06-23 by the reprex package (v0.3.0.9000)
Even more minimal reprex:
library(tidyr)
df <- tibble(x = list(), y = integer())
df %>% unnest(y) %>% names()
#> [1] "x"
Created on 2019-07-24 by the reprex package (v0.3.0)
The problem arises because unnest() does three things:
unchop() each column
unpack() each column
So
df1 <- tibble(x = list(1), y = 1)
df1 %>% unnest(x)
#> # A tibble: 1 x 2
#> x y
#> <dbl> <dbl>
#> 1 1 1
is equivalent to
df2 <- tibble(x = data.frame(x = 1), y = 1)
df2 %>% unpack(x)
#> # A tibble: 1 x 2
#> x y
#> <dbl> <dbl>
#> 1 1 1
The problem is that when x is empty it doesn't get "data frame-d", so when it is unpack()ed it disappears. I think that means the fix is for unnest() to handle empty columns specially. I think it'll have to create a column of vctrs::unspecified() type, since that can be coerced to anything else.
Oops, that's not quite right - it needs to be a length-0 list_of(.ptype = tibble(x = unspecified()))
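The reason unspecified() is attractive as a placeholder can be seen with a small check. This is only a sketch of the coercion property, assuming the vctrs package is installed; it is not the fix that went into tidyr, and the variable names are illustrative.

```r
# Assumes the vctrs package is installed.
library(vctrs)

# unspecified() carries no type information of its own, so the common type
# with any other vector is simply the type of that other vector.
ptype_chr <- vec_ptype2(unspecified(), character())
ptype_int <- vec_ptype2(unspecified(), integer())

typeof(ptype_chr)
typeof(ptype_int)
```

A placeholder column of this type would therefore not constrain whatever type the column takes once real data is combined with it.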
Apologies for jumping in on a closed issue, but I'm still having trouble with this issue at tidyr==1.0.0.
A recursive dose of unnest_wider is the core workhorse for a package I'm working on, but it trips over list columns with gaps in the first record, and it appears that the useful-looking keep_empty is not supported any more.
Is there any other fix than supplying .ptype for each column?
Update: the current master of tidyr seems to have fixed my problem (but not the example here). I need to investigate more.
```{r}
library(tidyr)
df <- tibble(x = list(), y = integer())
df %>% unnest(y) %>% names()
```
<details>
### SessionInfo
```{r}
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 19.10
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
[5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8 LC_PAPER=en_AU.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.3 purrr_0.3.3.9000 readr_1.3.1 tidyr_1.0.0 tibble_2.1.3
[8] ggplot2_3.2.1 tidyverse_1.3.0 ruODK_0.6.6.9005
loaded via a namespace (and not attached):
[1] Rcpp_1.0.3 cellranger_1.1.0 pillar_1.4.2 compiler_3.6.1 dbplyr_1.4.2 tools_3.6.1
[7] packrat_0.5.0 lubridate_1.7.4 jsonlite_1.6 lifecycle_0.1.0.9000 nlme_3.1-141 gtable_0.3.0
[13] lattice_0.20-38 pkgconfig_2.0.3 rlang_0.4.2.9000 reprex_0.3.0 cli_1.1.0 DBI_1.0.0
[19] rstudioapi_0.10 haven_2.2.0 withr_2.1.2 xml2_1.2.2 httr_1.4.1 hms_0.5.2
[25] generics_0.0.2 fs_1.3.1 vctrs_0.2.0.9007 grid_3.6.1 tidyselect_0.2.5 glue_1.3.1
[31] R6_2.4.1 fansi_0.4.0 readxl_1.3.1 modelr_0.1.5 magrittr_1.5 backports_1.1.5
[37] scales_1.1.0 usethis_1.5.1.9000 rvest_0.3.5 assertthat_0.2.1 colorspace_1.4-1 utf8_1.1.4
[43] stringi_1.4.3 lazyeval_0.2.2 munsell_0.5.0 broom_0.5.2 crayon_1.3.4
Distributor ID: Ubuntu
Description: Ubuntu 19.10
Release: 19.10
Codename: eoan
```
</details>
@florianm please open a new issue with a reprex created by the reprex package. No need to include session info unless it is explicitly asked for.