Tidyr: unnest multiple columns

Created on 15 Sep 2019 · 5 Comments · Source: tidyverse/tidyr

See #737 for full example

Given the overall flexibility of the new functions, I was surprised that unnest_wider() doesn't allow unnesting several columns at once.

library(tidyverse)
df2 <- tibble::tribble(
  ~id, ~var1,              ~Country,                        ~Sport,             ~Format,
  10L,  169L, c("Norway", "Sweden"),                        "Skii", c("Video", "Photo"),
  11L,  150L,               "Spain", c("Bike", "Soccer", "Basket"),             "Photo",
  12L,    0L,                 "USA",                    "Baseball",             "Video"
)

df2 %>%
  unnest_wider(Country, names_sep = "_") %>%
  unnest_wider(Sport, names_sep = "_") %>%
  unnest_wider(Format, names_sep = "_")
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> New names:
#> * `` -> ...1
#> New names:
#> * `` -> ...1
#> New names:
#> * `` -> ...1
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> * `` -> ...3
#> New names:
#> * `` -> ...1
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> New names:
#> * `` -> ...1
#> New names:
#> * `` -> ...1
#> # A tibble: 3 x 9
#>      id  var1 Country_...1 Country_...2 Sport_...1 Sport_...2 Sport_...3
#>   <int> <int> <chr>        <chr>        <chr>      <chr>      <chr>     
#> 1    10   169 Norway       Sweden       Skii       <NA>       <NA>      
#> 2    11   150 Spain        <NA>         Bike       Soccer     Basket    
#> 3    12     0 USA          <NA>         Baseball   <NA>       <NA>      
#> # ... with 2 more variables: Format_...1 <chr>, Format_...2 <chr>

If I didn't want to enumerate the columns explicitly, I could write a more complex reduce() call, but it would be neat to be able to just call:

df2 %>%
  unnest_wider(one_of("Country", "Sport", "Format"), names_sep = "_")

If this isn't acceptable, I think the error message could be more helpful than:

Error in .subset2(x, i) : no such index at level 2
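For reference, the reduce()-based workaround mentioned above might look roughly like the sketch below. The helper name unnest_wider_each() is hypothetical (it is not part of tidyr), and the sketch assumes a tidyr version in which unnest_wider() accepts tidyselect helpers such as all_of() and a purrr installation providing reduce(). It uses a small two-column data frame rather than df2 to keep the example short.

```r
library(tidyr)
library(purrr)

# Small example data: two list-columns of unnamed vectors.
df <- tibble(x = list(1, 1:2), y = list(1, 1:2))

# Hypothetical helper (not part of tidyr): apply unnest_wider() to each
# named column in turn, threading the data frame through reduce().
unnest_wider_each <- function(data, cols, names_sep = "_") {
  reduce(
    cols,
    function(acc, col) unnest_wider(acc, all_of(col), names_sep = names_sep),
    .init = data
  )
}

res <- unnest_wider_each(df, c("x", "y"))
names(res)
```

Because each step widens only one column, the result should match what the chained unnest_wider() calls produce; the helper only saves enumerating the columns by hand.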

Labels: feature, rectangling

All 5 comments

Minimal reprex:

library(tidyr)

df <- tibble(x = list(1, 1:2), y = list(1, 1:2))
df %>%
  unnest_wider(x, names_sep = "_") %>% 
  unnest_wider(y, names_sep = "_")
#> # A tibble: 2 x 4
#>     x_1   x_2   y_1   y_2
#>   <dbl> <int> <dbl> <int>
#> 1     1    NA     1    NA
#> 2     1     2     1     2

# Why can't we just write:
df %>%
  unnest_wider(c(x, y), names_sep = "_") 
#> Error in `[[<-.data.frame`(`*tmp*`, col, value = list()): replacement has 0 rows, data has 2

Created on 2019-11-24 by the reprex package (v0.3.0)

@moodymudskipper it would be very helpful if before filing an issue you spent some time condensing your code down to the bare minimum. It's great that you are providing reprexes, but you really need to work on the minimal part.

I've improved the error message, and I'll think about allowing unnest_longer() and unnest_wider() to work with multiple columns at one time.

Thanks @hadley, I will do better next time.

Hi @hadley,

Just a note that (unlike unnest_wider()) the desired behavior of unnest_longer(c(x, y)) can't be produced by multiple applications of unnest_longer(), because we get a grid expansion.

library(tidyr)

df <- tibble(x = list(1, 1:2), y = list(1, 1:2))
df %>%
  unnest_longer(x) %>% 
  unnest_longer(y)
#> # A tibble: 5 x 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     1
#> 2     1     1
#> 3     1     2
#> 4     2     1
#> 5     2     2

Created on 2020-03-13 by the reprex package (v0.3.0)

I think this would be the desired behavior:

df %>%
  unnest_longer(c(x, y))
#> # A tibble: 3 x 2
#>       x     y
#>   <dbl> <dbl>
#> 1     1     1
#> 2     1     1
#> 3     2     2
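
As a point of comparison, plain unnest() already unnests several columns in parallel when they are passed together, so the row-by-row behaviour requested here can be sketched with it. This assumes tidyr 1.0.0 or later; it illustrates the desired shape of the result and is not a statement about how unnest_longer() itself is implemented.

```r
library(tidyr)

df <- tibble(x = list(1, 1:2), y = list(1, 1:2))

# Passing both columns to a single unnest() call unnests them in parallel
# (element i of x stays paired with element i of y), instead of the grid
# expansion produced by chaining two unnest_longer() calls.
res <- unnest(df, c(x, y))
nrow(res)
```

The parallel form requires the list elements in each row to have compatible lengths across the selected columns, which is the case in this example.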