Tidyr: Unnesting list-columns gives an error even though they all have equal lengths

Created on 5 Jun 2018 · 3 Comments · Source: tidyverse/tidyr

Here is an example that reproduces the error.

> df <- tibble(x = c("a", "b", "c"), y = list(a = 1:3, b = 4:6, c = 7:9), z = list(1:3))
> df
# A tibble: 3 x 3
  x     y         z        
  <chr> <list>    <list>   
1 a     <int [3]> <int [3]>
2 b     <int [3]> <int [3]>
3 c     <int [3]> <int [3]>
> unnest(df)
Error: All nested columns must have the same number of elements.

However, I noticed that slicing the whole data frame first and then unnesting works:

> df %>% slice(1:n()) %>% unnest()
# A tibble: 9 x 3
  x         y     z
  <chr> <int> <int>
1 a         1     1
2 a         2     2
3 a         3     3
4 b         4     1
5 b         5     2
6 b         6     3
7 c         7     1
8 c         8     2
9 c         9     3
Labels: bug, rectangling, tidy-dev-day

Most helpful comment

Hi @yuanwxu, just another community member here.
Here's a slightly more minimal reprex of what I think the issue is:

Issue

library(tidyverse)
df1 <- tibble(x = c("a"), y = list(1:3), z = list(4:6))
df1 %>% unnest()
#> # A tibble: 3 x 3
#>   x         y     z
#>   <chr> <int> <int>
#> 1 a         1     4
#> 2 a         2     5
#> 3 a         3     6

# adding a name to the list in y triggers the error
df2 <- tibble(x = c("a"), y = list(foo = 1:3), z = list(4:6))
df2 %>% unnest()
#> Error: All nested columns must have the same number of elements.

Workaround

Instead of slicing, another possible workaround is to unname the list columns first:

df2 %>% mutate_if(is.list, unname) %>% unnest()
#> # A tibble: 3 x 3
#>   x         y     z
#>   <chr> <int> <int>
#> 1 a         1     4
#> 2 a         2     5
#> 3 a         3     6

Why is this happening?
Looking at the code, I think this behaviour comes from

https://github.com/tidyverse/tidyr/blob/cbdd14e90b2a771e242a44d1ed5eea84d53da642/R/unnest.R#L96-L100

If we adapt that for our example we can see why the error message occurs:

# lengths considered the same
n1 <- df1 %>% select(-x) %>% map(~ map_int(., NROW))
n1
#> $y
#> [1] 3
#> 
#> $z
#> [1] 3
length(unique(n1))
#> [1] 1

# lengths considered different
n2 <- df2 %>% select(-x) %>% map(~ map_int(., NROW))
n2
#> $y
#> foo 
#>   3 
#> 
#> $z
#> [1] 3
length(unique(n2))
#> [1] 2
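
To make the mechanism explicit, here is a minimal base-R sketch (not tidyr's code; the variable names `named` and `unnamed` are just for illustration). Names are stored as an attribute, so a named length vector and an unnamed one are not identical even when the values match, which appears to be why `unique()` reports two distinct entries above:

```r
# Base-R sketch: names are attributes, so they take part in the comparison.
named   <- c(foo = 3L)  # what the length check yields for a named list-column
unnamed <- 3L           # what it yields for an unnamed list-column

identical(named, unnamed)
#> [1] FALSE
length(unique(list(named, unnamed)))
#> [1] 2

# Dropping the names makes the two values compare equal.
identical(unname(named), unnamed)
#> [1] TRUE
```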

Looks like we need to strip the names here.
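
A rough sketch of what stripping the names could look like, written in base R. The helper `element_lengths()` is hypothetical and is not the actual tidyr fix; it only illustrates that dropping names before comparing lengths removes the false mismatch:

```r
# Hypothetical helper (not tidyr's code): per-element lengths of a
# list-column, with any names dropped before comparison.
element_lengths <- function(col) {
  unname(vapply(col, NROW, integer(1)))
}

# Same shape as df2 without the x column: y is named, z is not.
cols <- list(y = list(foo = 1:3), z = list(4:6))

n <- lapply(cols, element_lengths)
length(unique(n))
#> [1] 1
```

With the names removed, both columns report the same lengths, so a check of this form would no longer raise the "same number of elements" error for this input.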

I will work on this for dev day.
