pivot_wider
is missing the drop
option that was present in spread
which is quite useful
to fill a matrix with empty rows.
df <- tibble(x=factor(c(1,1,2), 1:3), y=c("a", "b", "b"), z=11:13)
print(df)
#> # A tibble: 3 x 3
#> x y z
#> <fct> <chr> <int>
#> 1 1 a 11
#> 2 1 b 12
#> 3 2 b 13
pivot_wider(df, names_from = y, values_from=z)
#> # A tibble: 2 x 3
#> x a b
#> <fct> <int> <int>
#> 1 1 11 12
#> 2 2 NA 13
spread(df, key=y, value=z, drop=FALSE)
#> # A tibble: 3 x 3
#> x a b
#> <fct> <int> <int>
#> 1 1 11 12
#> 2 2 NA 13
#> 3 3 NA NA
Needs r-lib/vctrs#686.
Not sure what to call this argument; it could be names_drop = FALSE
but drop
is not terribly descriptive. It's more like names_use_factor_levels = TRUE
but that's rather verbose.
What about names_keep
or names_keep_all
?
Note that the spread()
function with drop = FALSE
also kept factor levels that were not present in the data, creating columns for them (this is not shown in the above example). This was a very useful feature, which is not present in pivot_wider()
. Here is a regexp for this:
library(tidyr)
d = tibble(day_int = c(4,5,1,2),
day_fac = factor(day_int, levels=1:5,
labels=c("Mon","Tue", "Wed","Thu","Fri")))
d
#> # A tibble: 4 x 2
#> day_int day_fac
#> <dbl> <fct>
#> 1 4 Thu
#> 2 5 Fri
#> 3 1 Mon
#> 4 2 Tue
levels(d$day_fac)
#> [1] "Mon" "Tue" "Wed" "Thu" "Fri"
# spread() automatically creates a `Wed` column
spread(d, key = "day_fac", value = "day_int", drop = FALSE)
#> # A tibble: 1 x 5
#> Mon Tue Wed Thu Fri
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 NA 4 5
# ... but pivot_wider() does not (and does not respect
# the level ordering, but that is issue #839)
pivot_wider(d, names_from = day_fac, values_from = day_int,
values_fill = list(day_fac = NA))
#> # A tibble: 1 x 4
#> Thu Fri Mon Tue
#> <dbl> <dbl> <dbl> <dbl>
#> 1 4 5 1 2
For naming the argument, I like the consistency of names_fill = TRUE
(cf. values_fill
).
Thinking about this more, I think it's unlikely the full spectrum of needs can be resolved with just a couple of new arguments to pivot_wider()
. I think the right approach is probably to document the spec data frame in much more detail, and show how you can use expand_grid()
and friends to generate exactly the output that you want.
Hi,
I think it's a shame that spread/pivot_wider can't be told what columns to expect when given an empty input. In extensive pipes, empty data frames/tibbles do happen and missing columns can break the whole mechanism e.g. mutate after pivot_wider. I'd be happier with a result albeit an empty data frames than a broken pipe.
thanks,
Stefan
@smarc it can be. That's the purpose of the spec.
Here鈥檚 another use case, a bit different from my earlier example, but similar to data I work with. Here, each level of a factor (gender
) is present, but not always for each value/label of the other names_from
variable.
library(tidyverse)
# Example data
# Note: In 2019, only males responded
d = tibble(
year = c(2018, 2018, 2019, 2020, 2020),
gender = factor(c("female", "male", "male", "female", "male")),
percentage = seq(30, 70, 10)
)
pivot_wider(d, names_from = c(year, gender), values_from = percentage)
#> # A tibble: 1 x 5
#> `2018_female` `2018_male` `2019_male` `2020_female` `2020_male`
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 30 40 50 60 70
Expected syntax and results:
pivot_wider(d, names_from = c(year, gender), values_from = percentage,
names_fill = TRUE)
#> # A tibble: 1 x 6
#> `2018_female` `2018_male` `2019_female` `2019_male` `2020_female` `2020_male`
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 30 40 NA 50 60 70
(I am aware that the complete()
function can be used to preprocess the data to achieve the above outcome.)
names_fill = TRUE would be a useful feature!
Most helpful comment
Note that the
spread()
function withdrop = FALSE
also kept factor levels that were not present in the data, creating columns for them (this is not shown in the above example). This was a very useful feature, which is not present inpivot_wider()
. Here is a regexp for this:For naming the argument, I like the consistency of
names_fill = TRUE
(cf.values_fill
).