Can the new pivot_longer() and pivot_wider() be S3 generics so that we can extend them to other tibble-like objects? Thanks.
I'd prefer to not make them generic for at least one version. And you should be able to get a lot of genericity via vctrs.
I'm confused: via vctrs one can't set the (sub)class of the object returned by pivot_*, right?
@edzer right, but (in principle) you shouldn't need to for subclasses of data.frame/tibble as vctrs should take care of the details (it's totally possible it doesn't right now, but I'd prefer to attack it upstream before adding extension points in vctrs)
See https://github.com/r-spatial/sf/issues/1149 : vctrs seems to work only on the columns, not on the object as a whole. It sounds like you take the standpoint of "the world doesn't need anything other than tibbles, everything special should be done inside columns", but I think both time series (tsibble) and spatial data (sf) have shown that that is not enough.
I don't see the whole picture yet, but at least tidyr doesn't seem ready to accept anything other than tibble and data.frame. For example, pivot_longer_spec() probably needs either .ptype specified on vec_cbind():
https://github.com/tidyverse/tidyr/blob/4618116a9296e0f11337c416966ea0f5a62f3732/R/pivot-long.R#L187-L192
or vec_restore() instead of this tibble-specific function:
https://github.com/tidyverse/tidyr/blob/4618116a9296e0f11337c416966ea0f5a62f3732/R/pivot-long.R#L194
Yet, it's still unclear to me whether it's really possible to restore the result only by defining vctrs functions because, in tidyr's case, the shape of the input and that of the output are different.
It's also not clear to me how we can just define vctrs::vec_restore() for our tibble-like objects to achieve the different results we expect. For example, this is how tsibble currently handles gather() and spread(): gather() adds the new melted column to the key specification, and spread() removes the column from the key. Other dplyr methods also conditionally validate whether the result is a valid tsibble.
library(tidyr)
# reshaping examples from tidyr
stocks <- tsibble::tsibble(
  time = as.Date("2009-01-01") + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)
#> Using `time` as index variable.
(stocksm <- stocks %>% gather(stock, price, -time))
#> # A tsibble: 30 x 3 [1D]
#> # Key: stock [3]
#> time stock price
#> <date> <chr> <dbl>
#> 1 2009-01-01 X 0.747
#> 2 2009-01-02 X 1.90
#> 3 2009-01-03 X -2.06
#> 4 2009-01-04 X 0.0645
#> 5 2009-01-05 X -0.265
#> 6 2009-01-06 X -0.447
#> 7 2009-01-07 X -1.41
#> 8 2009-01-08 X -0.506
#> 9 2009-01-09 X -0.270
#> 10 2009-01-10 X -1.09
#> # … with 20 more rows
stocksm %>% spread(stock, price)
#> # A tsibble: 10 x 4 [1D]
#> time X Y Z
#> <date> <dbl> <dbl> <dbl>
#> 1 2009-01-01 0.747 0.724 2.82
#> 2 2009-01-02 1.90 -0.671 3.97
#> 3 2009-01-03 -2.06 2.73 4.58
#> 4 2009-01-04 0.0645 -1.42 -4.96
#> 5 2009-01-05 -0.265 1.32 10.6
#> 6 2009-01-06 -0.447 0.582 -0.628
#> 7 2009-01-07 -1.41 0.396 -1.69
#> 8 2009-01-08 -0.506 -2.41 -0.794
#> 9 2009-01-09 -0.270 -0.0796 -3.58
#> 10 2009-01-10 -1.09 1.37 3.62
vctrs greatly simplifies the developer's work (thanks for that), and I've already migrated all of tsibble's custom vector classes to vctrs (https://github.com/tidyverts/tsibble/tree/dev-v0.9.0). But S3 generics are useful for giving downstream packages finer control over the input and output when that is not possible with vec_restore().
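To make the point about finer control concrete, here is a minimal base-R sketch of what a generic would allow. Everything in it is an assumption for illustration: pivot_longer_sketch() and the keyed_df class are hypothetical stand-ins, not tidyr or tsibble APIs, and the reshaping logic is a simplified substitute for what tidyr actually does. The subclass method lets the default method do the reshaping and then updates the key, which depends on an argument of the call (names_to) and not only on the input object.

```r
# Hypothetical generic, for illustration only; tidyr does not export this.
pivot_longer_sketch <- function(data, cols, names_to = "name",
                                values_to = "value", ...) {
  UseMethod("pivot_longer_sketch")
}

# Plain data.frame method: stack `cols` into name/value pairs (base R only).
pivot_longer_sketch.data.frame <- function(data, cols, names_to = "name",
                                           values_to = "value", ...) {
  n <- nrow(data)
  id_cols <- setdiff(names(data), cols)
  # Rebuild the identifier columns as a bare data.frame, dropping any subclass.
  ids <- as.data.frame(unclass(data)[id_cols], stringsAsFactors = FALSE)
  # Repeat the identifier rows once per pivoted column.
  out <- ids[rep(seq_len(n), times = length(cols)), , drop = FALSE]
  rownames(out) <- NULL
  out[[names_to]] <- rep(cols, each = n)
  out[[values_to]] <- unlist(lapply(cols, function(col) data[[col]]),
                             use.names = FALSE)
  out
}

# Method for a hypothetical `keyed_df` subclass that stores a "key" attribute.
pivot_longer_sketch.keyed_df <- function(data, cols, names_to = "name",
                                         values_to = "value", ...) {
  out <- NextMethod()
  # The new names column becomes part of the key, as tsibble does for gather().
  attr(out, "key") <- union(attr(data, "key"), names_to)
  class(out) <- c("keyed_df", class(out))
  out
}

stocks <- structure(
  data.frame(time = as.Date("2009-01-01") + 0:2, X = c(1, 2, 3), Y = c(4, 5, 6)),
  key = character(),
  class = c("keyed_df", "data.frame")
)

long <- pivot_longer_sketch(stocks, cols = c("X", "Y"),
                            names_to = "stock", values_to = "price")
```

Here long keeps the keyed_df class and its key now includes stock. A restore step that only copies attributes from the input back onto the output could not know that the key should change, which is the concern with relying on vec_restore() alone.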
@edzer no, that's definitely not the case — I'm saying we haven't figured out what extension will look like for container type objects (https://github.com/r-lib/vctrs/issues/211), and I don't want to expose something half-baked so that you have to change it multiple times. We'll make this a priority for the next release of vctrs.
As well as the vctrs issue, I'm generally uncertain whether it's a good idea to make the first version of a function a generic, even for functions that we've thought about a lot. I'm concerned that in the next couple of months someone will point out a use case that I hadn't considered, which will necessitate a change to the interface. Again, I don't want to expose something half-baked that will require downstream maintainers to do a lot of work.
Thanks @hadley, good to hear we're on the same page here; I agree it is annoying for everyone to introduce generics that then later have to change interface.
For reference, a new issue seems to have been filed here: #800