Hi
I'm using dplyr within a package I'm writing. As a convenience I made a class for certain types of data (VCF, with a series of tests checking the format is correct)... and appended it to the tbl class names .. similarly as dplyr does to data.frames (I guess).
PD4107a
Source: local data frame [10,176 x 7]
   chr start.position end.position strand WT MUT sampleID
1    1        1857336      1857336      +  G   A  PD4107a
2    1        2329409      2329409      +  A   G  PD4107a
3    1        2620133      2620133      +  C   G  PD4107a
..  ...           ...          ...    ... ..  ...      ...
class(PD4107a)
[1] "tbl_df" "tbl" "data.frame" "VCF"
However I note that any dplyr verb will remove my "VCF"
filter(PD4107a, chr==1) %>% class
[1] "tbl_df" "tbl" "data.frame"
I can obviously go through every function and correct the returned tbl at the end... and have made a start on that. But I was wondering whether, for the convenience of dplyr being used in other packages, the current behaviour (i.e. removing the VCF class) is necessary?
dplyr does not know anything about what makes VCF special, and the data frames dplyr produces are not likely to conform to whatever assumptions VCF makes.
Well.. VCF just has to have certain colnames of certain datatypes with certain values, so a valid VCF would always be a valid tbl.
But then I'm not looking to be validated by dplyr... just ignored really.
As I said I can work around with a line in each function - but I thought it might be worth mentioning as a general issue for other package writers???
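One detail worth checking in the class vector printed earlier: "VCF" sits at the end. S3 dispatch tries classes from left to right, so a method written for the custom class is only found first when that class comes first. A minimal base-R sketch of this, where the generic describe and its methods are hypothetical names and not part of dplyr:

```r
# Hypothetical generic used only to show dispatch order; not part of dplyr.
describe <- function(x) UseMethod("describe")
describe.data.frame <- function(x) "data.frame method"
describe.VCF <- function(x) "VCF method"

df <- data.frame(chr = 1, WT = "G", MUT = "A")

# Custom class appended last: the data.frame method is found first.
appended <- structure(df, class = c(class(df), "VCF"))
describe(appended)

# Custom class placed first: the VCF method is found first.
prepended <- structure(df, class = c("VCF", class(df)))
describe(prepended)
```

So before writing any verb methods for VCF, it is probably worth prepending the class rather than appending it.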
c("VCF", "tbl_df", "tbl", "data.frame"). mutate_.VCF etc.
Just wanted to comment that finding this was very helpful. Like Malarkey73, I'm trying to use dplyr verbs in a package. I had been trying to write my own version of the verbs like so (say, for class "sample_report"):
mutate.sample_report <- function(data, ...) {
  # save class names
  # [code]
  out <- mutate(data, ...)
  # reapply class names
  # [code]
  return(out)
}
Which wasn't working, and leading to endless frustration. I never would've guessed I needed that trailing underscore. Replacing "mutate" with "mutate_" works like a charm!
EDIT: It appears ungroup is an exception: you want to use 'ungroup,' not 'ungroup_' (the latter doesn't exist). Just adding for any future googlers.
jwdink, could you clarify which occurrence of "mutate" needed to be replaced with "mutate_"? I see "mutate" twice in your function definition.
@rcorty You'd replace both of them.
So the sequence can be thought of as:
(1) You call mutate on some data frame that has your custom class, which invokes dplyr's "mutate" function
(2) This in turn invokes mutate_, which will recognize your custom class and invoke your custom mutate_ function
(3) Your custom function (using the code above) removes the custom class, and applies dplyr's mutate_ function which will now perform a regular mutate_.data.frame.
(4) Your custom function then takes the output of (3), re-applies the custom class you temporarily removed. This is what gets returned by your custom function.
Hopefully that makes sense.
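For anyone who wants to see those four steps run without dplyr installed, here is a rough base-R sketch. The functions tweak and tweak_ are hypothetical stand-ins for mutate and mutate_, and the "work" done by the data.frame method is made up purely for illustration:

```r
# Toy stand-ins for dplyr's mutate()/mutate_(); all names here are hypothetical.
tweak <- function(.data, ...) tweak_(.data, ...)      # (1) user-facing verb
tweak_ <- function(.data, ...) UseMethod("tweak_")    # (2) generic that dispatches

# Plain method: does the work and returns an ordinary data frame.
tweak_.data.frame <- function(.data, ...) {
  out <- .data
  class(out) <- "data.frame"
  out$touched <- TRUE
  out
}

# Custom method: strip the class, delegate, then restore the class.
tweak_.sample_report <- function(.data, ...) {
  saved <- class(.data)
  class(.data) <- setdiff(saved, "sample_report")  # (3) temporarily remove
  out <- tweak_(.data, ...)                        #     now reaches tweak_.data.frame
  class(out) <- saved                              # (4) re-apply the saved classes
  out
}

report <- structure(data.frame(id = 1:3),
                    class = c("sample_report", "data.frame"))
result <- tweak(report)
class(result)
```

The custom class survives the round trip because step (4) puts back exactly what step (3) saved.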
@jwdink Thanks for your help. I'll add one thing that I figured out in the process of doing this that may be useful for others trying to do something similar. Note that I've removed the most-specific class from vs before applying filter_() to it.
filter_.scanonevar <- function(vs, ...) {
  out <- vs
  class(out) <- class(out)[-1]
  out <- dplyr::filter_(out, ...)
  class(out) <- class(vs)
  attr(out, 'attr1') <- attr(vs, 'attr1')
  return(out)
}
Hi all,
This is a closed issue but I found this conversation very helpful in implementing my own methods for these verbs. @rcorty your code snippet above was very helpful.
Let me try to write out what I learned here after writing up a question about it on SO
# objects with class scanonevar will get this version of filter
filter_.scanonevar <- function(vs, ...) {
  out <- vs
  # this removes the first class from out (e.g. the incoming df);
  # presumably that removes `scanonevar`
  # another way to do this would be
  # incoming_classes <- class(vs)
  # class(out) <- incoming_classes[incoming_classes != 'scanonevar']
  class(out) <- class(out)[-1]
  out <- dplyr::filter_(out, ...)
  class(out) <- class(vs)
  attr(out, 'attr1') <- attr(vs, 'attr1')
  return(out)
}
When I first wrote my custom method for filter, I thought that the dplyr:: in out <- dplyr::filter_(out, ...) would be enough to ensure dispatch via dplyr's filter... but what actually happened was "Error: evaluation nested too deeply: infinite recursion / options(expressions=)?". Stripping the df of the special class (and then restoring it after dplyr::filter_) solved my problem.
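The reason the namespace prefix does not help is that it only selects which generic function gets called; the generic still dispatches on class(x), so an object that keeps the custom class lands straight back in the custom method. A small base-R sketch of that loop follows. The generic trim is hypothetical, and a call counter stops the recursion after five rounds instead of letting it run until R gives up:

```r
# Hypothetical generic and class; a counter stands in for the runaway recursion.
calls <- 0
trim <- function(x, ...) UseMethod("trim")
trim.data.frame <- function(x, ...) x

# Broken: x still carries "scanonevar", so trim() dispatches back to this method.
trim.scanonevar <- function(x, ...) {
  calls <<- calls + 1
  if (calls >= 5) stop("still dispatching to trim.scanonevar after ", calls, " calls")
  trim(x, ...)
}

vs <- structure(data.frame(a = 1:3), class = c("scanonevar", "data.frame"))
msg <- tryCatch(trim(vs), error = function(e) conditionMessage(e))
msg

# Fixed: drop the custom class before delegating, restore it afterwards.
trim.scanonevar <- function(x, ...) {
  out <- x
  class(out) <- setdiff(class(out), "scanonevar")
  out <- trim(out, ...)
  class(out) <- class(x)
  out
}
fixed <- trim(vs)
class(fixed)
```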
The solutions described here don't seem to work anymore in the 0.7.* release, at least not if your custom-classed object also inherits from tbl_df.
Any recommendations on how to handle classes now that the underscore_ verbs are deprecated?
Just provide methods for the non-underscored versions
Maybe an adverb is helpful here, since it makes it easy to deal with either:
preservatively <- function(fun, classes_to_keep, attrs_to_keep = NULL) {
  function(...) {
    the_dots <- eval(substitute(alist(...)))
    the_dots[[1]] <- call('remove_class', the_dots[[1]], classes_to_keep)
    result <- do.call(fun, args = the_dots, envir = parent.frame())
    arg1_res <- eval(the_dots[[1]][[2]], envir = parent.frame())
    for (this_attr in attrs_to_keep)
      attr(result, this_attr) <- attr(arg1_res, this_attr)
    class(result) <- unique(c(classes_to_keep, class(result)))
    result
  }
}
remove_class <- function(x, to_remove) {
  class(x) <- setdiff(class(x), to_remove)
  x
}
E.g.,
#' @export
slice.my_special_class <- preservatively(slice, classes_to_keep = 'my_special_class', attrs_to_keep = 'my_special_attributes')
#' @export
slice_.my_special_class <- preservatively(slice_, classes_to_keep = 'my_special_class', attrs_to_keep = 'my_special_attributes')
I would avoid any method that doesn't allow you to use NextMethod(), as that will ensure that the correct method is called efficiently.
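To make the NextMethod() suggestion concrete, here is a minimal base-R sketch. The generic first_rows is a hypothetical stand-in for a dplyr verb, and its data.frame method deliberately drops the extra class and attribute to mimic the behaviour described in this thread:

```r
# Hypothetical generic standing in for a dplyr verb.
first_rows <- function(x, n = 2, ...) UseMethod("first_rows")

# Mimics a verb that hands back a plain data frame without extras.
first_rows.data.frame <- function(x, n = 2, ...) {
  out <- x[seq_len(n), , drop = FALSE]
  attr(out, "my_special_attributes") <- NULL
  class(out) <- "data.frame"
  out
}

# NextMethod() moves on to the next class in class(x); no manual stripping needed.
first_rows.my_special_class <- function(x, n = 2, ...) {
  out <- NextMethod()
  attr(out, "my_special_attributes") <- attr(x, "my_special_attributes")
  class(out) <- class(x)
  out
}

obj <- structure(
  data.frame(a = 1:5),
  class = c("my_special_class", "data.frame"),
  my_special_attributes = list(origin = "demo")
)
res <- first_rows(obj, n = 3)
class(res)
nrow(res)
```

Compared with stripping the class by hand, nothing here has to know which class comes next in the chain.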
Oh, that makes sense.
Here's my attempt:
preservatively <- function(fun, attrs_to_keep = NULL) {
  function(...) {
    result <- NextMethod()
    class(result) <- unique(c(.Class[[1]], class(result)))
    if (!all(is.element(attrs_to_keep, names(attributes(result))))) {
      arg1 <- eval(substitute(alist(...))[[2]], envir = parent.frame())
      for (this_attr in attrs_to_keep)
        attr(result, this_attr) <- attr(arg1, this_attr)
    }
    result
  }
}
Not sure if it's kosher to use .Class to access the class method that'll be skipped by NextMethod, but I wasn't sure there was another way.
I think it would be better to do something more like reclass(NextMethod(), x), and then rely on a reclass() generic to add the correct attributes back on. You shouldn't need to use eval() here, just use function(x, ...).
Thanks. Is this something like what you had in mind?:
#' Copy class and attributes from the original version of an object to a modified version.
#'
#' @param x The original object, which has a class/attributes to copy
#' @param result The modified object, which is / might be missing the class/attributes.
#'
#' @return \code{result}, now with class/attributes restored.
#' @export
reclass <- function(x, result) {
  UseMethod('reclass')
}
#' @export
reclass.default <- function(x, result) {
  class(result) <- unique(c(class(x)[[1]], class(result)))
  attr(result, class(x)[[1]]) <- attr(x, class(x)[[1]])
  result
}
#' Modify a function so that it keeps the first class and relevant attributes
#'
#' @param fun A function that usually removes classes and/or attributes
#'
#' @return A modified version of `fun`. The attributes that are retained are determined based on the
#' class of the first argument, according to the relevant \code{reclass} method.
#' @export
preservatively <- function(fun) {
  function(x, ...) {
    result <- NextMethod()
    reclass(x, result)
  }
}
The default method assumes the only attribute to copy is one with the same name as the class.
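When a class keeps its metadata under other attribute names, a class-specific reclass method can take over from the default. A minimal sketch, assuming a hypothetical class survey with hypothetical attributes units and origin (the generic and default are restated from the code in this thread only so the example runs on its own):

```r
# Generic and default restated from the discussion so the sketch is self-contained.
reclass <- function(x, result) UseMethod("reclass")
reclass.default <- function(x, result) {
  class(result) <- unique(c(class(x)[[1]], class(result)))
  attr(result, class(x)[[1]]) <- attr(x, class(x)[[1]])
  result
}

# Hypothetical class whose metadata lives in attributes "units" and "origin".
reclass.survey <- function(x, result) {
  for (nm in c("units", "origin")) attr(result, nm) <- attr(x, nm)
  class(result) <- unique(c("survey", class(result)))
  result
}

orig <- structure(data.frame(h = c(1.7, 1.8)),
                  class = c("survey", "data.frame"),
                  units = "m", origin = "demo")
plain <- data.frame(h = 1.7)   # what a verb might hand back
restored <- reclass(orig, plain)
class(restored)
attr(restored, "units")
```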
I added an answer to this SO thread that tries to summarize this discussion.
It looks like using function(x, ...) in preservatively causes problems when the user passes a named argument and the name differs.
filter.cars <- preservatively(filter)
filter(my_data, condition) # good
filter(.data = my_data, condition) # oh no
Unless I'm missing something, adverb might not work here after all :(
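The mismatch can be reproduced in base R without dplyr. In this sketch the generic keep is hypothetical; its first argument is named .data as in dplyr's verbs, while the method's first argument is named x. The method just reports whether x was supplied:

```r
# Hypothetical generic whose first argument is named .data, as in dplyr's verbs.
keep <- function(.data, ...) UseMethod("keep")

# Method written with a differently named first argument.
keep.cars <- function(x, ...) missing(x)

my_data <- structure(data.frame(a = 1:3), class = c("cars", "data.frame"))

positional <- keep(my_data)       # my_data is matched to x by position
named <- keep(.data = my_data)    # .data matches no formal of the method, so it lands in ...
c(positional = positional, named = named)

# Naming the method's first argument like the generic's avoids the mismatch.
keep.cars <- function(.data, ...) missing(.data)
after_fix <- keep(.data = my_data)
after_fix
```

In the named call x is reported as missing, which is why anything that later touches x inside the wrapper would fail.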
I guess this really isn't any more verbose:
filter.cars <- function(.data, ...) reclass(.data, NextMethod())