This just broke for me on updating to 0.5.0 (although it is possible that the input data was not grouped before):
d <- data_frame(id = sample(4, 20, replace = TRUE))
# works
d %>%
  distinct_(~ id)
# fails
d %>%
  group_by_(~ id) %>%
  distinct_(~ id)
# Error in as.lazy_dots(.dots) :
# argument ".dots" is missing, with no default
It also drops all columns except for the group_by column when it's used on grouped data (unless the argument .keep_all = TRUE is used). This is different behavior than before I updated dplyr.
df_iris <- iris %>%
  group_by(Species)
colnames(df_iris)
# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
df_iris <- distinct(df_iris)
colnames(df_iris)
# [1] "Species"
Same type of problem here:
# Does not work. It drops all columns except for the group_by, as ezriah explains
df.temp <- df.micro %>%
  group_by(country) %>%
  transmute(
    Year = mean(yearofinterview, na.rm = TRUE)
  ) %>%
  distinct()
# It works when ungrouping first
df.temp <- df.micro %>%
  group_by(country) %>%
  transmute(
    Year = mean(yearofinterview, na.rm = TRUE)
  )
df.temp <- ungroup(df.temp)
a <- df.temp %>%
  distinct()
@rgayler: Thanks, confirmed. Would you like to contribute a testthat test?
@ezriah @adeldaoud: This is a breaking change in 0.5.0, but it is intended behavior.
@krlmlr Could the behaviour with grouped dfs mentioned by @ezriah and @adeldaoud be documented somewhere? It did not become clear to me from ?distinct or from the release notes for 0.5.0.
Also, I don't like it :-). Now I'll always have to do iris %>% group_by(Species) %>% blabla %>% ungroup() %>% distinct() %>% group_by(Species) %>% blabla, if I don't want to enumerate all variables?
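The ungroup / distinct / regroup sequence described above could be wrapped in a small helper so it does not have to be spelled out each time. This is only a sketch: distinct_keep_groups is a hypothetical name, not part of dplyr, and it assumes a dplyr version that supports !!! splicing and group_vars() (0.7.0 or later).

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): drop duplicate rows across all
# columns, then restore whatever grouping the input had.
distinct_keep_groups <- function(.data) {
  grps <- groups(.data)  # grouping variables as a list of symbols
  .data %>%
    ungroup() %>%        # distinct() now considers every column
    distinct() %>%
    group_by(!!!grps)    # re-apply the original grouping
}

res <- iris %>%
  group_by(Species) %>%
  distinct_keep_groups()

colnames(res)    # all five iris columns are kept
group_vars(res)  # still grouped by Species
```

Because the helper ungroups first, the result should not depend on how distinct() treats grouped input in a given release.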
I'm not too sure anymore that the change for grouped data frames is intended.
@hadley: What should the following code return -- all columns or only Species?
iris %>% group_by(Species) %>% distinct()
I'd argue that it should be the same as
iris %>% distinct() %>% group_by(Species)
which returns all columns.
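A quick way to check which behaviour an installed dplyr has is to compare the column names from the two orderings directly. This is a minimal sketch; the variable names are illustrative, and the comparison result depends on the dplyr version, so no expected output is shown.

```r
library(dplyr)

# Group first, then de-duplicate (the case under discussion)
grouped_first <- iris %>%
  group_by(Species) %>%
  distinct()

# De-duplicate first, then group (keeps all columns)
distinct_first <- iris %>%
  distinct() %>%
  group_by(Species)

# TRUE if both orderings keep the same columns in this dplyr version
same_columns <- identical(colnames(grouped_first), colnames(distinct_first))
same_columns
```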