Dplyr: distinct_ requires .dots argument with grouped input

Created on 29 Jun 2016 · 5 Comments · Source: tidyverse/dplyr

This just broke for me on updating to 0.5.0 (although it is possible that the input data was not grouped before):

d <- data_frame(id = sample(4, 20, replace = TRUE))

# works
d %>%
  distinct_(~ id)

# fails
d %>%
  group_by_(~ id) %>%
  distinct_(~ id)
# Error in as.lazy_dots(.dots) : 
#   argument ".dots" is missing, with no default
bug

All 5 comments

distinct() also drops all columns except the group_by column when it's used on grouped data (unless the argument .keep_all = TRUE is used). This differs from the behavior before I updated dplyr.

df_iris <- iris %>%
  group_by(Species)

colnames(df_iris)
# [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

df_iris <- distinct(df_iris)

colnames(df_iris)
# [1] "Species"
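
The `.keep_all = TRUE` escape hatch mentioned above can be sketched roughly as follows. This is a minimal illustration and not code from the thread: `Species` is named explicitly so the result should not depend on how a given dplyr release treats grouping variables, and `first_per_species` is a made-up name. Note that this keeps the first row per species; it is not a full-row de-duplication.

```r
library(dplyr)

# Assumption: the distinct-by column is spelled out so the call does not rely
# on version-specific handling of grouped input.
first_per_species <- iris %>%
  group_by(Species) %>%
  distinct(Species, .keep_all = TRUE)  # keep every column of the first match

colnames(first_per_species)  # all original columns are retained
nrow(first_per_species)      # one row per distinct Species value
```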

Same type of problem here:

# Does not work. It drops all columns except for the group_by, as ezriah explains

df.temp <- df.micro %>%
  group_by(country) %>%
  transmute(
    Year = mean(yearofinterview, na.rm = TRUE)
  ) %>%
  distinct()

# It works when ungrouping first

df.temp <- df.micro %>%
  group_by(country) %>%
  transmute(
    Year = mean(yearofinterview, na.rm = TRUE)
  )

df.temp <- ungroup(df.temp)

a <- df.temp %>%
  distinct()

@rgayler: Thanks, confirmed. Would you like to contribute a testthat test?

@ezriah @adeldaoud: This is a breaking change of 0.5.0 but intended behavior.

@krlmlr Could the behaviour with grouped dfs mentioned by @ezriah and @adeldaoud be documented somewhere? It did not become clear to me from ?distinct or from the release notes for 0.5.0.

Also, I don't like it :-). Now I'll always have to do iris %>% group_by(Species) %>% blabla %>% ungroup() %>% distinct() %>% group_by(Species) %>% blabla, if I don't want to enumerate all variables?
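
One way to avoid repeating that ungroup/regroup dance by hand might be a small wrapper. The sketch below is only an assumption about how such a helper could look: `distinct_keep_groups` is a hypothetical name, not a dplyr function, and it relies on `!!!` splicing, which needs a dplyr release with tidy evaluation (0.7.0 or later), so it would not run on 0.5.0 itself.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): de-duplicate complete rows of a
# grouped data frame and restore the original grouping afterwards.
distinct_keep_groups <- function(df) {
  grps <- groups(df)        # grouping variables as a list of symbols
  df %>%
    ungroup() %>%
    distinct() %>%          # compares all columns once the data is ungrouped
    group_by(!!!grps)       # re-apply the saved grouping
}

res <- iris %>%
  group_by(Species) %>%
  distinct_keep_groups()

colnames(res)  # every column survives because distinct() ran ungrouped
```

Saving the grouping before `ungroup()` means the variables never have to be enumerated at the call site.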

I'm not too sure anymore that the change for grouped data frames is intended.

@hadley: What should the following code return -- all columns or only Species?

iris %>% group_by(Species) %>% distinct()

I'd argue that it should be the same as

iris %>% distinct() %>% group_by(Species)

which returns all columns.
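
To see which behaviour an installed dplyr actually has, the two orderings can be compared directly. This is just a quick check, with `grouped_first` and `distinct_first` as illustrative names; per this thread the two results differ on 0.5.0, and other releases may behave differently.

```r
library(dplyr)

grouped_first  <- iris %>% group_by(Species) %>% distinct()
distinct_first <- iris %>% distinct() %>% group_by(Species)

# Column names of each result; whether they match depends on the dplyr version.
colnames(grouped_first)
colnames(distinct_first)

# TRUE only if both orderings keep the same columns.
identical(colnames(grouped_first), colnames(distinct_first))
```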
