A common case is constructing a grouping variable inside group_by that is needed only for the duration of the grouping, so afterwards one must use select to get rid of it, as in the example below. It would be pleasingly symmetric if ungroup could remove the added column just as group_by adds it, so that
```
ungroup(-g)
```
would be the same as
```
ungroup %>%
  select(-g)
```
Thus in this example taken from https://stackoverflow.com/questions/51939874/referencing-previous-column-value-as-column-is-created/51940343#51940343
```
library(dplyr)

test <- structure(list(
  i = c(0, 1, 2, 3, 4, 0, 1, 2, 3, 4),
  chng = c(0, 0.031, 0.005, -0.005, 0.017, 0, 0.012, 0.003, -0.013, -0.005),
  indx = c(1, 1.031, 1.037, 1.031, 1.048, 1, 1.012, 1.015, 1.002, 0.997)
), class = "data.frame", row.names = c(NA, -10L))

test %>%
  group_by(g = cumsum(i == 0)) %>%
  mutate(indx = cumprod(chng + 1)) %>%
  ungroup %>%
  select(-g)
```
we could write it with one fewer statement, combining the last two lines of the code above into the last line below.
```
test %>%
  group_by(g = cumsum(i == 0)) %>%
  mutate(indx = cumprod(chng + 1)) %>%
  ungroup(-g)
```
Note the reduced line count and improved symmetry.
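Until something like this exists, the same effect can be had with a small wrapper. This is only a sketch: `ungroup_drop()` is a hypothetical name, not a dplyr function, and it assumes a reasonably recent dplyr with tidy evaluation (`enquos()` and `!!!`). It ungroups and then drops the named columns in one step.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): ungroup, then drop the given columns.
ungroup_drop <- function(.data, ...) {
  drop <- rlang::enquos(...)      # capture the column names unevaluated
  .data %>%
    ungroup() %>%
    select(-c(!!!drop))           # negative selection removes them
}

# Small illustrative data in the same shape as the example above
test <- data.frame(i = c(0, 1, 2, 0, 1), chng = c(0, 0.031, 0.005, 0, 0.012))

result <- test %>%
  group_by(g = cumsum(i == 0)) %>%
  mutate(indx = cumprod(chng + 1)) %>%
  ungroup_drop(g)

names(result)   # the temporary column g is gone
```

A wrapper like this keeps the pipeline short without changing what ungroup itself means, which may be a reasonable stopgap while the semantics are being discussed.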
ungroup does have a ... argument that it does not use:
```
> dplyr:::ungroup.grouped_df
function(x, ...) {
  ungroup_grouped_df(x)
}
<bytecode: 0x1026547e8>
<environment: namespace:dplyr>
```
but I'm not sure about having ungroup also perform selection
Seems to me that incorporating this kind of logic into https://github.com/tidyverse/dplyr/issues/3721 would be the better solution for this use case.
I do think it would be neat if ungroup could selectively remove some groupings but not others, e.g.
```
mtcars %>% group_by(gear, carb, cyl) %>% ungroup(cyl)
```
would be equivalent to
```
mtcars %>% group_by(gear, carb, cyl) %>% group_by(gear, carb)
```
which is how I first interpreted the title of this issue.
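For what it's worth, the selective version can be approximated today by re-grouping on whatever grouping variables remain. A minimal sketch, assuming bare column names are passed; `ungroup_some()` is a made-up name, not something dplyr provides:

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): remove some grouping variables
# and keep the data grouped by the rest.
ungroup_some <- function(.data, ...) {
  # turn the bare names in ... into a character vector
  drop <- vapply(rlang::ensyms(...), rlang::as_string, character(1))
  keep <- setdiff(group_vars(.data), drop)
  if (length(keep) == 0) {
    return(ungroup(.data))        # nothing left to group by
  }
  group_by(.data, !!!rlang::syms(keep))   # replaces the existing grouping
}

remaining <- mtcars %>%
  group_by(gear, carb, cyl) %>%
  ungroup_some(cyl) %>%
  group_vars()

remaining   # the grouping variables left after removing cyl
```

Note that this only changes the grouping; unlike the column-dropping idea at the top of the thread, cyl stays in the data.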
Here is another example taken from https://stackoverflow.com/questions/52906985/merging-of-duplicate-rows-that-have-misspelled-variables/52907932#52907932
```
library(phonics)
library(dplyr)

# create test data
Lines <- "CAR MPG
Mazda 5
Mazzda 2
Mzda 1"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE, strip.white = TRUE)

# process
DF %>%
  group_by(key = soundex(CAR)) %>%
  summarize(CAR = toString(CAR), MPG = sum(MPG)) %>%
  ungroup %>%
  select(-key)
```
With the feature under discussion this would simplify to the shorter and more symmetric:
```
DF %>%
  group_by(key = soundex(CAR)) %>%
  summarize(CAR = toString(CAR), MPG = sum(MPG)) %>%
  ungroup(-key)
```
@mkoohafkan, the way group_by currently works is that if you want to incrementally add a variable, you specify group_by(new_var, add = TRUE).
I suppose there is the question of whether add=TRUE means add the variable to the group_by or really means modify the group_by and replace it with a new group_by. In this latter case it would make sense to write group_by(-cyl, add = TRUE) to remove cyl from the group_by while leaving the other group_by variables in effect rather than using ungroup for that.
Another possibility is to use ungroup(cyl, subtract = TRUE) for that analogously to group_by(new_var, add = TRUE).
One other point is that I don't think incrementally adding and removing parts of a group_by is encountered that frequently, whereas I have repeatedly encountered the ungroup %>% select(-var) sequence.
@ggrothendieck I thought about this more, and I agree with your points that ungroup(cyl) to drop the column cyl is symmetric and that group_by(-cyl) to remove a column from an existing grouping would be a bit confusing alongside the existing add argument. If the add argument to group_by had originally been named update, this would be syntactically cleaner, e.g. group_by(cyl, update = TRUE) and group_by(-cyl, update = TRUE).
ungroup(..., subtract = TRUE) looks like a good idea at first, but what would ungroup(cyl, subtract = FALSE) mean?
group_by() has mutate semantics, not select semantics (cf. https://dplyr.tidyverse.org/articles/dplyr.html#selecting-operations). I guess you already noticed this when you tried group_by(-cyl, add = TRUE) and saw that -cyl became the grouping variable.
```
dplyr::group_by(mtcars, -cyl)
#> # A tibble: 32 x 12
#> # Groups:   -cyl [3]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb `-cyl`
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4     -6
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4     -6
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1     -4
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1     -6
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2     -8
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1     -6
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4     -8
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2     -4
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2     -4
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4     -6
#> # ... with 22 more rows
```
Created on 2018-10-31 by the reprex package (v0.2.1)
So, to me, ungroup() should have mutate semantics as well, for consistency (though I don't know what it would mean to mutate when ungrouping...). Would a possible solution be to implement scoped variants of ungroup() (e.g. ungroup_at())?
Here is another case where this feature could be used, taken from https://stackoverflow.com/questions/53240324/dplyr-collapse-tail-rows-into-larger-groups/53240699#53240699
In this case we are manufacturing a sort key in order to keep the table in its original sorted order.
With the feature under discussion, the select at the end of the code could be folded into the ungroup and so omitted.
Note how this keeps coming up again and again.
```
df <- tibble(a = as.factor(1:20), b = c(50, 20, 13, rep(2, 10), rep(1, 7)))
df %>%
  group_by(sortkey = -b, a = paste0(if_else(b %in% 1:2, "grp", ""), b)) %>%
  summarize(b = sum(b)) %>%
  ungroup %>%
  select(-sortkey)
```
Having a selective ungroup is also very important when calculating percentages of subgroups.
```
library(dplyr)
library(tidyr)  # for spread()

mtcars %>%
  group_by(gear, carb, vs) %>%
  summarise(count = n()) %>%
  group_by(gear, carb) %>%  # << would be better to do ungroup(vs)
  mutate(perc = count / sum(count)) %>%
  ungroup() %>%
  spread(vs, perc, sep = '=')
#>    gear  carb count `vs=0` `vs=1`
#>   <dbl> <dbl> <int>  <dbl>  <dbl>
#> 1     3     1     3   NA      1
#> 2     3     2     4    1     NA
#> 3     3     3     3    1     NA
#> 4     3     4     5    1     NA
#> 5     4     1     4   NA      1
#> 6     4     2     4   NA      1
#> 7     4     4     2    0.5    0.5
#> 8     5     2     1    0.5    0.5
#> 9     5     4     1    1     NA
```
I think it would be fine for ungroup() to have select semantics even while group_by() has action semantics. I'd suggest df %>% ungroup() would continue to work as usual, and df %>% ungroup(x) would remove x from the grouping variables, throwing an error if not currently grouped by x.
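To make the suggested behaviour concrete, here is a rough sketch of those semantics written as a standalone function. It is an illustration under assumptions, not dplyr's implementation: `ungroup_select()` is a hypothetical name, and it relies on `tidyselect::eval_select()` (tidyselect 1.0.0 or later) so that the arguments are interpreted with select semantics.

```r
library(dplyr)

# Hypothetical stand-in for the suggested ungroup(x) behaviour.
ungroup_select <- function(.data, ...) {
  if (missing(...)) {
    return(ungroup(.data))                 # no arguments: behave as usual
  }
  grouped <- group_vars(.data)
  # select semantics: bare names and helpers such as starts_with() both work
  drop <- names(tidyselect::eval_select(rlang::expr(c(...)), .data))
  not_grouped <- setdiff(drop, grouped)
  if (length(not_grouped) > 0) {
    stop("Not currently grouped by: ", paste(not_grouped, collapse = ", "))
  }
  keep <- setdiff(grouped, drop)
  if (length(keep) == 0) {
    return(ungroup(.data))
  }
  group_by(.data, !!!rlang::syms(keep))
}

# The percentage example above, with the regrouping step written as an ungroup
by_vs <- mtcars %>%
  group_by(gear, carb, vs) %>%
  ungroup_select(vs)

group_vars(by_vs)   # vs is no longer a grouping variable
```

The error branch is the part most open to debate; silently ignoring variables that are not grouping variables would be the other obvious choice.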