Dplyr: grouped_df internal structure: okay to use?

Created on 15 Feb 2018 · 9 Comments · Source: tidyverse/dplyr

I'm writing some code that will operate on groups in a data frame, but I don't want to use the standard group_by() %>% foo() paradigm for reasons. Instead, I'd like to leverage how group_by identifies the indices for each group, and use those indices within my code.

Example:

library(dplyr)

g <- group_by(mtcars, gear)

attr(g, "indices")  # 0-based row positions in mtcars for each value of gear
#[[1]]
# [1]  3  4  5  6 11 12 13 14 15 16 20 21 22 23 24
#
#[[2]]
# [1]  0  1  2  7  8  9 10 17 18 19 25 31
#
#[[3]]
# [1] 26 27 28 29 30

Can I assume that the indices attribute won't change in a later version of group_by, at least for the foreseeable future?


All 9 comments

Perhaps you can use group_indices(g), group_indices(mtcars, gear), split(1:nrow(mtcars), group_indices(mtcars, gear)), or some variant here if possible?
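
As a rough illustration of the split-based suggestion, here is a minimal base R sketch. It splits on `mtcars$gear` directly rather than on the integer codes from group_indices(), an assumption made only to keep the example free of package dependencies. The row positions it returns are 1-based, whereas the "indices" attribute shown in the question is 0-based.

```r
# Row positions of mtcars for each distinct value of gear (1-based).
rows_by_gear <- split(seq_len(nrow(mtcars)), mtcars$gear)

# One integer vector per group, named after the group value.
names(rows_by_gear)
lengths(rows_by_gear)

# Use the positions directly, without the group_by() %>% foo() pattern.
mean_mpg <- vapply(rows_by_gear, function(i) mean(mtcars$mpg[i]), numeric(1))
mean_mpg
```

The names `rows_by_gear` and `mean_mpg` are illustrative only. This approach gives the same grouping as the attribute, but it does not address the speed and memory cost of split() with many small groups.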

Thanks, @Ax3man, using the API is indeed preferred. @Hong-Revo: Do you feel the current API satisfies your needs?

I can get the same result via base R's split, but it's really slow and memory-hungry with lots of tiny groups. Dplyr's group_by is super efficient by comparison.

This is only an internal project for now, so I think I'll be okay with relying on the "indices" attribute that group_by() sets, even though it's undocumented. Perhaps consider exposing the indices in a future update?

I now see that group_indices() is insufficient. We should perhaps implement an accessor that returns both the "labels" and the "indices" attributes as a nested tibble.
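
For readers on newer releases: dplyr 0.8.0 and later export accessors along these lines. The sketch below assumes such a version is installed and postdates this discussion, so it is not what was available when the question was asked. group_data() returns one row per group, with the grouping columns plus a `.rows` list-column of 1-based row positions.

```r
library(dplyr)  # assumes dplyr >= 0.8.0

g <- group_by(mtcars, gear)

# One row per group: the grouping column(s) plus a `.rows` list-column.
gd <- group_data(g)
gd

# Just the row positions (1-based), one integer vector per group.
rows <- group_rows(g)
lengths(rows)

# Just the group labels.
group_keys(g)
```

Note that these positions are 1-based, unlike the 0-based "indices" attribute shown in the question.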

@Ax3man: Looks like a similar problem, perhaps tidyr will be a user of this feature here.

From a purely selfish perspective, I'd prefer to see this as a feature of dplyr rather than tidyr, to avoid increasing the number of dependencies of my package. dplyr is pretty heavyweight already; I'd rather not add to it.

It also seems that it would be fairly easy to implement as part of dplyr. All the hard work has already been done; it's just a matter of documenting it and exporting an accessor function.

I would not count on these attributes. The structure might change; see e.g. #3489.

This is more a question than an issue, so I'll close it. Perhaps you can open a discussion on community.rstudio.com.

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/
