Dplyr: Filter on multiple columns without explicitly naming them

Created on 2 Feb 2015 · 6 Comments · Source: tidyverse/dplyr

I wonder whether an _each variant could be added to filter(), so that one can filter on multiple columns without explicitly naming them (this comes in handy for initial validation of data frames):

df <- data.frame(replicate(5, sample(1:10, 10, replace = TRUE)))

Instead of using:

df %>% filter(X1 >= 2, X2 >= 2, X3 >= 2, X4 >= 2, X5 >= 2)

or

df %>% filter(!rowSums(. < 2))

We could use something like: filter_each(funs(. >= 2))

This would be even more convenient when we want to apply a filter to all columns but one, for example with a hypothetical: filter_each(funs(. >= 2), -X5)

Right now the best alternative we've found on SO is:

df %>% slice(which(!rowSums(select(., -matches('X5')) < 2L)))

or

df %>% filter(!rowSums(.[, !colnames(.) %in% 'X5', drop = FALSE] < 2))

http://stackoverflow.com/questions/28183653/filter-each-column-of-a-data-frame-based-on-a-specific-value
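
For readers unfamiliar with the rowSums() idiom used above, here is a minimal base R sketch of why it works. The comparison produces a logical matrix, rowSums() counts the failing columns in each row, and `!` turns a count of zero into TRUE. The small data frame and the names `below`, `n_below`, `keep` and `keep_but_x3` are illustrative assumptions, not part of the original question.

```r
# Small fixed data frame so the result is reproducible (illustrative values).
df <- data.frame(X1 = c(1, 5, 3), X2 = c(4, 2, 6), X3 = c(7, 8, 1))

below   <- df < 2           # logical matrix: TRUE where a value is below 2
n_below <- rowSums(below)   # number of failing columns in each row
keep    <- !n_below         # zero failures becomes TRUE, anything else FALSE

df[keep, , drop = FALSE]    # rows in which every column is >= 2

# Excluding one column (here X3) from the check, in the spirit of the
# "all columns but X5" variants above:
keep_but_x3 <- !rowSums(df[, setdiff(names(df), "X3"), drop = FALSE] < 2)
df[keep_but_x3, , drop = FALSE]
```

The idiom is compact, but it relies on the implicit coercion of a count to a logical value, which is part of why a dedicated verb was requested.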

All 6 comments

Can you please provide a realistic example of when you'd use this?

Let's say I want to filter out the value with the largest difference between it and the column mean, for all columns but X5:

df %>% filter(!X1 == outlier(X1), !X2 == outlier(X2), !X3 == outlier(X3), !X4 == outlier(X4))

I would do something like:

df %>% filter_each(funs(!. == outlier(.)), -X5)

This seems sufficiently esoteric that I don't think it needs to be built into dplyr.

Actually, a filter_each() function that handles the task above would be very helpful.
I deal with huge annotation files (matrix or data frame) with several columns, and I need to filter the data frame with "AND" operations on multiple columns.

I would appreciate it if you would reconsider implementing this. It would make life easier.

It would be handy if there was a shorthand in dplyr for filtering several columns with the same criteria.

I have a data frame with about 26,000 rows of employee data. Here is an example of a filter that I often perform on the data frame:

  • Filter to rows where employees are at level 3A (RESPLEVEL), belong to one of the specified institutions, and have oversight of any of the specified cost centres (ccentres).
  • Some employees can have up to 6 cost centres assigned to them and these are stored in the fields CCENTRE1, CCENTRE2, CCENTRE3, CCENTRE4, CCENTRE5, CCENTRE6.
  • dat is a wide data frame with many other columns.

universities <- c(1:100, 110:120)
ccentres <- c("133", "133a", "133b", "133c", "133d", "130", "135")
datfiltered <- dat %>%
  filter(
    RESPLEVEL == "3A",
    INSTITUTIONID %in% universities,
    CCENTRE1 %in% ccentres | CCENTRE2 %in% ccentres | CCENTRE3 %in% ccentres |
      CCENTRE4 %in% ccentres | CCENTRE5 %in% ccentres | CCENTRE6 %in% ccentres
  )

There is filter_at(), filter_if() and filter_all() in the dev version.
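
As a rough sketch of how these scoped variants map onto the requests in this thread, assuming a dplyr release that ships them (later dplyr releases mark them as superseded in favour of if_all() and if_any()): all_vars() expresses the "AND" case and any_vars() the "OR" case. The toy data frames and the result names `all_ok`, `all_but_x5` and `hits` are illustrative assumptions.

```r
library(dplyr)

# Toy data standing in for the random df from the original post.
df <- data.frame(X1 = c(1, 5, 3), X2 = c(4, 2, 6), X5 = c(9, 1, 9))

# Every column must be >= 2 (the filter_each(funs(. >= 2)) request).
all_ok <- df %>% filter_all(all_vars(. >= 2))

# Every column except X5 must be >= 2.
all_but_x5 <- df %>% filter_at(vars(-X5), all_vars(. >= 2))

# "OR" across several cost centre columns, as in the employee example.
dat <- data.frame(
  RESPLEVEL = c("3A", "3A", "2B"),
  CCENTRE1  = c("133", "200", "133"),
  CCENTRE2  = c("201", "202", "130"),
  stringsAsFactors = FALSE
)
ccentres <- c("133", "130")

hits <- dat %>%
  filter(RESPLEVEL == "3A") %>%
  filter_at(vars(starts_with("CCENTRE")), any_vars(. %in% ccentres))
```

filter_if() follows the same pattern but selects columns with a predicate, for example filter_if(is.numeric, all_vars(. >= 2)).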
