Dplyr: Filter on multiple columns without explicitly naming them

Created on 2 Feb 2015 · 6 Comments · Source: tidyverse/dplyr

I wonder whether an _each variant could be added to filter(), so that we can filter on multiple columns without explicitly naming them (this comes in handy for initial validation of data frames):

df <- data.frame(replicate(5, sample(1:10, 10, replace = TRUE)))

Instead of using either of:

df %>% filter(X1 >= 2, X2 >= 2, X3 >= 2, X4 >= 2, X5 >= 2)

df %>% filter(!rowSums(. < 2))

We could use something like: filter_each(funs(. >= 2))

This would be even more convenient when we want to apply a filter to all columns but one, as in this hypothetical call: filter_each(funs(. >= 2), -X5)

Right now the best alternatives we've found on Stack Overflow are:

df %>% slice(which(!rowSums(select(., -matches('X5')) < 2L)))

df %>% filter(!rowSums(.[, !colnames(.) %in% 'X5', drop = FALSE] < 2))

http://stackoverflow.com/questions/28183653/filter-each-column-of-a-data-frame-based-on-a-specific-value

voxnonecho

Most helpful comment

There are filter_at(), filter_if() and filter_all() in the dev version.

lionel- on 11 Apr 2017
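
For readers who want to see how the scoped variants mentioned in this comment map onto the original request, here is a minimal sketch. It assumes a dplyr version that ships filter_all(), filter_at() and filter_if() together with the all_vars() and vars() helpers; the seed and the result names are arbitrary choices for illustration. Later dplyr releases mark these scoped verbs as superseded in favour of if_all() / if_any() inside filter(), so treat this as one possible spelling rather than the recommended one.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the illustration is reproducible
df <- data.frame(replicate(5, sample(1:10, 10, replace = TRUE)))

# Keep rows where every column is >= 2
# (the hypothetical filter_each(funs(. >= 2)) from the issue)
all_cols <- df %>% filter_all(all_vars(. >= 2))

# Same predicate on every column except X5
# (the hypothetical filter_each(funs(. >= 2), -X5))
all_but_x5 <- df %>% filter_at(vars(-X5), all_vars(. >= 2))

# Apply the predicate only to columns that satisfy a condition
numeric_only <- df %>% filter_if(is.numeric, all_vars(. >= 2))
```

Swapping all_vars() for any_vars() turns the implicit "AND" across columns into an "OR".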

All 6 comments

Can you please provide a realistic example of when you'd use this?

hadley on 4 Feb 2015

Let's say I want to filter out, in every column but X5, the value that differs most from that column's mean:

df %>% filter(!X1 == outlier(X1), !X2 == outlier(X2), !X3 == outlier(X3), !X4 == outlier(X4))

I would do something like:

df %>% filter_each(funs(!. == outlier(.)), -X5)

voxnonecho on 4 Feb 2015
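
The request above can be approximated with the scoped verbs that later landed in dplyr. The sketch below is only an illustration: farthest_from_mean() is a hypothetical stand-in for the outlier() call in the comment (whose origin the thread does not state), and filter_at() with all_vars() is assumed to be available in the installed dplyr.

```r
library(dplyr)

set.seed(1)  # arbitrary seed for a reproducible illustration
df <- data.frame(replicate(5, sample(1:10, 10, replace = TRUE)))

# Hypothetical helper: the value of x that lies farthest from mean(x)
farthest_from_mean <- function(x) x[which.max(abs(x - mean(x)))]

# Drop rows holding that value in any column except X5
no_outliers <- df %>%
  filter_at(vars(-X5), all_vars(. != farthest_from_mean(.)))
```

Each `.` inside all_vars() is replaced by the column being tested, so the helper sees the whole column, not a single row.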

This seems sufficiently esoteric that I don't think it needs to be built into dplyr.

hadley on 19 May 2015

Actually, a "filter_each()" function that does the above would be very helpful.
I deal with huge annotation files (Matrix or df) with several columns, and I need to filter the df with "AND" operations on multiple columns.

I would appreciate it if you could reconsider implementing this. It would make life easier.

sentisci on 5 Oct 2016

It would be handy if there was a shorthand in dplyr for filtering several columns with the same criteria.

I have a data frame with about 26,000 rows of employee data. Here is an example of a filter that I often perform on the data frame:

Filter to rows where employees are at level 3A (RESPLEVEL), in the specified institutions who have oversight of any of the specified cost centres (ccentre).
Some employees can have up to 6 cost centres assigned to them and these are stored in the fields CCENTRE1, CCENTRE2, CCENTRE3, CCENTRE4, CCENTRE5, CCENTRE6.
dat is a wide data frame with many other columns.

universities <- c(1:100, 110:120)
ccentres <- c("133", "133a", "133b", "133c", "133d", "130", "135")
datfiltered <- dat %>%
  filter(
    RESPLEVEL == "3A",
    INSTITUTIONID %in% universities,
    CCENTRE1 %in% ccentres | CCENTRE2 %in% ccentres | CCENTRE3 %in% ccentres |
      CCENTRE4 %in% ccentres | CCENTRE5 %in% ccentres | CCENTRE6 %in% ccentres
  )
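
As a rough sketch of the shorthand being asked for, the repeated "OR" over the CCENTRE columns can be written with filter_at() and any_vars(), assuming a dplyr version that provides them. The miniature dat below is hypothetical and carries only two of the six CCENTRE columns; the real data frame is not shown in the thread.

```r
library(dplyr)

# Hypothetical miniature stand-in for the wide `dat` described above
dat <- data.frame(
  RESPLEVEL     = c("3A", "3A", "2B", "3A"),
  INSTITUTIONID = c(5, 105, 7, 115),
  CCENTRE1      = c("133", "133a", "130", "200"),
  CCENTRE2      = c(NA, "135", "133", "133b"),
  stringsAsFactors = FALSE
)
universities <- c(1:100, 110:120)
ccentres <- c("133", "133a", "133b", "133c", "133d", "130", "135")

datfiltered <- dat %>%
  # Conditions on individually named columns stay in a plain filter()
  filter(RESPLEVEL == "3A", INSTITUTIONID %in% universities) %>%
  # Keep rows where at least one CCENTRE* column matches
  filter_at(vars(starts_with("CCENTRE")), any_vars(. %in% ccentres))
```

Selecting with starts_with("CCENTRE") means the call does not change if more cost-centre columns are added; vars(CCENTRE1:CCENTRE6) would pin the selection to the six named fields.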