Dplyr: selecting vars with `starts_with`, `ends_with`, `contains` and `matches` returns the wrong result when the given pattern matches no columns

Created on 15 Jul 2014 · 3 Comments · Source: tidyverse/dplyr

Here is the behaviour:

> d <- tbl_df(data.frame(xxx = 1:2, yyy = 1:2, bxx = 1:2, bbb = 1:2))
> d %>% select(starts_with('nonsense'))
Source: local data frame [2 x 4]

  xxx yyy bxx bbb
1   1   1   1   1
2   2   2   2   2
> d %>% select(ends_with('nonsense'))
Source: local data frame [2 x 4]

  xxx yyy bxx bbb
1   1   1   1   1
2   2   2   2   2
> d %>% select(matches('nonsense'))
Source: local data frame [2 x 4]

  xxx yyy bxx bbb
1   1   1   1   1
2   2   2   2   2
> d %>% select(contains('nonsense'))
Source: local data frame [2 x 4]

  xxx yyy bxx bbb
1   1   1   1   1
2   2   2   2   2

Clearly, `select()` should not return every column of the data frame when the pattern matches nothing. It should either throw an error with a helpful message or return an empty data frame. I am not sure which would be preferable.

From what I can see, the problem is in the list of functions called `select_funs` in the `select_vars_q` function. One would have to catch the error in there and decide what to return accordingly. I would be happy to submit a pull request, but I don't really want to do the work without hearing what you think the most appropriate return value would be :)
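
For comparison, here is a minimal base-R sketch of the "return an empty data frame" option. `select_prefix()` is a hypothetical helper written only for illustration; it is not part of dplyr and says nothing about how `select_vars_q` works internally.

```r
# Hypothetical helper (not dplyr): keep columns whose names start with `prefix`.
select_prefix <- function(df, prefix) {
  keep <- startsWith(names(df), prefix)  # all FALSE when nothing matches
  df[, keep, drop = FALSE]               # zero columns, rows preserved
}

d <- data.frame(xxx = 1:2, yyy = 1:2, bxx = 1:2, bbb = 1:2)

matched <- names(select_prefix(d, "x"))       # "xxx"
empty   <- dim(select_prefix(d, "nonsense"))  # 2 rows, 0 columns
```

With this approach a pattern that matches nothing yields a data frame with no columns rather than all of them.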

bug

All 3 comments

I think it should throw an error, something like "Failed to select any columns". It also needs to handle the case like select(mtcars, -(mpg:carb)).
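
A rough sketch of what such a guard could look like, in base R only. `select_or_stop()` and its `include`/`exclude` arguments are assumptions made for illustration, not dplyr's implementation; the point is that the check runs after exclusions are applied, so the all-columns-dropped case is covered as well.

```r
# Hypothetical guard (not dplyr): resolve included and excluded names,
# then fail loudly if no columns remain.
select_or_stop <- function(df, include = names(df), exclude = character()) {
  vars <- setdiff(intersect(names(df), include), exclude)
  if (length(vars) == 0) {
    stop("Failed to select any columns", call. = FALSE)
  }
  df[, vars, drop = FALSE]
}

# Excluding every column, analogous to select(mtcars, -(mpg:carb)), also errors.
res <- tryCatch(
  select_or_stop(mtcars, exclude = names(mtcars)),
  error = function(e) conditionMessage(e)
)
```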

Cool. I'll send a pull request in a week or so. On holiday atm.

I'm not sure if this is a related issue, but the following seems inconsistent to me:

> data_frame(a=1, ba=1) %>%  select(starts_with("a"), ends_with("b")) %>% names
character(0)
> 
> data_frame(a=1, ab=1) %>%  select(starts_with("a"), ends_with("b")) %>% names
[1] "a"  "ab"