Dplyr: select_ does not allow for spaces in column name, filter_ does not seem to work

Created on 8 Sep 2015 · 5 Comments · Source: tidyverse/dplyr

These are two separate issues, but I suspect the same underlying code affects both. First, with select_, I cannot use spaces in the name:

> mtcars_tbl <- tbl_df(mtcars)
> mtcars_tbl <- rename(mtcars_tbl, `miles per gallon` = mpg)
> select_(mtcars_tbl, "miles per gallon")

Error in parse(text = x) : <text>:1:7: unexpected symbol
1: miles per

The same thing happens if I store the name in a variable, as "intended":

> tmp <- "miles per gallon"
> select_(mtcars_tbl, tmp)

Error in parse(text = x) : <text>:1:7: unexpected symbol
1: miles per

So whether I pass a variable holding the string (which I presume is the intended use) or just want a cleaner way to deal with spaces, select_ fails either way. filter_ (and presumably the other underscore verbs) fails too:

> filter_(mtcars_tbl, "miles per gallon")
Error in parse(text = x) : <text>:1:7: unexpected symbol
1: miles per
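The error comes from parse(text = x), which suggests the underscore verbs treat a character string as R source code. This base-R sketch reproduces just that parsing step, without dplyr, to show why spaces break it; it is an illustration inferred from the error message, not dplyr's actual internals, and the names res and quoted are only illustrative.

```r
# Parse the column name as R source code, as the error message suggests
# happens: "miles per gallon" is read as three separate symbols with
# nothing joining them, so parsing fails with "unexpected symbol".
res <- tryCatch(parse(text = "miles per gallon"), error = function(e) e)
print(inherits(res, "error"))   # TRUE
print(conditionMessage(res))

# With backticks inside the string, the whole name parses as ONE symbol.
quoted <- parse(text = "`miles per gallon`")[[1]]
print(is.name(quoted))          # TRUE
print(as.character(quoted))     # "miles per gallon"
```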

OK, now on to problem 2:

filter_ does not seem to work at all, even when whitespace is not an issue. For example:

> filter_(mtcars_tbl, "cyl" > 4) %>% arrange(cyl) %>% head(2)
  miles per gallon cyl  disp hp drat   wt  qsec vs am gear carb
1             22.8   4 108.0 93 3.85 2.32 18.61  1  1    4    1
2             24.4   4 146.7 62 3.69 3.19 20.00  1  0    4    2

versus

> filter(mtcars_tbl, cyl > 4) %>% arrange(cyl) %>% head(2)
  miles per gallon cyl disp  hp drat    wt  qsec vs am gear carb
1               21   6  160 110  3.9 2.620 16.46  0  1    4    4
2               21   6  160 110  3.9 2.875 17.02  0  1    4    4

I played around with it some more, looking at dim(), etc. It's pretty clear that filter_ is not doing anything.


About problem 2, this is user error. You can use either "cyl > 4" or ~cyl > 4, but what happens here is that "cyl" > 4 is evaluated before filter_ ever sees it:

> "cyl" > 4
[1] TRUE

so filter_ receives a single TRUE, which keeps every row, and you get all the data back.
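The difference between "cyl" > 4 and the quoted forms is when the comparison runs. This base-R sketch (no dplyr involved) mimics the idea of evaluating a captured expression against the data's columns; it illustrates the mechanism, not filter_'s implementation, and the names mt, keep_wrong, keep_string and keep_formula are only illustrative.

```r
mt <- datasets::mtcars

# Unquoted: the comparison runs right away on two constants. 4 is coerced
# to "4", and the string "cyl" sorts after it, so the result is one TRUE.
keep_wrong <- "cyl" > 4
print(keep_wrong)                  # TRUE
print(nrow(mt[keep_wrong, ]))      # 32: the lone TRUE is recycled over all rows

# Quoted as a string: the text is parsed and evaluated with the data
# frame's columns in scope, so cyl now refers to the column.
keep_string <- eval(parse(text = "cyl > 4")[[1]], mt)
print(sum(keep_string))            # 21 rows have 6 or 8 cylinders

# A one-sided formula also captures the unevaluated expression;
# its second element is the right-hand side, cyl > 4.
f <- ~cyl > 4
keep_formula <- eval(f[[2]], mt)
print(identical(keep_string, keep_formula))  # TRUE
```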

For problem one, you want:

tmp <- "miles per gallon"
select_(mtcars_tbl, as.name(tmp))
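A base-R sketch of what as.name() contributes: it turns the whole string into a single symbol without parsing it, so the spaces survive as part of the name. The small data frame df is a hypothetical stand-in for mtcars_tbl, built with check.names = FALSE so the column keeps its spaces; dplyr is not used here.

```r
tmp <- "miles per gallon"

# as.name() wraps the string as one symbol; nothing is parsed.
sym <- as.name(tmp)
print(is.name(sym))          # TRUE
print(as.character(sym))     # "miles per gallon"

# Hypothetical stand-in data; check.names = FALSE keeps the spaces.
df <- data.frame(`miles per gallon` = c(21, 22.8), cyl = c(6, 4),
                 check.names = FALSE)

# Evaluating the symbol against the data frame looks the column up by name.
print(eval(sym, df))         # the values 21 and 22.8
```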

Romain,

Thank you for catching my error! I guess when I saw that one thing didn't work, I started suspecting the package more than my code.

Hadley,

Thank you for your solution. This will work in the short term, but is it a long-term solution? What is the harm in automatically wrapping everything with "as.name" in the "*_" flavors of the package? Would that break something somewhere else?

As I mentioned, there are two reasons to use the "*_" flavors:

  • to avoid having to deal with backticks or other syntax that I'd rather avoid. Admittedly, this is the less urgent reason.
  • to be able to use the names as variables, the main reason. If "as.name" works for some variable name constructs and not for others, then this solution is not viable. If it works for all of them, can't it just be built into the *_ flavors of the functions and methods within dplyr itself?

It's a fundamental choice: should "my weird variable" work, or should "starts_with('abc')" work? I decided on the latter, and it's too late to change now. as.name() says that what you have is the name of a variable, which seems pretty reasonable to me.
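A base-R sketch of the trade-off described above: when strings are parsed as code, a helper call such as starts_with('abc') keeps its structure, whereas wrapping every string with as.name() would turn that same text into one oddly named symbol. starts_with is only parsed here, never called, so dplyr is not needed; helper and as_sym are illustrative names.

```r
# Parsing keeps the helper as a call: the function name plus its argument.
helper <- parse(text = "starts_with('abc')")[[1]]
print(is.call(helper))               # TRUE
print(as.character(helper[[1]]))     # "starts_with"

# as.name() on the same text yields a single symbol whose name is the
# whole string, so the helper call could no longer be recognised.
as_sym <- as.name("starts_with('abc')")
print(is.name(as_sym))               # TRUE
print(is.call(as_sym))               # FALSE
```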

Hi Hadley,

OK, understood! So there are unfortunately inevitable conflicts in how names are interpreted. I understand that, as it has been the bane of R for many years.

I guess I was just hoping that, with all the marvelous things dplyr, tidyr, ggplot2, etc. have managed despite relying on a language that made some poor choices under the hood, you might have found some cool, magical way to resolve this issue too.

Thanks for your response, and all your amazing packages!
