dplyr: Bad error for nonexistent column that happens to be named 'id'

Created on 16 Apr 2020 · 9 comments · Source: tidyverse/dplyr

Discovered in actual usage where I had recently had (but renamed) a column named id.

I think the missing column id should generate the same error as foo, not delegate to a vctrs error.

library(dplyr)
packageVersion("dplyr")
#> [1] '0.8.99.9002'

dat <- tibble(x = 1)

select(dat, foo)
#> Error: Can't subset columns that don't exist.
#> x Column `foo` doesn't exist.

select(dat, id)
#> Error: `id()` was deprecated in dplyr 0.5.0 and is now defunct.
#> Please use `vctrs::vec_group_id()` instead.

Created on 2020-04-16 by the reprex package (v0.3.0.9001)


jennybc


All 9 comments

I don't think we can do much about this. @lionel- might have some ideas.

hadley on 16 Apr 2020

This is by design: we thought it was more important to have a simple syntax for predicate selection than complete unambiguity regarding data frame columns. Maybe there's still time to reconsider before dplyr 1.0 if it turns out to be confusing. How likely is it that column names clash with existing functions?

For programming we have unambiguous alternatives: all_of("foo") or force(id).
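To make the lookup order concrete, here is a minimal base-R sketch, assuming the simplified rule described above: a bare name is matched against the data frame's columns first, and only if that fails is it looked up as a function and applied to every column as a predicate. `resolve_selection()` is a hypothetical helper for illustration, not tidyselect's actual implementation.

```r
# Hypothetical sketch of the lookup order, NOT tidyselect's real code.
# For simplicity the name is passed as a string rather than a bare symbol.
resolve_selection <- function(df, name, env = parent.frame()) {
  # 1. A matching column always wins.
  if (name %in% names(df)) {
    return(name)
  }
  # 2. Otherwise, a function with that name visible from `env` is treated
  #    as a predicate and called on every column.
  if (exists(name, envir = env, mode = "function")) {
    pred <- get(name, envir = env, mode = "function")
    keep <- vapply(df, function(col) isTRUE(pred(col)), logical(1))
    return(names(df)[keep])
  }
  # 3. Only when neither exists do we get a "column doesn't exist" error.
  stop(sprintf("Can't subset columns that don't exist: `%s`.", name),
       call. = FALSE)
}

df <- data.frame(x = 1, y = "a", stringsAsFactors = FALSE)
resolve_selection(df, "x")           # column match
resolve_selection(df, "is.numeric")  # no such column: used as a predicate
try(resolve_selection(df, "foo"))    # neither: the familiar error
```

In this sketch, a name like `id` takes step 2 whenever some function called `id` is in scope, which mirrors the behaviour in the reprex.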

lionel- on 17 Apr 2020

Could id perhaps be given a class that instructs tidyselect to ignore it?

romainfrancois on 17 Apr 2020

I think we need a general strategy here. Another example: users might expect a column data after some tidyr transformation. If it's not there for some reason, tidyselect will find utils::data instead and try to interpret it as a predicate.
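One possible building block for such a strategy, sketched in base R under the assumption that the selection code knows when it has fallen back from a column to a function: report which package the fallback function came from. `describe_fallback()` is a hypothetical helper, not part of dplyr or tidyselect.

```r
# Hypothetical diagnostic, not a dplyr/tidyselect API: if `name` is not a
# column of `df` but a function of that name is visible from `env`, return
# a message saying which package the function came from; otherwise NULL.
describe_fallback <- function(df, name, env = parent.frame()) {
  if (name %in% names(df) || !exists(name, envir = env, mode = "function")) {
    return(NULL)
  }
  f <- get(name, envir = env, mode = "function")
  # Primitives have no enclosing environment; they all live in base.
  fn_env <- environment(f)
  pkg <- if (is.null(fn_env)) "base" else environmentName(fn_env)
  sprintf("There is no column `%s`; using function `%s()` from {%s} instead.",
          name, name, pkg)
}

describe_fallback(data.frame(x = 1), "mean")     # mean() comes from base
describe_fallback(data.frame(mean = 1), "mean")  # a real column: NULL
```

With utils attached, asking about "data" would name {utils}, which is exactly the case described in the comment above.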

lionel- on 17 Apr 2020

I wonder if we could offer a hint here, i.e. if we execute a function and it errors, add a hint along the lines of "using function id since there's no variable id in your data frame"?
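A minimal sketch of that idea in base R, assuming we already know that `name` was resolved to a function rather than a column: run the predicate and, if it throws, re-raise the original message with the suggested hint appended. `apply_predicate_with_hint()` and the stand-in `id()` below are hypothetical; the stand-in only imitates the defunct error from the reprex.

```r
# Hypothetical helper, not part of dplyr/tidyselect: call a fallback
# function on every column and append a hint if it fails.
apply_predicate_with_hint <- function(df, name, pred) {
  tryCatch(
    vapply(df, function(col) isTRUE(pred(col)), logical(1)),
    error = function(e) {
      stop(conditionMessage(e), "\n",
           sprintf("i Using function `%s()` since there's no variable `%s` in your data frame.",
                   name, name),
           call. = FALSE)
    }
  )
}

# Stand-in that imitates the defunct dplyr::id() error from the reprex.
id <- function(...) stop("`id()` was deprecated in dplyr 0.5.0 and is now defunct.")

try(apply_predicate_with_hint(data.frame(x = 1), "id", id))
```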

hadley on 17 Apr 2020

I just realized this is sort of our own version of “object of type closure is not subsettable”.

Unfortunately I think some of these problematic (non) variable names do come up a lot (id, data).

jennybc on 17 Apr 2020


Another thing to consider regarding the syntax: some users expect to be able to use lambda formulas as predicates:
https://github.com/r-lib/tidyselect/issues/187
https://stackoverflow.com/questions/61283841/cant-use-purrr-style-lambda-function-with-select-dplyr-1-0-0-dev.

This is relevant because one alternative is to use a function constructor to solve the ambiguity:

data %>% select(fn(~ is_numeric(.)))

# With hypothetical purrr function operators for creating union or intersection of predicates
data %>% select(or(~ is.numeric(.), ~ is.character(.)))
data %>% select(or(is.numeric, is.character))
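The `fn()` and `or()` operators above are hypothetical, not existing purrr functions, but a base-R sketch shows how little machinery they would need, assuming a one-sided formula whose body refers to the column as `.`. `fn()` turns such a formula into a function, and `or()` builds a predicate that is true when any of its inputs is true. (rlang's `as_function()` offers a fuller version of the formula conversion.)

```r
# Hypothetical function operators in base R (illustration only).
fn <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 2L)  # one-sided formula
  body <- f[[2]]
  env <- environment(f)
  # Evaluate the formula body with `.` bound to the argument.
  function(.) eval(body, list(. = .), env)
}

or <- function(...) {
  # Accept plain functions or formulas; convert formulas with fn().
  preds <- lapply(list(...), function(p) if (inherits(p, "formula")) fn(p) else p)
  function(x) any(vapply(preds, function(p) isTRUE(p(x)), logical(1)))
}

df <- data.frame(a = 1, b = "x", c = TRUE, stringsAsFactors = FALSE)
is_num_or_chr <- or(~ is.numeric(.), is.character)
names(df)[vapply(df, is_num_or_chr, logical(1))]  # columns a and b
```

Because the constructor is explicit, a bare name is never silently reinterpreted as a predicate; only arguments wrapped in `fn()`/`or()` are.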

Edit: Or we could just add support for formulas, but then the precedence of | and & might be confusing: they would become part of the predicate function instead of joining predicates. E.g. ~ is.numeric(.) | ~ is.factor(.) is equivalent to ~ (is.numeric(.) | ~ is.factor(.)).
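The precedence problem can be checked without dplyr by quoting the expression and inspecting the call R builds. `~` binds more loosely than `|`, so the parser produces a single one-sided formula whose body is the whole `|` call, with the second formula nested inside it:

```r
# Quote (don't evaluate) the expression from the comment above and
# inspect its parse tree.
e <- quote(~ is.numeric(.) | ~ is.factor(.))
e[[1]]                                     # outermost operator: `~`
length(e)                                  # 2, i.e. a one-sided formula
identical(e[[2]][[1]], as.name("|"))       # the formula body is the `|` call
identical(e[[2]][[3]][[1]], as.name("~"))  # second formula nested inside it
```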

lionel- on 18 Apr 2020

Just had this confusing error with pivot_longer():

Error: This tidyselect interface doesn't support predicates yet.
ℹ Contact the package author and suggest using `eval_select()`.

I was trying to pivot a variable named mean, which didn't exist. pivot_longer() doesn't support predicates yet, but if it did, it would have run mean() on the variables as if it were a predicate.

If we keep the simple syntax for predicates, we should at least improve the error message when the function returns numeric values:

iris %>% select(function(x) 1L)
#> Error: Can't coerce element 1 from a integer to a logical
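One way to get a clearer message, sketched under the assumption that every predicate result is validated before selection: check that each call returns a single non-missing `TRUE` or `FALSE`, and name the offending column and type if not. `eval_predicate()` and `check_predicate_result()` are hypothetical helpers, not tidyselect internals.

```r
# Hypothetical validation, not tidyselect's implementation.
check_predicate_result <- function(res, col_name) {
  if (!is.logical(res) || length(res) != 1L || is.na(res)) {
    stop(sprintf(
      "Predicate must return a single `TRUE` or `FALSE`, but returned <%s> of length %d for column `%s`.",
      class(res)[1], length(res), col_name
    ), call. = FALSE)
  }
  res
}

# Apply `pred` to each column and keep the names where it returned TRUE.
eval_predicate <- function(df, pred) {
  keep <- vapply(names(df),
                 function(nm) check_predicate_result(pred(df[[nm]]), nm),
                 logical(1))
  names(df)[keep]
}

df <- data.frame(x = 1, y = "a", stringsAsFactors = FALSE)
eval_predicate(df, is.numeric)             # "x"
try(eval_predicate(df, function(x) 1L))    # names <integer> and column `x`
```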

lionel- on 19 Apr 2020

Now tracking in https://github.com/r-lib/tidyselect/issues/190

hadley on 25 Apr 2020


Related issues

Strange behavior when filtering tibble containing lubridate column

NightWinkle · 3 comments

dplyr rename_all does not work on grouped data

JohnMount · 3 comments

case_when not handling is.na statements

tjmahr · 4 comments

dplyr 0.6.0 join problem with CRAN version of sparklyr 0.5.5

JohnMount · 4 comments

summarise_at using different functions for different variables

profdave · 3 comments