Dplyr: Column names with spaces or other special characters

Created on 8 Nov 2016 · 18 Comments · Source: tidyverse/dplyr

Various verbs have issues if column names contain spaces or other non-alphanumeric characters.
They work only if all column names are valid R identifiers.

  • [x] mutate_if
  • [x] mutate_at
  • [x] summarise_if
  • [x] summarise_at
  • [x] select_if
  • [x] rename
  • [x] summarize_all
  • [x] slice
bug
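
For readers unfamiliar with the term, here is a small base R sketch of what "valid R identifier" means in practice. It is only an illustration of the failure mode visible in the `parse(text = x)` errors reported in this thread, not a description of dplyr's internals; the example names are taken from the comments.

```r
# Candidate column names: one syntactic, two non-syntactic.
nms <- c("Petal.Width", "Petal Width", "1_brandNewColumn")

# make.names() rewrites anything that is not a valid identifier,
# so a name is syntactic only if it comes back unchanged.
is_syntactic <- make.names(nms) == nms

# Parsing a non-syntactic name as code fails; the error is caught here
# and turned into NULL instead of stopping the script.
parsed_raw <- tryCatch(parse(text = "Petal Width"), error = function(e) NULL)

# The backtick-quoted form parses to a single symbol.
parsed_quoted <- parse(text = "`Petal Width`")[[1]]

print(is_syntactic)
print(is.null(parsed_raw))
print(as.character(parsed_quoted))
```

Any code path that pastes a column name into a string and parses it will hit this unless the name is quoted first.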

All 18 comments

I may have found a fix for some of this. It involves quoting character vector vars in probe_colwise_names() and select_colwise_names(), which should resolve the _if and _at functions at least.
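
To illustrate the general idea of quoting names instead of parsing them, here is a minimal base R sketch. The helper `names_to_symbols()` is hypothetical and is not the code from the commit; it only shows that converting a string to a symbol with `as.name()` sidesteps the parser entirely.

```r
# Hypothetical helper (not from dplyr): turn column names into symbols
# without ever running them through parse().
names_to_symbols <- function(nms) lapply(nms, as.name)

# check.names = FALSE keeps the space in the column name.
df <- data.frame(`a b` = c(1.1, 2.1, 3.1), check.names = FALSE)
syms <- names_to_symbols(names(df))

# Evaluating the symbol with the data frame as environment
# retrieves the column, even though "a b" is not a valid identifier.
col_values <- eval(syms[[1]], df)
print(col_values)
```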

See this commit in my fork of dplyr:
https://github.com/markriseley/dplyr/commit/6a4d495078c10116968aeba9035b109871ef6d4c

I added a couple of basic tests, ran R CMD check, and confirmed that all the help page examples for summarise_all {dplyr} still work when the column "Petal.Width" is renamed to "Petal Width".

I hope this helps. Please do more thorough checking, as I don't know whether this would cause any issues with databases etc.

A bit of experimenting shows that some verbs exhibit the bug while others do not:

iris$`1_brandNewColumn` <- runif(n = nrow(iris))
iris %>% group_by(Species) %>% summarise_all(mean, trim = 1) ## No bug
# A tibble: 3 × 6
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width `1_brandNewColumn`
      <fctr>        <dbl>       <dbl>        <dbl>       <dbl>              <dbl>
1     setosa          5.0         3.4         1.50         0.2          0.5055038
2 versicolor          5.9         2.8         4.35         1.3          0.5388621
3  virginica          6.5         3.0         5.55         2.0          0.5186122

iris %>% group_by(Species) %>% summarize_if(is.numeric, mean) ## Bug
Error in parse(text = x) : <text>:1:2: unexpected input
1: 1_
     ^

I'm not sure whether this is related to the column-name problems collected in this issue, but I ran into the following error when trying to answer a question:

```{r}
r1 = c('', 'abc def', '')
r2 = c('1', 'ghi jkl', '2')
r3 = c('', 'mno pqr', '')
df = as.data.frame(rbind(r1, r2, r3), stringsAsFactors = FALSE)

df %>%
  mutate_each(
    case_when(
      grepl("def", .$V2) ~ "x",
      grepl("pqr", .$V2) ~ "y",
      TRUE ~ .$V2
    ),
    vars(V1, V3)
  )
```

This gives me:

Error in parse(text = x) : <text>:1:5: unexpected symbol
1: ghi jkl
^

@tchakravarty I think . should refer to the current column and case_when() should be wrapped in funs().

@krlmlr Could you give an example for slice() please?

presumably #2224

I think we can close this.

slice() seems to work now:

library(dplyr, warn.conflicts = FALSE)
data_frame("a b" = 1:3) %>% slice(2)
#> # A tibble: 1 × 1
#>   `a b`
#>   <int>
#> 1     2

I'm not sure this issue can be closed? The following MWE gives an error:

x <- data_frame("a b" = 1.1:3.1)

x %>% mutate_if(is.numeric, round, digits = 0)

It works for me.

Thanks for getting back to me @lionel- that is really strange. I am on dplyr 0.5.0, latest CRAN release, but I get the following error:

Error in parse(text = x) : <text>:1:3: unexpected symbol
1: a b
      ^

Do you get a tibble back? Any ideas on why this might be happening...?

We can't fix issues directly on CRAN; we have to do it in the development version first ;)

Ah - ok, so this will be "fixed" in the next release? I thought you meant it works on 0.5.0 for you. :)

Thanks
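
For anyone else unsure which build they are testing, a quick sketch along these lines may help. It assumes dplyr is installed and that the fix ships in some release after 0.5.0, which this thread suggests but does not state; the variable names are illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

# Which dplyr is actually loaded? 0.5.0 was the CRAN release discussed above.
installed <- packageVersion("dplyr")
newer_than_0_5_0 <- installed > "0.5.0"

# Same shape as the MWE above; check.names = FALSE keeps the space in "a b".
x <- data.frame("a b" = c(1.1, 2.1, 3.1), check.names = FALSE)

# Capture the error message instead of stopping, so the script runs on
# both affected and fixed versions.
result <- tryCatch(
  mutate_if(x, is.numeric, round, digits = 0),
  error = function(e) conditionMessage(e)
)

print(installed)
print(result)
```

If `result` is a character string rather than a data frame, the installed version still has the parsing problem.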

@lionel- Did you mean this?

df %>% 
  mutate_at(
    c("V1", "V3"), 
    funs(case_when(
      grepl("def", .$V2) ~ "x",
      grepl("pqr", .$V2) ~ "y",
      TRUE ~ .$V2
    ))
  )

Note that I also switched from using mutate_each to mutate_at. This gives me:

Error in mutate_impl(.data, dots) : object 'V1' not found

The dot refers to the column that is being mapped, not to the data frame:

mutate_at(df, c("V1", "V2", "V3"), funs(case_when(
  grepl("def", .) ~ "x",
  grepl("pqr", .) ~ "y",
  TRUE ~ .
)))
#>  V1      V2 V3
#> 1          x   
#> 2  1 ghi jkl  2
#> 3          y   

@lionel- Got it, thanks. How would I then refer to a different column than the one I am mutating within case_when?

You could use the new .data pronoun or you could name it directly (here, df).
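
As a rough sketch of the second option (naming the data frame directly), assuming a dplyr version in which mutate_at() accepts a formula lambda; funs() from the earlier snippets is replaced by `~` here because funs() was deprecated in later releases:

```r
library(dplyr, warn.conflicts = FALSE)

r1 <- c("", "abc def", "")
r2 <- c("1", "ghi jkl", "2")
r3 <- c("", "mno pqr", "")
df <- as.data.frame(rbind(r1, r2, r3), stringsAsFactors = FALSE)

# Overwrite V1 and V3 based on the contents of V2.
# df$V2 names the other column explicitly, so it does not matter
# which column the lambda is currently being applied to.
out <- mutate_at(
  df,
  c("V1", "V3"),
  ~ case_when(
    grepl("def", df$V2) ~ "x",
    grepl("pqr", df$V2) ~ "y",
    TRUE ~ df$V2
  )
)

print(out)
```

Naming `df` directly only works on ungrouped data of the same length as the input, which is why the pronoun is usually the safer choice inside a pipeline.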

@lionel- On my machine (Win10), the last statement of this:

r1 = c('', 'abc def', '')
r2 = c('1', 'ghi jkl', '2')
r3 = c('', 'mno pqr', '')
df = as.data.frame(rbind(r1, r2, r3), stringsAsFactors = FALSE)

df %>% 
  mutate_at(
    c("V1", "V3"), 
    funs(case_when(
      grepl("def", .data$V2) ~ "x",
      grepl("pqr", .data$V2) ~ "y",
      TRUE ~ .data$V2
    ))
  )

just hangs & does not return. Fresh dplyr installation off GH.

@tchakravarty: Can't replicate this on my install of Windows 10.

@krlmlr @lionel- Restarting the R session fixes this. Thanks for pointing out the .data pronoun!
