Dplyr: [Feature Request] Argument enabling rename function to pass even without matching a column name.

Created on 18 Nov 2017  路  10Comments  路  Source: tidyverse/dplyr

For example, data.frame %>% rename( name=ID) %>% write_csv()

The desired behavior of rename() with new argument would be:
If the ID column exists in the date.frame, rename it and pass it to write_csv().
If the ID column do NOT exist, rename() does nothing and pass the original data.frame to write_csv().

Right now, it throws errors if rename() cannot find a match. In that way, data.frame would not be passed to the downstream function.

Most helpful comment

This particular issue of SYLK file with excel is reference in microsoft support with a workaround:

Workaround
To open your file in Excel, open the file in a text editor, and then insert an apostrophe at the beginning of the first line of text.

As it is specific to writing csv to excel, maybe it should be a functionality inside the readr::write_excel_csv : if first column name is ID then replace by 'ID or just replace by lower case id (to not have an apostrophe in excel file)

All 10 comments

Understandable for your use case but seems a bit dangerous in most cases. How do I know if I made a mistake in renaming my column if it do not throw me an error ?
I see the error throughing like an indicator of a typo or a column mistake.
Maybe a warning could be enough....

However, know that purrr function can help you in this kind of desired behaviour. with possibly you can catch error and return another value in that case. purrr is part of the tidyverse

reprex::reprex_info()
#> Created by the reprex package v0.1.1.9000 on 2017-11-18

library(dplyr, warn.conflicts = F)

starwars %>%
  glimpse()
#> Observations: 87
#> Variables: 13
#> $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", ...
#> $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188...
#> $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 8...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "b...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "l...
#> $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue",...
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0...
#> $ gender     <chr> "male", NA, NA, "male", "female", "male", "female",...
#> $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alder...
#> $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human...
#> $ films      <list> [<"Revenge of the Sith", "Return of the Jedi", "Th...
#> $ vehicles   <list> [<"Snowspeeder", "Imperial Speeder Bike">, <>, <>,...
#> $ starships  <list> [<"X-wing", "Imperial shuttle">, <>, <>, "TIE Adva...

starwars %>%
  # birth_month does not exist
  rename(month_of_birth = birth_month) %>%
  glimpse()
#> Error: `birth_month` contains unknown variables

starwars %>%
  # birth_month does not exist but no more error in the pipeline
  # possibly helps catch the error and returns here the input df if error
  purrr::possibly(rename, otherwise = .)(month_of_birth = birth_month) %>%
  glimpse()
#> Observations: 87
#> Variables: 13
#> $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", ...
#> $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188...
#> $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 8...
#> $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "b...
#> $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "l...
#> $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue",...
#> $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0...
#> $ gender     <chr> "male", NA, NA, "male", "female", "male", "female",...
#> $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alder...
#> $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human...
#> $ films      <list> [<"Revenge of the Sith", "Return of the Jedi", "Th...
#> $ vehicles   <list> [<"Snowspeeder", "Imperial Speeder Bike">, <>, <>,...
#> $ starships  <list> [<"X-wing", "Imperial shuttle">, <>, <>, "TIE Adva...

With this you get the behaviour you are looking for and you have to explicitly set this special behaviour which I find to be a good practive.

Hopes it helps.

Thank you so much for the help. I completely agree that it could be dangerous. That is why I recommend adding it as an extra argument instead of default. The user would be aware of that risk when he adds the argument. I like the purrr workaround, however it is not as neat as modifying the original rename function. Hopefully, the developer would consider the option, which gives the user more flexibilities.

@sibojan: Thanks. What is your use case for this functionality?

Microsoft Excel has issues processing CSV file with ID as the first column name (https://annalear.ca/2010/06/10/why-excel-thinks-your-csv-is-a-sylk/). In R, I always convert ID to id or something else before a CSV file is fed to Excel. So, my workflow is data.frame %>% dplyr::rename( ID = id ) %>% readr::write_csv() to a CSV file, which will then be open by Microsoft Excel. However, if a dataframe has no ID column, it is supposed to be passed to Excel directly without changing anything by rename().

sylk-2
Thanks,

I'm not sure exactly how the excel bit ties in (I'm not doubting you, just don't follow). Wouldn't it be more of a write_csv-related functionality in this case?

This particular issue of SYLK file with excel is reference in microsoft support with a workaround:

Workaround
To open your file in Excel, open the file in a text editor, and then insert an apostrophe at the beginning of the first line of text.

As it is specific to writing csv to excel, maybe it should be a functionality inside the readr::write_excel_csv : if first column name is ID then replace by 'ID or just replace by lower case id (to not have an apostrophe in excel file)

Thank you for you help. Unfortunately, addition of the left single quote sign in the CSV file would cause new issues with some business software in my area. Those are close-source software and hard to change. R is expected to be the most adaptive and universal tools.

So, what would this pipeline look like, then? You'd like to be able to rename(name = ID) to/from csv with invisible failure/no change if ID does not exist into/out of Excel and/or the proprietary software? Does the lower case id, as @cderv suggests, also disrupt this pipeline? What would the "extra argument instead of default" be?

I still think this should be the responsibility of write_excel_csv(). Would you mind filing an issue with the readr repo?

I agree. However, write_excel_csv() includes a UTF-8 Byte order mark, which causes new problem in some settings.

Was this page helpful?
0 / 5 - 0 ratings