The NA values from all-NA columns get concatenated in as "NA" strings, as if na.rm = FALSE. If this is intended, then it's not documented.
library(tidyverse)
data <- tribble(
~Name, ~Postalcode, ~Parent, ~Parent2, ~Parent3,
"Paul", "4732", "Mother", NA, NA,
"Edward", "9045", NA, NA, NA,
"Mary", "3476", "Mother", NA, NA,
NA, NA, NA, NA, NA,
NA, "2468", NA, NA, NA
)
# The NAs from both Parent2 and Parent3 are pasted in as strings, while the NAs
# from Parent are properly removed
data %>% unite(Parent_full, Parent:Parent3, sep = "|", na.rm = TRUE)
#> # A tibble: 5 x 3
#> Name Postalcode Parent_full
#> <chr> <chr> <chr>
#> 1 Paul 4732 Mother|NA|NA
#> 2 Edward 9045 NA|NA
#> 3 Mary 3476 Mother|NA|NA
#> 4 <NA> <NA> NA|NA
#> 5 <NA> 2468 NA|NA
# Add a value anywhere in Parent3, and all its NAs get removed, but Parent2 is
# still getting pasted in in the middle
data[[2, "Parent3"]] <- "Uncle"
data %>% unite(Parent_full, Parent:Parent3, sep = "|", na.rm = TRUE)
#> # A tibble: 5 x 3
#> Name Postalcode Parent_full
#> <chr> <chr> <chr>
#> 1 Paul 4732 Mother|NA
#> 2 Edward 9045 NA|Uncle
#> 3 Mary 3476 Mother|NA
#> 4 <NA> <NA> NA
#> 5 <NA> 2468 NA
# Add a value to Parent2, and now there are no columns with all NAs, so no NAs are
# pasted in (also, concatenating all-missing values results in "" instead of an NA)
data[[1, "Parent2"]] <- "Aunt"
data %>% unite(Parent_full, Parent:Parent3, sep = "|", na.rm = TRUE)
#> # A tibble: 5 x 3
#> Name Postalcode Parent_full
#> <chr> <chr> <chr>
#> 1 Paul 4732 Mother|Aunt
#> 2 Edward 9045 Uncle
#> 3 Mary 3476 Mother
#> 4 <NA> <NA> ""
#> 5 <NA> 2468 ""
Created on 2019-09-28 by the reprex package (v0.3.0)
Discovered in answering this SO question: https://stackoverflow.com/questions/58134883/how-to-remove-missing-values-na-when-uniting-columns
TL;DR: This occurs for all logical columns. When all values are NA, the column is parsed as logical.
Here's a reprex of your example above, with intermediate results shown. The columns that are all NA differ from the ones that are only partly NA in that they're parsed as logical.
So we could generalize this to say that NAs are not being removed when you're uniting a logical column containing all NAs. (The second reprex shows that, if the Parent2 and Parent3 variables are made character, na.rm works as expected.)
library(tidyverse)
data <- tribble(
~Name, ~Postalcode, ~Parent, ~Parent2, ~Parent3,
"Paul", "4732", "Mother", NA, NA,
"Edward", "9045", NA, NA, NA,
"Mary", "3476", "Mother", NA, NA,
NA, NA, NA, NA, NA,
NA, "2468", NA, NA, NA
)
glimpse(data)
#> Observations: 5
#> Variables: 5
#> $ Name <chr> "Paul", "Edward", "Mary", NA, NA
#> $ Postalcode <chr> "4732", "9045", "3476", NA, "2468"
#> $ Parent <chr> "Mother", NA, "Mother", NA, NA
#> $ Parent2 <lgl> NA, NA, NA, NA, NA
#> $ Parent3 <lgl> NA, NA, NA, NA, NA
# The NAs from both Parent2 and Parent3 are pasted in as strings, while the NAs
# from Parent are properly removed
data %>% unite(Parent_full, Parent:Parent3, sep = "|", na.rm = TRUE)
#> # A tibble: 5 x 3
#> Name Postalcode Parent_full
#> <chr> <chr> <chr>
#> 1 Paul 4732 Mother|NA|NA
#> 2 Edward 9045 NA|NA
#> 3 Mary 3476 Mother|NA|NA
#> 4 <NA> <NA> NA|NA
#> 5 <NA> 2468 NA|NA
# Add a value anywhere in Parent3, and all its NAs get removed, but Parent2 is
# still getting pasted in in the middle
data[[2, "Parent3"]] <- "Uncle"
data
#> # A tibble: 5 x 5
#> Name Postalcode Parent Parent2 Parent3
#> <chr> <chr> <chr> <lgl> <chr>
#> 1 Paul 4732 Mother NA <NA>
#> 2 Edward 9045 <NA> NA Uncle
#> 3 Mary 3476 Mother NA <NA>
#> 4 <NA> <NA> <NA> NA <NA>
#> 5 <NA> 2468 <NA> NA <NA>
data %>% unite(Parent_full, Parent:Parent3, sep = "|", na.rm = TRUE)
#> # A tibble: 5 x 3
#> Name Postalcode Parent_full
#> <chr> <chr> <chr>
#> 1 Paul 4732 Mother|NA
#> 2 Edward 9045 NA|Uncle
#> 3 Mary 3476 Mother|NA
#> 4 <NA> <NA> NA
#> 5 <NA> 2468 NA
# Add a value to Parent2, and now there are no columns with all NAs, so no NAs are
# pasted in
data[[1, "Parent2"]] <- "Aunt"
data
#> # A tibble: 5 x 5
#> Name Postalcode Parent Parent2 Parent3
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Paul 4732 Mother Aunt <NA>
#> 2 Edward 9045 <NA> <NA> Uncle
#> 3 Mary 3476 Mother <NA> <NA>
#> 4 <NA> <NA> <NA> <NA> <NA>
#> 5 <NA> 2468 <NA> <NA> <NA>
data %>% unite(Parent_full, Parent:Parent3, sep = "|", na.rm = TRUE)
#> # A tibble: 5 x 3
#> Name Postalcode Parent_full
#> <chr> <chr> <chr>
#> 1 Paul 4732 Mother|Aunt
#> 2 Edward 9045 Uncle
#> 3 Mary 3476 Mother
#> 4 <NA> <NA> ""
#> 5 <NA> 2468 ""
Created on 2019-09-27 by the reprex package (v0.3.0)
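The type difference visible in the glimpse() output above can be checked in base R, without any tidyverse code. This is only a small sketch of how R types a bare NA; it does not show what unite() does internally, and the variable names are made up for illustration.

```r
# A bare NA is a logical constant, so a vector holding nothing else is logical.
all_na <- c(NA, NA, NA)
class(all_na)
#> [1] "logical"

# A single string is enough to coerce the whole vector to character.
some_na <- c(NA, "Uncle", NA)
class(some_na)
#> [1] "character"

# The typed missing value is character from the start.
class(NA_character_)
#> [1] "character"

# Pasting turns a missing value into the literal string "NA".
pasted <- paste("Mother", NA, sep = "|")
pasted
#> [1] "Mother|NA"
```

This matches the pattern in the reprex: as soon as "Uncle" is assigned into Parent3, the column becomes character and its NAs are dropped.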
library(tidyverse)
data <- tribble(
~Name, ~Postalcode, ~Parent, ~Parent2, ~Parent3,
"Paul", "4732", "Mother", NA, NA,
"Edward", "9045", NA, NA, NA,
"Mary", "3476", "Mother", NA, NA,
NA, NA, NA, NA, NA,
NA, "2468", NA, NA, NA
)
data <- data %>%
mutate(Parent2 = as.character(Parent2),
Parent3 = as.character(Parent3))
data %>% unite(Parent_full, Parent:Parent3, sep = "|", na.rm = TRUE)
#> # A tibble: 5 x 3
#> Name Postalcode Parent_full
#> <chr> <chr> <chr>
#> 1 Paul 4732 Mother
#> 2 Edward 9045 ""
#> 3 Mary 3476 Mother
#> 4 <NA> <NA> ""
#> 5 <NA> 2468 ""
Created on 2019-09-27 by the reprex package (v0.3.0)
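A variation on the second reprex, as a sketch: instead of converting the columns afterwards, the missing values can be declared as NA_character_ when the data is built, so the columns are character from the start. The trimmed-down table below is an assumption made for brevity, not the original data.

```r
library(tidyr)

# Typed missing values keep every column character, even when it is all NA.
data_chr <- tibble::tribble(
  ~Name,    ~Parent,       ~Parent2,
  "Paul",   "Mother",      NA_character_,
  "Edward", NA_character_, NA_character_
)

# Parent2 is character here rather than logical.
class(data_chr$Parent2)

# With character columns, na.rm = TRUE drops the missing values.
out <- unite(data_chr, Parent_full, Parent:Parent2, sep = "|", na.rm = TRUE)
out$Parent_full
```

With character columns the result should line up with the output above: "Mother" for Paul and "" for Edward.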
Edit:
This also occurs if the column is logical and _not_ all NA.
data <- tribble(
~Name, ~Postalcode, ~Parent, ~Parent2, ~Parent3,
"Paul", "4732", "Mother", TRUE, NA,
"Edward", "9045", NA, NA, TRUE,
"Mary", "3476", "Mother", NA, NA,
NA, NA, NA, FALSE, NA,
NA, "2468", NA, NA, NA
)
data %>% unite(Parent_full, Parent:Parent3, sep = "|", na.rm = TRUE)
#> # A tibble: 5 x 3
#> Name Postalcode Parent_full
#> <chr> <chr> <chr>
#> 1 Paul 4732 Mother|TRUE|NA
#> 2 Edward 9045 NA|TRUE
#> 3 Mary 3476 Mother|NA|NA
#> 4 <NA> <NA> FALSE|NA
#> 5 <NA> 2468 NA|NA
Maybe this is related:
library(tidyverse)
unite_dbl <- tribble(
~Date, ~First, ~Second,
"2019-01-07", 2.75, NA,
"2019-01-07", NA, 2.5,
"2019-01-08", 0.25, NA,
"2019-01-08", NA, 4.5
)
glimpse(unite_dbl)
#> Observations: 4
#> Variables: 3
#> $ Date <chr> "2019-01-07", "2019-01-07", "2019-01-08", "2019-01-08"
#> $ First <dbl> 2.75, NA, 0.25, NA
#> $ Second <dbl> NA, 2.5, NA, 4.5
unite_dbl %>% unite(col = tmp, 2:3, na.rm = TRUE)
#> # A tibble: 4 x 2
#> Date tmp
#> <chr> <chr>
#> 1 2019-01-07 2.75_NA
#> 2 2019-01-07 NA_2.5
#> 3 2019-01-08 0.25_NA
#> 4 2019-01-08 NA_4.5
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 3.6.1 (2019-07-05)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate German_Germany.1252
#> ctype German_Germany.1252
#> tz Europe/Berlin
#> date 2019-11-17
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.5.3)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.1)
#> broom 0.5.2 2019-04-07 [1] CRAN (R 3.5.3)
#> callr 3.3.2 2019-09-22 [1] CRAN (R 3.6.1)
#> cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.5.0)
#> cli 1.1.0 2019-03-19 [1] CRAN (R 3.5.3)
#> colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.5.3)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.1)
#> devtools 2.2.1 2019-09-24 [1] CRAN (R 3.6.1)
#> digest 0.6.22 2019-10-21 [1] CRAN (R 3.6.1)
#> dplyr * 0.8.3 2019-07-04 [1] CRAN (R 3.6.1)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.1)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
#> forcats * 0.4.0 2019-02-17 [1] CRAN (R 3.5.3)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 3.5.2)
#> ggplot2 * 3.2.1 2019-08-10 [1] CRAN (R 3.6.1)
#> glue 1.3.1 2019-03-12 [1] CRAN (R 3.5.3)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 3.5.3)
#> haven 2.2.0 2019-11-08 [1] CRAN (R 3.6.1)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.5.3)
#> hms 0.5.2 2019-10-30 [1] CRAN (R 3.6.1)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.1)
#> httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.1)
#> jsonlite 1.6 2018-12-07 [1] CRAN (R 3.5.2)
#> knitr 1.26 2019-11-12 [1] CRAN (R 3.6.1)
#> lattice 0.20-38 2018-11-04 [2] CRAN (R 3.6.1)
#> lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.5.3)
#> lifecycle 0.1.0 2019-08-01 [1] CRAN (R 3.6.1)
#> lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.5.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.0)
#> modelr 0.1.5 2019-08-08 [1] CRAN (R 3.6.1)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 3.5.0)
#> nlme 3.1-142 2019-11-07 [1] CRAN (R 3.6.1)
#> pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.1)
#> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.1)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.1)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.1)
#> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.0)
#> processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.1)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.5.2)
#> purrr * 0.3.3 2019-10-18 [1] CRAN (R 3.6.1)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.1)
#> Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.1)
#> readr * 1.3.1 2018-12-21 [1] CRAN (R 3.5.2)
#> readxl 1.3.1 2019-03-13 [1] CRAN (R 3.5.3)
#> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.1)
#> rlang 0.4.1 2019-10-24 [1] CRAN (R 3.6.1)
#> rmarkdown 1.17 2019-11-13 [1] CRAN (R 3.6.1)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.0)
#> rvest 0.3.5 2019-11-08 [1] CRAN (R 3.6.1)
#> scales 1.0.0 2018-08-09 [1] CRAN (R 3.5.1)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.1)
#> stringi 1.4.3 2019-03-12 [1] CRAN (R 3.5.3)
#> stringr * 1.4.0 2019-02-10 [1] CRAN (R 3.5.3)
#> testthat 2.3.0 2019-11-05 [1] CRAN (R 3.6.1)
#> tibble * 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
#> tidyr * 1.0.0 2019-09-11 [1] CRAN (R 3.6.1)
#> tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.5.1)
#> tidyverse * 1.2.1 2017-11-14 [1] CRAN (R 3.5.0)
#> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.1)
#> vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.6.1)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.0)
#> xfun 0.11 2019-11-12 [1] CRAN (R 3.6.1)
#> xml2 1.2.2 2019-08-09 [1] CRAN (R 3.6.1)
#> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.1)
#> zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.5.0)
Did I miss something here?
Minimal reprex:
library(tidyr)
df <- tibble(
x = "x",
lgl = NA,
dbl = NA_real_,
chr = NA_character_
)
df %>% unite(out, c("x", "lgl"), na.rm = TRUE) %>% .$out
#> [1] "x_NA"
df %>% unite(out, c("x", "dbl"), na.rm = TRUE) %>% .$out
#> [1] "x_NA"
df %>% unite(out, c("x", "chr"), na.rm = TRUE) %>% .$out
#> [1] "x"
Created on 2019-11-24 by the reprex package (v0.3.0)
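For comparison, here is a base R sketch of the result one would expect from na.rm = TRUE regardless of column type. paste_non_na() is a hypothetical helper written for this illustration; it is not part of tidyr and is not how unite() is implemented.

```r
# Hypothetical helper (not a tidyr function): for each row, paste together
# the non-missing values of the selected columns.
paste_non_na <- function(df, cols, sep = "_") {
  # Coerce to character first so NA stays a real missing value, not "NA".
  chr <- lapply(df[cols], as.character)
  vapply(seq_len(nrow(df)), function(i) {
    vals <- vapply(chr, function(col) col[[i]], character(1))
    # Drop missing values before pasting; an all-missing row yields "".
    paste(vals[!is.na(vals)], collapse = sep)
  }, character(1))
}

df <- data.frame(
  x = "x",
  lgl = NA,
  dbl = NA_real_,
  chr = NA_character_,
  stringsAsFactors = FALSE
)

paste_non_na(df, c("x", "lgl"))
#> [1] "x"
paste_non_na(df, c("x", "dbl"))
#> [1] "x"
paste_non_na(df, c("x", "chr"))
#> [1] "x"
```

The point of the sketch is the order of operations: converting to character before checking for NA gives the same answer for logical, double, and character columns.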