How to remove NAs with tidyr::unite()?

Pau*_*aul 4 r tidyr

After combining several columns with tidyr::unite(), the NAs from missing data remain in my character vector, which I don't want.

Each row holds a series of medical diagnoses (one per column), and I want to check each row against a set of codes using %in% and grepl().

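There is an open issue about this on GitHub. Has there been any movement on it, or is there a workaround? I'd like to keep the vector comma-separated.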

Here is a reproducible example:

library(dplyr)
library(tidyr)

# tibble() replaces the deprecated data_frame()
df <- tibble(a = paste0("A.", rep(1, 3)), b = " ", c = c("C.1", "C.3", " "), d = "D.4", e = "E.5")

cols <- letters[2:4]
# turn the blank entries into real NAs
df[, cols] <- gsub(" ", NA_character_, as.matrix(df[, cols]))
# all_of() marks cols as an external character vector of column names
tidyr::unite(df, new, all_of(cols), sep = ",")

Current output:

# # A tibble: 3 x 3
#   a     new        e    
#   <chr> <chr>      <chr>
# 1 A.1   NA,C.1,D.4 E.5  
# 2 A.1   NA,C.3,D.4 E.5  
# 3 A.1   NA,NA,D.4  E.5 

Desired output:

# # A tibble: 3 x 3
#   a     new        e    
#   <chr> <chr>      <chr>
# 1 A.1   C.1,D.4    E.5  
# 2 A.1   C.3,D.4    E.5  
# 3 A.1   D.4        E.5 
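For context, the question wants to check each row against diagnosis codes with %in% and grepl(). A minimal base-R sketch of both checks on already-united, comma-separated strings follows; the helper names has_code() and has_code_grepl() are hypothetical. It also shows why the leftover "NA" tokens are a problem: they behave like a real code.

```r
# Hypothetical united diagnosis strings, one per row, as produced by unite()
with_na    <- c("NA,C.1,D.4", "NA,C.3,D.4", "NA,NA,D.4")
without_na <- c("C.1,D.4", "C.3,D.4", "D.4")

# Exact-match test: split each row on "," and look for the code with %in%
has_code <- function(united, code) {
  vapply(strsplit(united, ",", fixed = TRUE),
         function(parts) code %in% parts,
         logical(1))
}

# Substring test with grepl(); fixed = TRUE makes "." a literal dot
has_code_grepl <- function(united, code) grepl(code, united, fixed = TRUE)

has_code(without_na, "C.1")        # TRUE FALSE FALSE
has_code_grepl(without_na, "D.4")  # TRUE TRUE TRUE

# With NAs left in, every row appears to contain a "code" called "NA"
has_code(with_na, "NA")            # TRUE TRUE TRUE
```

Splitting first and using %in% avoids partial matches (e.g. "C.1" inside "C.12") that a plain grepl() substring test would report.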

Ron*_*hah 5

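If you install the development version of tidyr, unite() now has an na.rm argument that drops NAs before the values are combined. The GitHub issue is now closed, and the argument has since shipped in the tidyr 1.0.0 release on CRAN.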

devtools::install_github("tidyverse/tidyr")

library(tidyr)
df %>% unite(new, all_of(cols), sep = ",", na.rm = TRUE)

#   a     new     e    
#  <chr> <chr>   <chr>
#1 A.1   C.1,D.4 E.5  
#2 A.1   C.3,D.4 E.5  
#3 A.1   D.4     E.5  
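A self-contained sketch of the same call, assuming a tidyr version whose unite() includes na.rm (1.0.0 or later, to my knowledge). The formals() check is an illustrative guard, not part of the answer: it fails early on an older tidyr instead of erroring on an unknown argument.

```r
library(tidyr)

df <- tibble::tibble(
  a = "A.1",
  b = NA_character_,
  c = c("C.1", "C.3", NA),
  d = "D.4",
  e = "E.5"
)
cols <- c("b", "c", "d")

# Stop early if the installed unite() has no na.rm argument
stopifnot("na.rm" %in% names(formals(tidyr::unite)))

# na.rm = TRUE drops the NAs in each row before joining with ","
out <- unite(df, new, all_of(cols), sep = ",", na.rm = TRUE)
out$new
# "C.1,D.4" "C.3,D.4" "D.4"
```

As with the default remove = TRUE, the source columns are dropped and new takes the position of the first united column, between a and e.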

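You could also do the same in base R with apply(). toString() would join the values with ", " (comma and space), so paste(collapse = ",") is used to keep the output comma-separated as asked:

apply(df[cols], 1, function(x) paste(na.omit(x), collapse = ","))
#[1] "C.1,D.4" "C.3,D.4" "D.4"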
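The apply() approach only returns a vector. A minimal base-R sketch of writing it back into the data frame and dropping the source columns, roughly mirroring what unite() does. It assumes a plain data.frame with character columns; note that toString() joins with ", ", so paste(collapse = ",") is used here for comma-only output.

```r
df <- data.frame(
  a = "A.1",
  b = NA_character_,
  c = c("C.1", "C.3", NA),
  d = "D.4",
  e = "E.5",
  stringsAsFactors = FALSE
)
cols <- c("b", "c", "d")

# Per row: drop the NAs, then join the remaining values with ","
df$new <- apply(df[cols], 1, function(x) paste(na.omit(x), collapse = ","))

# Drop the source columns, as unite() does with its default remove = TRUE
df <- df[setdiff(names(df), cols)]
df
```

Unlike unite(), this appends new as the last column (a, e, new). apply() also coerces the selected columns to a matrix, which is harmless here because they are all character.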

Data:

df <- tibble(
a = c("A.1", "A.1", "A.1"),
b = c(NA_character_, NA_character_, NA_character_),
c = c("C.1", "C.3", NA),
d = c("D.4", "D.4", "D.4"),
e = c("E.5", "E.5", "E.5")
)

cols <- letters[2:4]


CT *_*all 4

You can use a regular expression to remove the NAs after they have been created:

library(dplyr)
library(tidyr)

df <- tibble(a = paste0("A.", rep(1, 3)), 
                 b = " ", 
                 c = c("C.1", "C.3", " "), 
                 d = "D.4", e = "E.5")

cols <- letters[2:4]
df[, cols] <- gsub(" ", NA_character_, as.matrix(df[, cols]))
tidyr::unite(df, new, all_of(cols), sep = ",") %>% 
     dplyr::mutate(new = stringr::str_replace_all(new, 'NA,?', ''))  # new line: strip "NA" and any comma after it

Output:

# A tibble: 3 x 3
  a     new     e    
  <chr> <chr>   <chr>
1 A.1   C.1,D.4 E.5  
2 A.1   C.3,D.4 E.5  
3 A.1   D.4     E.5  
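The pattern 'NA,?' works on this example, but it removes any "NA" substring, so a trailing NA leaves a dangling comma and a code that happens to contain "NA" gets damaged. A minimal base-R sketch comparing it with a token-based alternative; drop_na_tokens() is a hypothetical helper, and it assumes no legitimate value is the literal string "NA".

```r
# United strings, including edge cases: a trailing NA and a code containing "NA"
united <- c("NA,C.1,D.4", "NA,NA,D.4", "C.1,NA", "DNA.2,NA")

# Same pattern as the answer, via base gsub(): removes "NA" plus an optional comma
gsub("NA,?", "", united)
# "C.1,D.4" "D.4" "C.1," "D.2,"

# Token-based alternative: split on sep, drop fields that are exactly "NA", re-join
drop_na_tokens <- function(x, sep = ",") {
  vapply(strsplit(x, sep, fixed = TRUE),
         function(parts) paste(parts[parts != "NA"], collapse = sep),
         character(1))
}
drop_na_tokens(united)
# "C.1,D.4" "D.4" "C.1" "DNA.2"
```

Removing the NAs before uniting (na.rm = TRUE) avoids this post-processing entirely.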