Remove rows that contain all NA in certain columns

Question

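Suppose you have a data frame with 9 columns. You want to remove the cases that have all NAs in columns 5:9. Whether there are NAs in columns 1:4 is completely irrelevant.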

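So far I have found functions that let you remove rows that have NAs in any of columns 5:9, but I specifically need to remove only those rows that have all NAs in columns 5:9.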

I wrote my own function to do this, but since I have 300k+ rows, it is very slow. Is there a more efficient way? Here is my code:

remove.select.na<-function(x, cols){
  nrm<-vector("numeric")
  for (i in 1:nrow(x)){
    if (sum(is.na(x[i,cols]))<length(cols)){
      nrm<-c(nrm,i)
    }
    #Console output to track the progress
    cat('\r',paste0('Checking row ',i,' of ',nrow(x),' (', format(round(i/nrow(x)*100,2), nsmall = 2),'%).'))
    flush.console()
  }
  x<-x[nrm,]
  rm(nrm)
  return(x)
}

where x is the data frame and cols is a vector containing the names of the columns that should be checked for NAs.

Answer 1

RHe*_*tel 8

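This one-liner removes the rows that have NA in all columns between 5 and 9. By combining rowSums() with is.na(), it is easy to check whether all entries in these 5 columns are NA: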

x <- x[rowSums(is.na(x[,5:9]))!=5,]

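The hard-coded 5 above has to match the number of columns being checked. As a minimal sketch, the same idea can be wrapped in a function that takes the columns as an argument, mirroring the remove.select.na(x, cols) signature from the question. The name drop_all_na_rows and the sample data are hypothetical, made up for illustration only.

```r
# Hypothetical helper (not from the original answer): keep rows that have
# at least one non-NA value in `cols`. `cols` may be names or positions.
drop_all_na_rows <- function(x, cols) {
  # Logical matrix with one column per checked column
  na_mat <- is.na(x[, cols, drop = FALSE])
  # A row is dropped only when every checked column is NA
  x[rowSums(na_mat) != ncol(na_mat), , drop = FALSE]
}

# Small made-up data frame for illustration
x <- data.frame(
  id = 1:4,
  u  = c(NA, 2, NA, 4),
  v  = c(NA, NA, NA, 1)
)

res <- drop_all_na_rows(x, c("u", "v"))
print(res)  # rows 1 and 3 are dropped: both u and v are NA there
```

Comparing against ncol(na_mat) instead of a literal number means the call keeps working if the set of checked columns changes.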

Answer 2

sbh*_*bha 6

Here are two dplyr options:

library(dplyr)
# tibble() replaces the deprecated data_frame()
df <- tibble(a = c(0, NA, 0, 4, NA, 0, 6), b = c(1, NA, 0, 4, NA, 0, NA), c = c(1, 0, 1, NA, NA, 0, NA))


# columns b and c are the columns that must not all be NA

df %>% 
  filter_at(vars(b, c), any_vars(!is.na(.)))

df %>% 
  filter_at(vars(b, c), any_vars(complete.cases(.)))

# A tibble: 5 x 3
      a     b     c
  <dbl> <dbl> <dbl>
1     0     1     1
2    NA    NA     0
3     0     0     1
4     4     4    NA
5     0     0     0

In newer versions of dplyr, use if_any:

df %>% 
      filter(if_any(c(b, c), complete.cases))

Nice solution! It turns out that "any_vars()" has been superseded by "across()". However, I could not convert the solution here to rely on "across()". Any hints? (4 upvotes)
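As a hedged sketch for the question in the comment: inside filter(), the across()-style counterparts of any_vars() and all_vars() are if_any() and if_all(), which to my knowledge are available from dplyr 1.0.4 onward. The example below reuses the sample data from this answer and shows two spellings that should select the same rows; the object names kept_any and kept_all are made up for illustration.

```r
library(dplyr)

# Same sample data as in the answer, built with tibble()
df <- tibble(
  a = c(0, NA, 0, 4, NA, 0, 6),
  b = c(1, NA, 0, 4, NA, 0, NA),
  c = c(1, 0, 1, NA, NA, 0, NA)
)

# Keep rows where at least one of b, c is not NA
kept_any <- df %>% filter(if_any(c(b, c), ~ !is.na(.x)))

# Same selection, phrased as "drop rows where all of b, c are NA"
kept_all <- df %>% filter(!if_all(c(b, c), is.na))

print(kept_any)
```

The second form reads closest to the original problem statement (remove rows where all selected columns are NA), which may make it easier to maintain.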

Answer 3

GKi*_*GKi 5

You can use all with apply to find the rows in which all values are NA:

x[!apply(is.na(x[,5:9]), 1, all),]

Or negate is.na and test with any:

x[apply(!is.na(x[,5:9]), 1, any),]

Or use rowSums like @RHertel, in a form where you do not need to count the number of selected columns:

x[rowSums(!is.na(x[,5:9])) > 0,]

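To check that the three variants above agree, here is a minimal sketch on made-up random data with 9 columns. The data, the seed, and the variable names are assumptions for illustration; they are not part of the original answer.

```r
set.seed(1)  # made-up data, for illustration only
n <- 1000
x <- as.data.frame(
  matrix(sample(c(1, NA), n * 9, replace = TRUE, prob = c(0.3, 0.7)), ncol = 9)
)

cols <- 5:9
na_mat <- is.na(x[, cols])  # logical matrix: TRUE where a value is NA

# Three ways to build the "keep this row" mask
keep_apply_all <- !apply(na_mat, 1, all)   # not all NA
keep_apply_any <- apply(!na_mat, 1, any)   # at least one non-NA
keep_rowsums   <- rowSums(!na_mat) > 0     # count of non-NA values is positive

# unname() guards against differing names on the result vectors
stopifnot(
  identical(unname(keep_apply_all), unname(keep_apply_any)),
  identical(unname(keep_apply_any), unname(keep_rowsums))
)

filtered <- x[keep_rowsums, ]
```

On large inputs such as the 300k+ rows mentioned in the question, the rowSums form is generally expected to be the faster one, since it avoids calling an R function once per row; wrapping each variant in system.time() on your own data is the simplest way to confirm that.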
