Subsetting a data.frame with NA in the condition gives unexpected results

Question


Consider the following code. If you don't explicitly test for NA in your condition, the code will fail at some later date, when your data changes.

>   # A toy example
>   a <- as.data.frame(cbind(col1=c(1,2,3,4),col2=c(2,NA,2,3),col3=c(1,2,3,4),col4=c(4,3,2,1)))
>   a
  col1 col2 col3 col4
1    1    2    1    4
2    2   NA    2    3
3    3    2    3    2
4    4    3    4    1
>   
>   # Bummer, there's an NA in my condition
>   a$col2==2
[1]  TRUE    NA  TRUE FALSE
> 
>   # Why is this a good thing to do?
>   # It NA'd the whole row, and kept it
>   a[a$col2==2,]
   col1 col2 col3 col4
1     1    2    1    4
NA   NA   NA   NA   NA
3     3    2    3    2
>   
>   # Yes, this is the right way to do it
>   a[!is.na(a$col2) & a$col2==2,]
  col1 col2 col3 col4
1    1    2    1    4
3    3    2    3    2
>     
>   # Subset seems designed to avoid this problem
>   subset(a, col2 == 2)
  col1 col2 col3 col4
1    1    2    1    4
3    3    2    3    2


Can someone explain why the behavior you get without the is.na check would ever be good or useful?

Answer 1

Sha*_*ane 32

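I definitely agree that this isn't intuitive (I've raised this point on SO before). In R's defense, I think it is useful to know when you have a missing value (i.e. this is not a bug). The == operator is explicitly designed to notify the user of NA or NaN values. See ?"==" for details. It states:

Missing values ('NA') and 'NaN' values are regarded as non-comparable even to themselves, so comparisons involving them will always result in 'NA'.

In other words, a missing value can't be compared with a binary operator, because its value is unknown. The result of the comparison is therefore unknown as well.

Beyond is.na(), you could also do: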
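To make that concrete, here is a small sketch of how the NA from == turns into the all-NA row in the question. The data frame is a cut-down copy of the asker's toy example, and the variable names (idx, res, kept) are only for illustration:

```r
# Comparisons involving NA or NaN return NA, even against themselves
na_vs_value <- NA == 2      # NA
na_vs_na    <- NA == NA     # NA
nan_vs_nan  <- NaN == NaN   # NA

# Cut-down copy of the toy data from the question
a <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(2, NA, 2, 3))

# The condition is a logical vector containing an NA
idx <- a$col2 == 2          # TRUE NA TRUE FALSE

# Indexing rows with a logical NA returns a row filled with NA
res <- a[idx, ]
nrow(res)                   # 3: rows 1 and 3, plus one all-NA row

# Keeping only the positions that are definitely TRUE drops that row
kept <- a[!is.na(idx) & idx, ]
nrow(kept)                  # 2
```

The NA row is R's way of saying "I can't tell whether this row matches", which is consistent, if not exactly convenient.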

which(a$col2==2) # tests explicitly for TRUE


or

a$col2 %in% 2 # only checks for 2


%in% is defined in terms of the match() function:

"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0
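As a rough side-by-side of the three idioms on the question's col2 values (the variable names here are made up for the comparison):

```r
col2 <- c(2, NA, 2, 3)

eq  <- col2 == 2        # TRUE NA TRUE FALSE (the NA survives)
pos <- which(col2 == 2) # 1 3 (positions of definite TRUEs only)
inn <- col2 %in% 2      # TRUE FALSE TRUE FALSE (NA becomes FALSE)

# Spelling out %in% by hand with match() gives the same result as inn
via_match <- match(col2, 2, nomatch = 0) > 0

# One thing to keep in mind: match() treats NA as matching NA,
# so %in% will report TRUE if NA appears on both sides
na_in_na <- NA %in% NA  # TRUE
```

So which() and %in% both avoid the NA row, but %in% is not a drop-in replacement for == if the right-hand side might itself contain NA.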


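This is also covered in "The R Inferno".

Checking for NA values in your data is crucial in R, because many important operations don't handle them the way you might expect. Besides ==, this is also true of things like &, |, <, and sum(). Whenever I write R code, I'm always thinking "what would happen if there were an NA here?". Requiring R users to be careful with missing values is "by design".

Update: How is NA handled when there are multiple logical conditions?

NA is a logical constant, and you might get unexpected subsetting if you don't think about what might be returned (e.g. NA | TRUE == TRUE). These truth tables from ?Logic may provide a useful illustration: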
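A quick sketch of what "what if there's an NA here" looks like for a few of those operations. The vector x is just an example, and na.rm and isTRUE() are the standard base R tools, not anything specific to this question:

```r
x <- c(1, NA, 3)

# Relational operators propagate NA elementwise
below_two <- x < 2               # TRUE NA FALSE

# Summaries return NA unless told to drop missing values
total    <- sum(x)               # NA
total_rm <- sum(x, na.rm = TRUE) # 4
avg_rm   <- mean(x, na.rm = TRUE) # 2

# if (x[2] > 0) ... would stop with an error, because the condition is NA.
# isTRUE() turns "unknown" into FALSE, which is often what you want in a guard.
safe <- isTRUE(x[2] > 0)         # FALSE
```

Whether dropping NA (na.rm = TRUE) or treating it as FALSE (isTRUE) is the right call depends on what a missing value means in your data.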

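x <- c(NA, FALSE, TRUE)      # the three possible logical values
names(x) <- as.character(x)  # label rows/columns of the tables
outer(x, x, "&") ## AND table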
#       <NA> FALSE  TRUE
#<NA>     NA FALSE    NA
#FALSE FALSE FALSE FALSE
#TRUE     NA FALSE  TRUE

outer(x, x, "|") ## OR  table
#      <NA> FALSE TRUE
#<NA>    NA    NA TRUE
#FALSE   NA FALSE TRUE
#TRUE  TRUE  TRUE TRUE
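Applied to the toy data from the question, the tables play out roughly like this. The second condition in each case (col4 == 3, col4 >= 2) is invented purely to hit the NA | TRUE and NA & TRUE cells:

```r
a <- data.frame(col1 = c(1, 2, 3, 4),
                col2 = c(2, NA, 2, 3),
                col3 = c(1, 2, 3, 4),
                col4 = c(4, 3, 2, 1))

# OR: for row 2 the test is NA | TRUE, which is TRUE,
# so row 2 is returned with its real values (including col2 = NA)
or_cond <- a$col2 == 2 | a$col4 == 3   # TRUE TRUE TRUE FALSE
or_rows <- a[or_cond, ]

# AND: for row 2 the test is NA & TRUE, which is NA,
# so an all-NA row shows up in the result
and_cond <- a$col2 == 2 & a$col4 >= 2  # TRUE NA TRUE FALSE
and_rows <- a[and_cond, ]

# which() keeps only the definite TRUEs, so the NA row disappears
and_rows_clean <- a[which(and_cond), ]
```

Note that the OR case silently keeps a row whose col2 is missing, with no NA row to warn you, so it is worth checking each condition on its own when NAs are possible.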

