NA/NaN/Inf in data.table 1.9.2

Question


After going through the new features in data.table 1.9.2, I'm still not clear on the new handling of NA/NaN/Inf.

From the NEWS:

NA, NaN, +Inf and -Inf are now considered distinct values, may be in keys, can be joined to and can be grouped. data.table defines: NA < NaN < -Inf.
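The ordering stated in this NEWS entry can be checked with a small sketch. It assumes data.table 1.9.2 or later is installed; the table `x` and column `v` are made up for illustration. Keying on a numeric column should sort NA first, then NaN, then -Inf, then the finite values, then Inf.

```r
library(data.table)

# Hypothetical numeric column containing every special value
x <- data.table(v = c(3, Inf, NaN, -Inf, NA, 1))

# setkey() physically reorders the table by v using data.table's ordering
setkey(x, v)

# Per NEWS, the order should be: NA, NaN, -Inf, finite values, Inf
print(x$v)
```

Because NA and NaN sort as separate values, a key can hold both and each forms its own contiguous block of rows.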

I don't understand what "can be joined to and can be grouped" means.

DT <- data.table(A=c(NA,NA,1:3), B=c("a",NA,letters[1:3]))


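Now we have NAs in both column A and column B.

But I'm a bit lost on how to proceed: what is the purpose of this new feature? Could you provide an example that illustrates it?

Many thanks!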

Answer 1

mne*_*nel 11

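In previous versions of data.table, NA, NaN and Inf values could be present in keys, but you could not join to these rows, or select them using binary search, in a way that was consistent with other key values.

See the Stack Overflow questions "Selecting NA in a data.table in R" and "data.table subset for NaN doesn't work" for examples of these problems (the answers there point to the corresponding feature requests on the data.table project, so you can follow the history).

Now, in 1.9.2 (and above), things like this work: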

# an example data set
DT <- data.table(A = c(NA,NaN,Inf,Inf,-Inf,NA,NaN,1,2,3),
                 B = letters[1:10], key = 'A')
# selection using binary search
DT[.(Inf)]
#     A B
# 1: Inf c
# 2: Inf d
DT[.(-Inf)]
#       A B
# 1: -Inf e
# note that you need to use the right kind of NA
DT[.(NA_real_)]
#     A B
# 1: NA a
# 2: NA f
DT[.(NaN)]
#      A B
# 1: NaN b
# 2: NaN g
# grouping works
DT[,.N,by=A]
#       A N
# 1:   NA 2
# 2:  NaN 2
# 3: -Inf 1
# 4:    1 1
# 5:    2 1
# 6:    3 1
# 7:  Inf 2

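The example above covers selection by binary search and grouping. The "can be joined to" part of the NEWS item also applies when the `i` argument is another table whose key values include NA, NaN or -Inf. A minimal sketch, assuming data.table 1.9.2 or later; `lookup` and `label` are hypothetical names, not part of the original answer:

```r
library(data.table)

# Same table as in the answer, keyed on A
DT <- data.table(A = c(NA, NaN, Inf, Inf, -Inf, NA, NaN, 1, 2, 3),
                 B = letters[1:10], key = "A")

# Hypothetical lookup table; its first column is matched to DT's key
lookup <- data.table(A = c(NA, NaN, -Inf),
                     label = c("missing", "not a number", "minus infinity"))

# Keyed join: NA matches only NA, NaN only NaN, -Inf only -Inf
res <- DT[lookup]
print(res)
```

Each row of `lookup` pulls in every matching row of `DT`, so the result should hold the two NA rows (B = a, f), the two NaN rows (b, g) and the single -Inf row (e), each with its `label` attached.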

Archived: 11 years, 9 months ago
Views: 977
Last activity: 11 years, 9 months ago