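I have a data.table of measurements in which each column has a lower limit of detection (and possibly an upper limit of detection as well):
library(data.table)
set.seed(1)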
dt <- data.table(id=1:5, A=rnorm(5), B=rnorm(5, mean=2), C=rnorm(5,mean=-1))
setkey(dt, id)
# "randomly" scatter upper and lower limit values across the measurement columns
dt[3,A := -5]
dt[2,B := -3]
dt[5,B := 7]
dt[1,C := -10]
dt
   id          A         B           C
1:  1 -0.6264538  1.179532 -10.0000000
2:  2  0.1836433 -3.000000  -0.6101568
3:  3 -5.0000000  2.738325  -1.6212406
4:  4  1.5952808  2.575781  -3.2146999
5:  5  0.3295078  7.000000   0.1249309
I want to filter out (set to NA) the values in each column of dt that exactly match the lower or upper limit listed for that measurement in another data.table:
limits <- data.table(measurement=LETTERS[1:3], lower=c(-5,-3,-10),
upper=c(NA, 7, NA))
setkey(limits, measurement)
limits
   measurement lower upper
1:           A    -5    NA
2:           B    -3     7
3:           C   -10    NA
My expected output is:
dt
   id          A        B          C
1:  1 -0.6264538 1.179532         NA
2:  2  0.1836433       NA -0.6101568
3:  3         NA 2.738325 -1.6212406
4:  4  1.5952808 2.575781 -3.2146999
5:  5  0.3295078       NA  0.1249309
I have not been able to come up with a neat solution for this, so for now I am using a nested for loop to get the job done:
for (i in seq_len(nrow(dt))) {
  for (j in 2:ncol(dt)) {
    value <- dt[[j]][i]              # current cell
    lim <- limits[names(dt)[j]]      # keyed lookup of this column's limits
    if (is.na(value)) {
      next
    } else if (value == lim$lower) {
      dt[i, (names(dt)[j]) := NA_real_]
    } else if (is.na(lim$upper)) {
      next                           # no upper limit defined for this column
    } else if (value == lim$upper) {
      dt[i, (names(dt)[j]) := NA_real_]
    }
  }
}
But surely there must be something nicer? I played around with applying each row of the limits data.table to the corresponding column of dt, but without any success.
First, I would transpose your limits data.table as follows:
library(data.table)  # current data.table versions provide melt() and dcast() for data.tables
limits = dcast(melt(limits, id.vars = "measurement"), variable ~ measurement, value.var = "value")
#    variable  A  B   C
# 1:    lower -5 -3 -10
# 2:    upper NA  7  NA
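The transposed table is only useful if its measurement columns line up with those of dt, because the next step pairs the two tables by column position. A minimal sketch of a guard for that assumption follows; the names limits_long, limits_wide and dt_cols are made up for this illustration and do not appear in the original post.

```r
library(data.table)

# Long-format limits table as defined in the question
limits_long <- data.table(measurement = LETTERS[1:3],
                          lower = c(-5, -3, -10),
                          upper = c(NA, 7, NA))

# Transposed form: one row per limit type, one column per measurement
limits_wide <- dcast(melt(limits_long, id.vars = "measurement"),
                     variable ~ measurement, value.var = "value")

# Column names of the measurement table from the question
dt_cols <- c("id", "A", "B", "C")

# Positions 2..n must name the same measurements in both tables
stopifnot(identical(names(limits_wide)[-1], dt_cols[-1]))
```

If the check fails, reordering one of the tables (or indexing by column name instead of position) would be needed before matching.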
Then you can match each column i of dt against the corresponding column of limits and replace the matches with NA using set, as follows:
for (i in 2:ncol(dt)) {
  set(dt, i=which(dt[[i]] %in% limits[[i]]), j=i, value=NA_real_)
}
#    id          A        B          C
# 1:  1 -0.6264538 1.179532         NA
# 2:  2  0.1836433       NA -0.6101568
# 3:  3         NA 2.738325 -1.6212406
# 4:  4  1.5952808 2.575781 -3.2146999
# 5:  5  0.3295078       NA  0.1249309
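As a possible variation, the same replacement can be done without transposing limits, by looking up each measurement's limits by name instead of by column position. This is only a sketch under the assumption that every value in limits$measurement is also a column name in dt; the loop variable meas and the helpers bounds and hits are names introduced here for illustration.

```r
library(data.table)

# Rebuild the example data from the question
set.seed(1)
dt <- data.table(id = 1:5, A = rnorm(5), B = rnorm(5, mean = 2), C = rnorm(5, mean = -1))
dt[3, A := -5]
dt[2, B := -3]
dt[5, B := 7]
dt[1, C := -10]

limits <- data.table(measurement = LETTERS[1:3],
                     lower = c(-5, -3, -10),
                     upper = c(NA, 7, NA))

# For each measurement, fetch its limits by name and blank out exact matches
for (meas in limits$measurement) {
  bounds <- limits[measurement == meas, c(lower, upper)]
  bounds <- bounds[!is.na(bounds)]        # skip limits that are not defined
  hits <- which(dt[[meas]] %in% bounds)   # rows whose value equals a limit
  if (length(hits)) {
    set(dt, i = hits, j = meas, value = NA_real_)
  }
}
dt
```

Because the lookup is by name, this version does not depend on dt and limits listing the measurements in the same order. Like the original, it compares doubles for exact equality, which is fine here because the limit values were written into dt verbatim.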