data.table: filter out specific values in each column (the values vary by column)

Sco*_*hie 2 r data.table

I have a data.table of measurements in which each column has a lower limit of detection (and possibly an upper limit of detection):

library(data.table)
set.seed(1)
dt <- data.table(id=1:5, A=rnorm(5), B=rnorm(5, mean=2), C=rnorm(5,mean=-1))
setkey(dt, id)
# "randomly" place upper and lower limit values in the measurement columns
dt[3,A := -5]
dt[2,B := -3]
dt[5,B := 7]
dt[1,C := -10]
dt
   id          A         B           C
1:  1 -0.6264538  1.179532 -10.0000000
2:  2  0.1836433 -3.000000  -0.6101568
3:  3 -5.0000000  2.738325  -1.6212406
4:  4  1.5952808  2.575781  -3.2146999
5:  5  0.3295078  7.000000   0.1249309

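I would like to filter (set to NA) the values in each column of dt that exactly match the lower and upper measurement limits listed in another data.table: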

limits <- data.table(measurement=LETTERS[1:3], lower=c(-5,-3,-10), 
                     upper=c(NA, 7, NA))
setkey(limits, measurement)
limits
   measurement lower upper
1:           A    -5    NA
2:           B    -3     7
3:           C   -10    NA

My expected output is:

dt
   id          A        B          C
1:  1 -0.6264538 1.179532         NA
2:  2  0.1836433       NA -0.6101568
3:  3         NA 2.738325 -1.6212406
4:  4  1.5952808 2.575781 -3.2146999
5:  5  0.3295078       NA  0.1249309

I have not been able to come up with a nice solution for this, so for now I am using a dense for loop to get the job done:

for (i in seq_len(nrow(dt))) {
  for (j in 2:ncol(dt)) {
    if (is.na(dt[i, j, with=FALSE])) {
      # nothing to do for values that are already missing
      next
    } else if (dt[i, j, with=FALSE] == limits[names(dt)[j]][, lower]) {
      # (j) := replaces the deprecated `j := ..., with=FALSE` form
      dt[i, (j) := NA_real_]
    } else if (is.na(limits[names(dt)[j]][, upper])) {
      # this column has no upper limit
      next
    } else if (dt[i, j, with=FALSE] == limits[names(dt)[j]][, upper]) {
      dt[i, (j) := NA_real_]
    } else {
      next
    }
  }
}

But surely there must be something better? I played around with applying the limits data.table to each column of dt, row by row, but without any success.

Aru*_*run 6

First, I would transpose your limits data.table as follows:

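require(data.table)
# recent versions of data.table provide their own melt() and dcast(),
# so reshape2 no longer needs to be loaded
limits = dcast(melt(limits, id.vars="measurement"), variable ~ measurement, value.var="value")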

#    variable  A  B   C
# 1:    lower -5 -3 -10
# 2:    upper NA  7  NA

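Then you can match each column i against the corresponding column of limits and replace those matches with NA using set, as follows: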

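# column i of dt lines up by position with column i of the transposed limits
for (i in 2:ncol(dt)) {
    # set() assigns by reference, so dt is modified in place
    set(dt, i=which(dt[[i]] %in% limits[[i]]), j=i, value=NA_real_)
}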

#    id          A        B          C
# 1:  1 -0.6264538 1.179532         NA
# 2:  2  0.1836433       NA -0.6101568
# 3:  3         NA 2.738325 -1.6212406
# 4:  4  1.5952808 2.575781 -3.2146999
# 5:  5  0.3295078       NA  0.1249309
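
The loop above relies on the columns of dt and of the transposed limits being in the same order. As a possible variant, here is a minimal sketch that skips the transpose and looks each column up by name, using the original (long) limits table from the question. The loop variable k and the helper names col and vals are only illustrative, and the sketch assumes every entry in measurement is the name of a numeric column in dt.

```r
library(data.table)

# Rebuild the example data from the question.
set.seed(1)
dt <- data.table(id = 1:5, A = rnorm(5), B = rnorm(5, mean = 2), C = rnorm(5, mean = -1))
dt[3, A := -5]
dt[2, B := -3]
dt[5, B := 7]
dt[1, C := -10]

limits <- data.table(measurement = LETTERS[1:3],
                     lower = c(-5, -3, -10),
                     upper = c(NA, 7, NA))

# Walk over the rows of limits and address each column of dt by name.
for (k in seq_len(nrow(limits))) {
  col <- limits$measurement[k]
  # Collect this column's limits and drop the ones that are not defined (NA).
  vals <- c(limits$lower[k], limits$upper[k])
  vals <- vals[!is.na(vals)]
  # Replace exact matches with NA, by reference.
  set(dt, i = which(dt[[col]] %in% vals), j = col, value = NA_real_)
}

dt
```

Because the match is by name, this version should keep working if the columns of dt are reordered or if dt has extra columns that limits does not mention. As in the loop above, the comparison is an exact equality test, so it only catches values stored as exactly the limit value.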