Row-wise sum over columns whose names match a pattern

use*_*852 23 r data.table

I have a data.table like this:

dput(DT)
structure(list(ref = c(3L, 3L, 3L, 3L), nb = 12:15, i1 = c(3.1e-05, 
0.044495, 0.82244, 0.322291), i2 = c(0.000183, 0.155732, 0.873416, 
0.648545), i3 = c(0.000824, 0.533939, 0.838542, 0.990648), i4 = c(0.044495, 
0.82244, 0.322291, 0.393595)), .Names = c("ref", "nb", "i1", 
"i2", "i3", "i4"), row.names = c(NA, -4L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x0000000000320788>)

DT
#    ref nb       i1       i2       i3       i4
# 1:   3 12 0.000031 0.000183 0.000824 0.044495
# 2:   3 13 0.044495 0.155732 0.533939 0.822440
# 3:   3 14 0.822440 0.873416 0.838542 0.322291
# 4:   3 15 0.322291 0.648545 0.990648 0.393595

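Now I want to compute row sums, but only over the columns whose names start with "i" ("i1", "i2", etc.).

I used grep to build a vector of the column names to sum: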

listCol <- colnames(DT)[grep("i", colnames(DT))]
listCol
# [1] "i1" "i2" "i3" "i4"
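Note that the pattern "i" matches any name that contains an "i", not only names that start with one. It happens to work for this table, but it would also pick up a column such as "id". A minimal sketch of the difference, using hypothetical column names that are not in the original table:

```r
# Hypothetical column names, some of which merely contain an "i"
cols <- c("ref", "nb", "id", "i1", "i2", "width")

# Unanchored: matches every name that contains "i" anywhere
loose <- grep("i", cols, value = TRUE)

# Anchored with "^": matches only names that start with "i"
strict <- grep("^i", cols, value = TRUE)

# Stricter still: "i" followed only by digits
digits_only <- grep("^i[0-9]+$", cols, value = TRUE)
```

Passing `value = TRUE` returns the matching names directly, so the `colnames(DT)[...]` indexing step is not needed.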

Then I tried looping over the columns:

DT$sum <- rep.int(0, nrow(DT))
for (i in listCol){
    DT$sum = DT$sum + DT[ , get(i)]
}

...which gives the desired output:

DT
#    ref nb       i1       i2       i3       i4      sum
# 1:   3 12 0.000031 0.000183 0.000824 0.044495 0.045533
# 2:   3 13 0.044495 0.155732 0.533939 0.822440 1.556606
# 3:   3 14 0.822440 0.873416 0.838542 0.322291 2.856689
# 4:   3 15 0.322291 0.648545 0.990648 0.393595 2.355079

How can I improve my code?


Sub-question:

This sub-question builds on the answer to the question above.

How can I avoid this awkward notation:

myrowMeans = function (x){
    rowMeans(x, na.rm = TRUE)
}
DT[ , var := myrowMeans(.SD-myrowMeans(.SD)^2), .SDcols = grep("i", colnames(DT))]

Hug*_*ugh 41

Use .SDcols to specify the columns, then take rowSums. Use := to assign the new column:

DT[ ,sum := rowSums(.SD), .SDcols = grep("i", names(DT))]
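As a self-contained variant of the same idea, the sketch below rebuilds the table from the question and selects the columns with `patterns()` inside `.SDcols` instead of `grep`. This assumes a reasonably recent data.table (to my knowledge, `.SDcols = patterns(...)` is accepted from version 1.12.0 onward); on older versions, the `grep` form above is the one to use.

```r
library(data.table)

# The table from the question, rebuilt without the internal pointer
DT <- data.table(
  ref = c(3L, 3L, 3L, 3L),
  nb  = 12:15,
  i1  = c(3.1e-05, 0.044495, 0.82244, 0.322291),
  i2  = c(0.000183, 0.155732, 0.873416, 0.648545),
  i3  = c(0.000824, 0.533939, 0.838542, 0.990648),
  i4  = c(0.044495, 0.82244, 0.322291, 0.393595)
)

# Select columns by regular expression directly in .SDcols
# (assumption: data.table >= 1.12.0, where patterns() is allowed here)
DT[, sum := rowSums(.SD), .SDcols = patterns("^i[0-9]+$")]
```

The anchored pattern `^i[0-9]+$` is a deliberate choice: it keeps the new `sum` column, and any other column that merely contains an "i", out of the selection if the line is run again.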


akr*_*run 31

You can also try Reduce:

 DT[, Sum := Reduce(`+`, .SD), .SDcols=listCol][]
 #   ref nb       i1       i2       i3       i4      Sum
 #1:   3 12 0.000031 0.000183 0.000824 0.044495 0.045533
 #2:   3 13 0.044495 0.155732 0.533939 0.822440 1.556606
 #3:   3 14 0.822440 0.873416 0.838542 0.322291 2.856689
 #4:   3 15 0.322291 0.648545 0.990648 0.393595 2.355079

Note: if there are NA values, replace them with 0 before calling Reduce, i.e.:

 DT[, Sum := Reduce(`+`, lapply(.SD, function(x) replace(x, 
                    which(is.na(x)), 0))), .SDcols=listCol][]

**Another solution:** use rowSums

 DT[, Sum := rowSums(.SD, na.rm = TRUE), .SDcols = grep("i", names(DT))] 
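The NA handling of the two approaches can be checked on a small input. The sketch below uses only base R and a hypothetical two-column list (the names `a` and `b` are illustrative, not from the question):

```r
# Hypothetical two-column input with one missing value
cols <- list(a = c(1, 2, NA), b = c(10, 20, 30))

# `+` propagates NA, so the third element of the result is NA
plain <- Reduce(`+`, cols)

# Replacing NA with 0 first treats missing values as zero
filled <- Reduce(`+`, lapply(cols, function(x) replace(x, is.na(x), 0)))

# rowSums() with na.rm = TRUE skips the NA and gives the same totals
via_row_sums <- rowSums(as.data.frame(cols), na.rm = TRUE)
```

One difference worth keeping in mind: `rowSums()` converts its input to a matrix first, whereas `Reduce` adds the columns one at a time.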

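  • @Frank Not sure, though. I have seen data.table experts use `rowSums`. One advantage of `rowSums` is that you can pass `na.rm = TRUE` when there are NAs. With Reduce, we have to replace the NAs with 0 before applying `+`. (4 upvotes)
  • @akrun Maybe something like this? `DT[, Sum := rowSums(.SD, na.rm = TRUE), .SDcols = grep("i", names(DT))]` (2 upvotes)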