I have a data.table of data and a data.table of fitted coefficients. I want to compute the fitted value for each row.
library(data.table)
dt = data.table(a = rep(c("x","y"), each = 5), b = rnorm(10), c = rnorm(10), d = rnorm(10))
coefs = data.table(a = c("x","y"), b = c(0, 1), d = c(2,3))
dt
# a b c d
# 1: x -0.25174915 -0.2130797 -0.67909764
# 2: x -0.35569766 0.6014930 0.35201386
# 3: x -0.31600957 0.4398968 -1.15475814
# 4: x -0.54113762 -2.3497952 0.64503654
# 5: x 0.11227873 0.0233775 -0.96891456
# 6: y 1.24077566 -1.2843439 1.98883516
# 7: y -0.23819626 0.9950835 -0.17279980
# 8: y 1.49353589 0.3067897 -0.02592004
# 9: y 0.01033722 -0.5967766 -0.28536224
#10: y 0.69882444 0.8702424 1.24131062
coefs # NB no "c" column
# a b d
#1: x 0 2
#2: y 1 3
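For each row of dt with a == "x" I want 0*b + 2*d; for each row with a == "y" I want 1*b + 3*d.
Is there a data.table way to do this without hard-coding the column names? I'm happy to put the column names in a variable, cols = colnames(coefs)[-1].
It's easy to loop over the groups and rbind the results together, so if the grouping causes trouble, feel free to ignore that part.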
Answer by Rol*_*and (score 10)
Join the two data.tables on a:
dt[coefs, res := b * i.b + d * i.d, on = "a"]
# a b c d res
#1: x 0.09901786 -0.362080111 -0.5108862 -1.0217723
#2: x -0.16128422 0.169655945 0.3199648 0.6399295
#3: x -0.79648896 -0.502279345 1.3828633 2.7657266
#4: x -0.26121421 0.480548972 -1.1559392 -2.3118783
#5: x 0.54085591 -0.601323442 1.3833795 2.7667590
#6: y 0.83662761 0.607666970 0.6320762 2.7328562
#7: y -1.92510391 -0.050515610 -0.3176544 -2.8780671
#8: y 1.65639926 -0.167090105 0.6830158 3.7054466
#9: y 1.48772354 -0.349713539 -1.2736467 -2.3332166
#10: y 1.49065993 0.008198885 -0.1923361 0.9136516
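To make explicit what the join computes, here is a minimal sketch in base R. It is a swapped-in technique, not data.table: `match()` pairs every data row with the coefficient row that has the same key, and the column-times-coefficient products are summed over the coefficient columns. The seed and the names `df` and `idx` are illustrative assumptions.

```r
# Toy data shaped like the question (seeded so the run is reproducible)
set.seed(42)
df <- data.frame(a = rep(c("x", "y"), each = 5), b = rnorm(10), c = rnorm(10), d = rnorm(10))
coefs <- data.frame(a = c("x", "y"), b = c(0, 1), d = c(2, 3))
cols <- setdiff(names(coefs), "a")  # coefficient columns: "b", "d"

# For each row of df, the index of the coefs row whose key equals df$a
idx <- match(df$a, coefs$a)

# Sum of column * matching coefficient over all coefficient columns
df$res <- Reduce(`+`, lapply(cols, function(col) df[[col]] * coefs[[col]][idx]))
```

Because `cols` is taken from the coefficient table, the "c" column in the data is ignored automatically, just as in the join.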
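Normally you would use a matrix product here, but that means coercing the corresponding subsets to matrices. That creates copies, and since data.tables are mostly used for larger data, you want to avoid copying.
If you need dynamic column names, the simplest solution I can think of is actually an eval/parse construct: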
cols = colnames(coefs)[-1]
expr <- parse(text = paste(paste(cols, paste0("i.", cols), sep = "*"), collapse = "+"))
#expression(b*i.b+d*i.d)
dt[coefs, res := eval(expr), on = "a"]
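A possible alternative to building a string and calling `parse()` is to assemble the same expression directly as a call object with base R's `call()` and `as.name()`. This is a sketch only: `cols` is hard-coded here so the block runs without data.table, whereas in the answer it would be `colnames(coefs)[-1]`.

```r
# Coefficient columns; in the real code: cols = colnames(coefs)[-1]
cols <- c("b", "d")

# One product term per column, pairing x's column with the joined i. column
terms <- lapply(cols, function(col) call("*", as.name(col), as.name(paste0("i.", col))))

# Chain the terms with `+` into a single call: b * i.b + d * i.d
expr <- Reduce(function(lhs, rhs) call("+", lhs, rhs), terms)

print(expr)
```

The resulting call should be usable in place of the parsed expression, e.g. `dt[coefs, res := eval(expr), on = "a"]`; that it behaves identically inside data.table is an assumption I have not checked beyond the pattern used in the answer.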
Maybe someone else can come up with a better solution.
Here is a solution using matrix multiplication:
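# .BY$a is the current group's key; as.vector() turns the n x 1 product into a plain column
dt[, res := as.vector(as.matrix(.SD) %*% unlist(coefs[a == .BY$a, .SD, .SDcols = cols])),
   by = "a", .SDcols = cols]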
Of course, this makes copies, so it is probably less efficient than the eval solution.
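For comparison, the per-group matrix product can be sketched in base R without data.table (a substitution, not the answer's code). For each group g it computes X_g %*% beta_g, and the `as.matrix()` call is exactly where the subset gets copied. The seed and the names `df`, `beta` and `rows` are illustrative assumptions.

```r
# Toy data shaped like the question (seeded so the run is reproducible)
set.seed(42)
df <- data.frame(a = rep(c("x", "y"), each = 5), b = rnorm(10), c = rnorm(10), d = rnorm(10))
coefs <- data.frame(a = c("x", "y"), b = c(0, 1), d = c(2, 3))
cols <- setdiff(names(coefs), "a")  # coefficient columns: "b", "d"

res <- numeric(nrow(df))
for (g in coefs$a) {
  rows <- which(df$a == g)
  # Coefficient vector for this group, e.g. c(b = 0, d = 2) for "x"
  beta <- unlist(coefs[coefs$a == g, cols])
  # as.matrix() copies the group's subset; drop() turns the n x 1 result into a vector
  res[rows] <- drop(as.matrix(df[rows, cols]) %*% beta)
}
df$res <- res
```

The loop allocates one matrix per group, which is the copying cost the answer mentions; the join-based versions avoid it.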