How do I write reusable functions for columns in data.table by-group operations?

use*_*167 1 r data.table

I need the same set of columns (~20) in many data.tables. How can I encapsulate the operations in a function?

For example, I want columns a1 and a2 in every data.table. The quickest way is to copy and paste the code:

library(data.table)

n = 10
m = 2
d = data.table(p = c(1:n)*1.0, q = 1:m)  # q is recycled to length n
dnew = d[, list(a1 = mean(p), a2 = max(p), b = 2), by = q]  # copy and paste

I would like to write a reusable function like this,

f <- function(d) with(d, list( a1 = mean(p), a2 = max(p))) #return list
dnew = d[, c(f(.SD), list( b = 2)) , by = q]

or this,

g <- function(d)d[, list(a1 = mean(p), a2 = max(p)), by = q] #return data.table
dnew1 = g(d)
dnew2 = d[, list(b = 2),by = q]
dnew = merge(dnew1, dnew2, by = "q")

However, both are very slow when the number of groups (m) is very large.
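A rough way to see the slowdown is to time the inline version against the f(.SD) version on a table with many groups. This is only a sketch: the sizes n and m below are hypothetical values picked to make the per-group overhead visible, and actual timings depend on the machine and the data.table version.

```r
library(data.table)

# Hypothetical sizes, chosen only to make the per-group overhead visible.
n <- 1e5
m <- 1e4
d <- data.table(p = as.numeric(1:n), q = rep(1:m, length.out = n))

f <- function(d) with(d, list(a1 = mean(p), a2 = max(p)))

# Inline version: the aggregation is written directly in j.
t_inline <- system.time(
  r_inline <- d[, list(a1 = mean(p), a2 = max(p), b = 2), by = q]
)

# Function version: f() is called once per group on .SD.
t_fun <- system.time(
  r_fun <- d[, c(f(.SD), list(b = 2)), by = q]
)

# Elapsed seconds for each approach.
print(c(inline = t_inline[["elapsed"]], fun = t_fun[["elapsed"]]))
```

Both calls should return the same table, so any difference in elapsed time comes from how j is evaluated rather than from the result.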

Fra*_*ank 5

Well, you can follow the metaprogramming advice in FAQ 1.6:

# expression instead of a function
fe = quote(list(a1 = mean(p), a2 = max(p)))

# add another element
e = fe
e$b = 2

# eval following FAQ
d[, eval(e), by=q]

I borrowed the e$b = 2 syntax from Hadley Wickham's notes on expressions.
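To make the e$b = 2 step concrete, here is a small base-R sketch (no data.table needed) showing that assigning into a quoted call adds a named argument and leaves the original expression untouched. The vector passed to eval() is made-up illustration data.

```r
# Quoted call describing the shared columns.
fe <- quote(list(a1 = mean(p), a2 = max(p)))

# Copy it, then add a named argument to the copy.
e <- fe
e$b <- 2

print(e)   # the call now carries a third argument, b = 2
print(fe)  # the original expression is unchanged

# Evaluate the extended call against a small, made-up vector p.
res <- eval(e, list(p = c(1, 2, 3)))
str(res)
```

Because e is an ordinary call object, the same expression can be stored once and extended differently for each table that needs extra columns.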

This does work, but looking at d[, eval(e), by=q, verbose=TRUE] we see that max does not get optimized. Since b is just a constant, I would add it in a second step:

extrae = quote(`:=`(b = 2))
d[, eval(fe), by=q][, eval(extrae)][]

# or if working interactively...
d[, eval(fe), by=q][, b := 2][]

With verbose=TRUE, we now see that fe has been optimized to list(gmean(p), gmax(p)).
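If the goal is to reuse this across many tables, the two-step pattern can be wrapped in a small helper. The sketch below is one possible arrangement, not part of data.table: summarise_by and its arguments are hypothetical names, and it assumes the grouping columns are passed as a character vector.

```r
library(data.table)

# Hypothetical helper: evaluates a quoted summary expression by group,
# then adds the constant column b in a second step.
summarise_by <- function(dt, expr, group_cols) {
  out <- dt[, eval(expr), by = group_cols]
  out[, b := 2][]
}

# Shared expression, defined once and reused for every table.
fe <- quote(list(a1 = mean(p), a2 = max(p)))

d <- data.table(p = as.numeric(1:10), q = rep(1:2, length.out = 10))
dnew <- summarise_by(d, fe, "q")
print(dnew)
```

The := step runs on the aggregated result, so the input table d is not modified. Whether the grouped step is optimized inside such a wrapper can be checked by adding verbose=TRUE to the first call.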