jja*_*jap 12 r aggregation plyr data.table
My aggregation needs vary between columns / data.frames. I would like to pass the "list" argument to data.table dynamically.
As a minimal example:
require(data.table)
type <- c(rep("hello", 3), rep("bye", 3), rep("ok",3))
a <- (rep(1:3, 3))
b <- runif(9)
c <- runif(9)
df <- data.frame(cbind(type, a, b, c), stringsAsFactors=F)
DT <-data.table(df)
This call:
DT[, list(suma = sum(as.numeric(a)), meanb = mean(as.numeric(b)), minc = min(as.numeric(c))), by= type]
gives a result like:
type suma meanb minc
1: hello 6 0.1332210 0.4265579
2: bye 6 0.5680839 0.2993667
3: ok 6 0.5694532 0.2069026
Future data.frames will have many more columns, which I would like to summarise in different ways. But to stick with this small example: is there a way to pass the list programmatically?
I naively tried:
# create a different list
mylist <- "list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))"
# new call
DT[, mylist, by=type]
which returned the string itself for every group instead of the aggregation:
1: hello
2: bye
3: ok
mylist
1: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))
2: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))
3: list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))
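Any hints appreciated! Best regards!
PS: sorry for the as.numeric() calls. I could not figure out why, but I need them for the example to run.
Minor edit: inserted "columns /" before "data.frames" in the first sentence to clarify my need.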
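On the as.numeric() calls mentioned in the PS: a likely cause is that cbind() on a character vector plus numeric vectors builds a character matrix, so every column of df ends up as character. A minimal sketch, assuming the goal is simply to keep a, b and c numeric (the names df_chr and df_num are made up for illustration):

```r
type <- c(rep("hello", 3), rep("bye", 3), rep("ok", 3))
a <- rep(1:3, 3)
b <- runif(9)
c <- runif(9)

# cbind() coerces all four vectors to one character matrix,
# so every column of the resulting data.frame is character.
df_chr <- data.frame(cbind(type, a, b, c), stringsAsFactors = FALSE)
sapply(df_chr, class)

# Passing the vectors directly to data.frame() keeps their types,
# so sum(a), mean(b) etc. would work without as.numeric().
df_num <- data.frame(type, a, b, c, stringsAsFactors = FALSE)
sapply(df_num, class)
```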
mne*_*nel 10
This is explained in FAQ 1.6. What you are looking for is quote and eval.
Something like
mycall <- quote(list(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c))))
DT[, eval(mycall)]
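If the pieces of the call are only known at run time, the quoted call can also be assembled from a named list of expressions rather than typed out. This is a sketch, not part of the original answer: the name exprs is made up for illustration, and the toy DT below assumes numeric columns so that as.numeric() is not needed.

```r
library(data.table)

# Toy data shaped like the question's DT, but with numeric columns
# (an assumption made for this sketch).
DT <- data.table(type = rep(c("hello", "bye", "ok"), each = 3),
                 a = rep(1:3, 3), b = runif(9), c = runif(9))

# One quoted expression per output column; the list names become column names.
exprs <- list(lengtha = quote(length(a)),
              maxb    = quote(max(b)),
              meanc   = quote(mean(c)))

# Prepend the symbol `list` and turn the whole thing into a single call,
# equivalent to quote(list(lengtha = length(a), maxb = max(b), meanc = mean(c))).
mycall <- as.call(c(as.name("list"), exprs))

res <- DT[, eval(mycall), by = type]
res
```

Entries can be added to or dropped from exprs before the call is built, which is the dynamic part the question asks about.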
After some head-banging, here is a very ugly way of constructing the call to ddply using .()
library(plyr)
myplyrcall <- .(lengtha = length(as.numeric(a)), maxb = max(as.numeric(b)), meanc = mean(as.numeric(c)))
# df is the data.frame defined in the question
do.call(ddply, c(.data = quote(df), .variables = 'type', .fun = quote(summarise), myplyrcall))
You can also use as.quoted, which has an as.quoted.character method, so the expressions can be constructed with paste0
myplc <-as.quoted(c("lengtha" = "length(as.numeric(a))", "maxb" = "max(as.numeric(b))", "meanc" = "mean(as.numeric(c))"))
This can be used with data.table as well!
dtcall <- as.quoted(mylist)[[1]]
DT[,eval(dtcall), by = type]
data.table all the way.
Another approach (which supports building the expression with paste or paste0):
expr <- parse(text=mylist)
DT[, eval( expr ), by=type]
#-------
type lengtha maxb meanc
1: hello 3 0.8265407 0.5244094
2: bye 3 0.4955301 0.6289475
3: ok 3 0.9527455 0.5600915
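To make the paste0 part concrete, here is a sketch that assembles the mylist string from a small specification table and then parses it. The spec data.frame and its column names (out, fun, col) are assumptions introduced for illustration, and the toy DT uses numeric columns so that as.numeric() is not needed.

```r
library(data.table)

# Toy data shaped like the question's DT, with numeric columns (assumption).
DT <- data.table(type = rep(c("hello", "bye", "ok"), each = 3),
                 a = rep(1:3, 3), b = runif(9), c = runif(9))

# Hypothetical specification: output name, function name, input column.
spec <- data.frame(out = c("lengtha", "maxb", "meanc"),
                   fun = c("length", "max", "mean"),
                   col = c("a", "b", "c"),
                   stringsAsFactors = FALSE)

# Build "lengtha = length(a)", "maxb = max(b)", ... and wrap them in list( ).
terms  <- paste0(spec$out, " = ", spec$fun, "(", spec$col, ")")
mylist <- paste0("list(", paste(terms, collapse = ", "), ")")

# Parse the string into an expression and evaluate it per group.
expr <- parse(text = mylist)
res  <- DT[, eval(expr), by = type]
res
```

Because the string is parsed as code, this should only be fed specifications you control.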
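Another approach is to use .SDcols to group the columns on which you want to perform the same operation. Suppose you need columns a, d, e summed by type, whereas for b, g the mean should be taken and for c, f the median; then,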
# constructing an example data.table:
set.seed(45)
dt <- data.table(type=rep(c("hello","bye","ok"), each=3), a=sample(9),
b = rnorm(9), c=runif(9), d=sample(9), e=sample(9),
f = runif(9), g=rnorm(9))
# type a b c d e f g
# 1: hello 6 -2.5566166 0.7485015 9 6 0.5661358 -2.2066521
# 2: hello 3 1.1773119 0.6559926 3 3 0.4586280 -0.8376586
# 3: hello 2 -0.1015588 0.2164430 1 7 0.9299597 1.7216593
# 4: bye 8 -0.2260640 0.3924327 8 2 0.1271187 0.4360063
# 5: bye 7 -1.0720503 0.3256450 7 8 0.5774691 0.7571990
# 6: bye 5 -0.7131021 0.4855804 6 9 0.2687791 1.5398858
# 7: ok 1 -0.4680549 0.8476840 2 4 0.5633317 1.5393945
# 8: ok 4 0.4183264 0.4402595 4 1 0.7592801 2.1829996
# 9: ok 9 -1.4817436 0.5080116 5 5 0.2357030 -0.9953758
# 1) set key
setkey(dt, "type")
# 2) group col-ids by similar operations
id1 <- which(names(dt) %in% c("a", "d", "e"))
id2 <- which(names(dt) %in% c("b","g"))
id3 <- which(names(dt) %in% c("c","f"))
# 3) now use these ids with the .SDcols parameter
dt1 <- dt[, lapply(.SD, sum), by="type", .SDcols=id1]
dt2 <- dt[, lapply(.SD, mean), by="type", .SDcols=id2]
dt3 <- dt[, lapply(.SD, median), by="type", .SDcols=id3]
# 4) merge them.
dt1[dt2[dt3]]
# type a d e b g c f
# 1: bye 20 21 19 -0.6704055 0.9110304 0.3924327 0.2687791
# 2: hello 11 13 16 -0.4936211 -0.4408838 0.6559926 0.5661358
# 3: ok 14 11 10 -0.5104907 0.9090061 0.5080116 0.5633317
If/when you have a lot of columns, writing out a list by hand might be cumbersome.
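The four steps above can also be driven from a single specification, so that adding another operation means adding one entry rather than another id/dt pair. This is a sketch under assumptions: the ops list and its fn/cols fields are made-up names, and merge() on type is used in place of the keyed join dt1[dt2[dt3]].

```r
library(data.table)

set.seed(45)
dt <- data.table(type = rep(c("hello", "bye", "ok"), each = 3), a = sample(9),
                 b = rnorm(9), c = runif(9), d = sample(9), e = sample(9),
                 f = runif(9), g = rnorm(9))

# Hypothetical specification: one entry per operation and its columns.
ops <- list(list(fn = sum,    cols = c("a", "d", "e")),
            list(fn = mean,   cols = c("b", "g")),
            list(fn = median, cols = c("c", "f")))

# One grouped aggregation per entry, restricted to its columns via .SDcols.
parts <- lapply(ops, function(op) {
  dt[, lapply(.SD, op$fn), by = type, .SDcols = op$cols]
})

# Merge the partial results on the grouping column.
res <- Reduce(function(x, y) merge(x, y, by = "type"), parts)
res
```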
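I see that eval is clearly part of the answer. From your question, it is not clear to me whether you really need to do what you are asking for. So here I demonstrate that you can also use a function: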
fun <- function(a,b,c) {
list(lengtha = length(as.numeric(a)),
maxb = max(as.numeric(b)),
meanc = mean(as.numeric(c)))
}
DT[, fun(a,b,c), by=type]
type lengtha maxb meanc
1: hello 3 0.8792184 0.3745643
2: bye 3 0.8718397 0.4519999
3: ok 3 0.8900764 0.4511536