Summary tables from wide data.frames

Mik*_*kko 1 r summary plyr dataframe

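I am trying to find a lazy/simple way to create summary tables from wide data.frames. Suppose there is a data.frame like the following, but with many more columns, so that specifying the column names would take a long time: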

set.seed(2)
x <- data.frame(Rep = rep(1:3, 4), Temp = c(rep(10,6), rep(20,6)), 
pH = rep(c(rep(8.1, 3), rep(7.6, 3)), 2),
Var1 = rnorm(12, 5,2), Var2 = c(rnorm(6,4,1), rnorm(6,3,5)),
Var3 = rt(12, 20))
# Convert the three design columns (Rep, Temp, pH) to factors
x[1:3] <- lapply(x[1:3], factor)

Now I can calculate summary statistics with plyr:

library(plyr)
(mu <- ddply(x, .(Temp, pH), numcolwise(mean)))
(std <- ddply(x, .(Temp, pH), numcolwise(sd)))
(n  <- ddply(x, .(Temp, pH), numcolwise(length)))

But I cannot figure out how to apply all of these functions at once:

ddply(x, .(Temp, pH), numcolwise(mean, sd, length))

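Of course I could merge the various summary data tables, but that is not a "lazy/simple" way. I am looking for something general that I can apply in many situations. Something like this, except that it should be possible to generate it with a single function: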

xx <- merge(mu, std, by = c("Temp", "pH"), sort = FALSE)
# merge() suffixes the clashing columns with .x (from mu) and .y (from std)
colnames(xx) <- gsub("\\.x$", ".mean", colnames(xx))
colnames(xx) <- gsub("\\.y$", ".sd", colnames(xx))
xx <- merge(xx, n, by = c("Temp", "pH"), sort = FALSE)
# The last three columns are the counts that came from n
colnames(xx)[(ncol(xx)-2):ncol(xx)] <-
paste0(colnames(xx)[(ncol(xx)-2):ncol(xx)], ".length")
# Group the columns by variable
xx <- xx[c("Temp", "pH", grep("Var1", colnames(xx), value = TRUE),
grep("Var2", colnames(xx), value = TRUE),
grep("Var3", colnames(xx), value = TRUE))]
xx

  Temp  pH Var1.mean  Var1.sd Var1.length Var2.mean  Var2.sd Var2.length Var3.mean  Var3.sd Var3.length
1   10 7.6  4.281195 1.352194           3  3.534447 1.652884           3 0.1529616 1.076276           3
2   10 8.1  5.583853 2.491672           3  4.116622 1.478286           3 1.1611944 1.081301           3
3   20 7.6  5.840411 1.120549           3  6.907273 8.628021           3 0.1301949 1.764201           3
4   20 8.1  6.635154 2.232262           3  8.893188 4.208087           3 0.5509202 1.187431           3

Is this currently possible in R? Any suggestions would be appreciated.

jub*_*uba 5

Here is one way to do it with reshape2 and plyr. Note that you get the results with the variables in rows instead of columns:

library(reshape2)
library(plyr)
md <- melt(x[,-1], id.vars=c("Temp","pH"))
ddply(md, c("Temp", "pH", "variable"), summarize, mean=mean(value), sd=sd(value))

This gives:

   Temp  pH variable      mean       sd
1    10 7.6     Var1 4.2811952 1.352194
2    10 7.6     Var2 3.5344474 1.652884
3    10 7.6     Var3 0.1529616 1.076276
4    10 8.1     Var1 5.5838533 2.491672
5    10 8.1     Var2 4.1166215 1.478286
6    10 8.1     Var3 1.1611944 1.081301
7    20 7.6     Var1 5.8404110 1.120549
8    20 7.6     Var2 6.9072734 8.628021
9    20 7.6     Var3 0.1301949 1.764201
10   20 8.1     Var1 6.6351538 2.232262
11   20 8.1     Var2 8.8931884 4.208087
12   20 8.1     Var3 0.5509202 1.187431
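
The question also asked for the number of observations per group. As a minimal sketch (an extension, not part of the original answer), the same summarize call can take a third expression. The column name n is an arbitrary choice, and the data are rebuilt here only so that the block runs on its own:

```r
library(reshape2)
library(plyr)

# Rebuild the example data from the question
set.seed(2)
x <- data.frame(Rep = rep(1:3, 4), Temp = c(rep(10, 6), rep(20, 6)),
                pH = rep(c(rep(8.1, 3), rep(7.6, 3)), 2),
                Var1 = rnorm(12, 5, 2),
                Var2 = c(rnorm(6, 4, 1), rnorm(6, 3, 5)),
                Var3 = rt(12, 20))
x[1:3] <- lapply(x[1:3], factor)

# Long format: one row per (Temp, pH, variable, value), Rep dropped
md <- melt(x[, -1], id.vars = c("Temp", "pH"))

# Mean, standard deviation and group size in a single ddply call
res <- ddply(md, c("Temp", "pH", "variable"), summarize,
             mean = mean(value), sd = sd(value), n = length(value))
res
```

Any further statistic can be added in the same way, as one more named expression in the call.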

If you want the results in wide format, you can use reshape:

md <- melt(x[,-1], id.vars=c("Temp","pH"))
result <- ddply(md, c("Temp", "pH", "variable"), summarize, mean=mean(value), sd=sd(value))
reshape(result, idvar=c("Temp","pH"), timevar="variable",direction="wide")

   Temp  pH mean.Var1  sd.Var1 mean.Var2  sd.Var2 mean.Var3  sd.Var3
1    10 7.6  4.281195 1.352194  3.534447 1.652884 0.1529616 1.076276
4    10 8.1  5.583853 2.491672  4.116622 1.478286 1.1611944 1.081301
7    20 7.6  5.840411 1.120549  6.907273 8.628021 0.1301949 1.764201
10   20 8.1  6.635154 2.232262  8.893188 4.208087 0.5509202 1.187431
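
For column names in the Var1.mean style used in the question, a base-R sketch (an alternative, not the method of the answer above) is to let aggregate return several statistics per column and then flatten the resulting matrix columns with do.call(data.frame, ...). The helper name summ_stats is made up for illustration, and Temp and pH are left numeric here for simplicity:

```r
# Rebuild the example data from the question
set.seed(2)
x <- data.frame(Rep = rep(1:3, 4), Temp = c(rep(10, 6), rep(20, 6)),
                pH = rep(c(rep(8.1, 3), rep(7.6, 3)), 2),
                Var1 = rnorm(12, 5, 2),
                Var2 = c(rnorm(6, 4, 1), rnorm(6, 3, 5)),
                Var3 = rt(12, 20))

# Hypothetical helper: the statistics wanted for every measured column
summ_stats <- function(v) c(mean = mean(v), sd = sd(v), length = length(v))

# "." on the left-hand side stands for all columns not used for grouping;
# Rep is dropped first so that it is not summarised
agg <- aggregate(. ~ Temp + pH, data = x[-1], FUN = summ_stats)

# Each Var column of agg is a 3-column matrix; data.frame() splits these
# into separate columns named Var1.mean, Var1.sd, Var1.length, ...
wide <- do.call(data.frame, agg)

# aggregate() varies the first grouping column fastest, so reorder the rows
wide <- wide[order(wide$Temp, wide$pH), ]
names(wide)
```

Because the grouping columns are named only in the formula, the same call should work unchanged however many measured columns the data.frame has.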