I have a data.frame that looks like this:
> head(df)
Memory Memory Memory Memory Memory Naive Naive
10472501 6.075714 5.898929 6.644946 6.023901 6.332126 8.087944 7.520194
10509163 6.168941 6.495393 5.951124 6.052527 6.404401 7.152890 8.335509
10496091 10.125575 9.966211 10.075613 10.310952 10.090649 11.803949 11.274480
10427035 6.644921 6.658567 6.569745 6.499243 6.990852 8.010784 7.798154
10503695 8.379494 8.153917 8.246484 8.390747 8.346748 9.540236 9.091740
10451763 10.986717 11.233819 10.643245 10.230697 10.541396 12.248487 11.823138
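I'd like to find the mean of the Memory columns and the mean of the Naive columns. The aggregate function aggregates rows. This data.frame could potentially have a large number of rows, so transposing it and then applying aggregate by the colnames of the original data.frame strikes me as a bad idea, and it is generally annoying: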
> head(t(aggregate(t(df),list(colnames(df)), mean)))
[,1] [,2]
Group.1 "Memory" "Naive"
10472501 "6.195123" "8.125439"
10509163 "6.214477" "7.733625"
10496091 "10.11380" "11.55348"
10427035 "6.672665" "8.266854"
10503695 "8.303478" "9.340436"
Am I missing something blindingly obvious?
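For comparison, here is a minimal base R sketch that computes the per-row mean of each group of identically named columns without transposing anything. `group_row_means` is a hypothetical helper name, not part of any package, and the toy matrix holds only the first two rows and the seven columns visible in the question's printout.

```r
# Hypothetical helper (not from any package): row-wise mean of every group of
# identically named columns, computed without transposing the data.
group_row_means <- function(x) {
  x <- as.matrix(x)                  # numeric matrix; duplicated column names are kept
  groups <- unique(colnames(x))      # e.g. "Memory", "Naive"
  out <- matrix(NA_real_, nrow = nrow(x), ncol = length(groups),
                dimnames = list(rownames(x), groups))
  for (g in groups) {
    # rowMeans() is vectorised over rows; the loop runs once per group, not per row
    out[, g] <- rowMeans(x[, colnames(x) == g, drop = FALSE])
  }
  out
}

# Toy input: the first two rows and seven columns shown in the question.
toy <- matrix(c(6.075714, 5.898929, 6.644946, 6.023901, 6.332126, 8.087944, 7.520194,
                6.168941, 6.495393, 5.951124, 6.052527, 6.404401, 7.152890, 8.335509),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("10472501", "10509163"),
                              c(rep("Memory", 5), rep("Naive", 2))))
res <- group_row_means(toy)
res
```

The Memory column reproduces 6.195123 and 6.214477 from the question's output; the Naive values need not match, since the toy keeps only the two Naive columns visible in the printout. Looping over the handful of groups rather than over rows keeps the work vectorised even when there are many rows.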
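I'm a big proponent of reformatting data so that it's in "long" format. The usefulness of the long format is especially apparent in problems like this one. Fortunately, the reshape package makes it easy to reshape data like this into almost any format.
If I understand your question correctly, you want the mean of Memory and of Naive for every row. For whatever reason, reshape::melt() needs the column names to be unique, so first: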
colnames(df) <- paste(colnames(df), 1:ncol(df), sep = "_")
Then you'll have to create an ID column. You could do
df$ID <- 1:nrow(df)
or, if the rownames are meaningful,
df$ID <- rownames(df)
Now, with the reshape package:
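library(reshape)
# one row per (ID, column) cell; "variable" holds the old column name, e.g. "Memory_1"
df.m <- melt(df, id = "ID")
# split "Memory_1" back into Measure = "Memory" and N = "1"
df.m <- cbind(df.m, colsplit(df.m$variable, split = "_", names = c("Measure", "N")))
# mean of value for every ID x Measure combination
df.agg <- cast(df.m, ID ~ Measure, fun = mean)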
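The same melt/split/cast idea can be sketched in base R alone, which may help to see what the long format looks like. This is an illustrative swap-in, not the reshape package's implementation; the toy matrix `toy` and its values are made up, and the column names `ID`, `Measure` and `value` simply mirror the ones used above.

```r
# Made-up toy data with duplicated column names, as in the question.
toy <- matrix(c(1, 2, 3, 10, 20,
                4, 5, 6, 40, 50),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("a", "b"),
                              c("Memory", "Memory", "Memory", "Naive", "Naive")))

# "Melt": one row per (ID, column) cell. as.vector() reads the matrix column by
# column, so the IDs cycle fastest and each column name repeats nrow(toy) times.
long <- data.frame(
  ID      = rep(rownames(toy), times = ncol(toy)),
  Measure = rep(colnames(toy), each = nrow(toy)),
  value   = as.vector(toy)
)

# "Cast": mean of value for every ID x Measure combination (a matrix, not a data frame).
wide <- tapply(long$value, list(long$ID, long$Measure), mean)
wide
```

Because the duplicated names become values of the `Measure` column, no renaming or splitting step is needed here. Note that `tapply()` orders the rows by the sorted factor levels of `ID`, so with character IDs the order can differ from the original data.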
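df.agg should now look like the output snippet you wanted.
Alternatively, if you just want the overall means across all rows, Zack's suggestion will work. Something like the following, applied to the original df (before the column names were made unique and the ID column was added):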
m <- colMeans(df)
tapply(m, colnames(df), mean)
You can get the same result, but formatted as a data frame, with
cast(df.m, . ~ Measure, fun = mean)
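A small check of why averaging the column means works: since every column has the same number of rows, the mean of a group's column means equals the mean of all values in that group. The toy matrix below is made up; a matrix is used so the duplicated names are kept as-is, and `colMeans()` and `colnames()` behave the same on a numeric data.frame.

```r
# Made-up toy data with duplicated column names.
toy <- matrix(c(1, 2, 3, 10, 20,
                4, 5, 6, 40, 50),
              nrow = 2, byrow = TRUE,
              dimnames = list(NULL, c("Memory", "Memory", "Memory", "Naive", "Naive")))

m <- colMeans(toy)                          # one mean per column: 2.5 3.5 4.5 25 35
overall <- tapply(m, colnames(toy), mean)   # mean of the column means per name
overall

# Direct mean of every value in each group, for comparison.
direct <- tapply(as.vector(toy), rep(colnames(toy), each = nrow(toy)), mean)
all.equal(overall, direct)
```

If the columns had different numbers of non-missing values, the two would no longer agree and the direct (long-format) mean would be the one to trust.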