Mar*_*ler 14 aggregate r tapply
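Is it possible to include two functions within a single tapply or aggregate statement?
Below I use two tapply statements and two aggregate statements: one for the mean and one for the SD.
I would prefer to combine the statements.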
my.Data = read.table(text = "
animal age sex weight
1 adult female 100
2 young male 75
3 adult male 90
4 adult female 95
5 young female 80
", sep = "", header = TRUE)
with(my.Data, tapply(weight, list(age, sex), function(x) {mean(x)}))
with(my.Data, tapply(weight, list(age, sex), function(x) {sd(x) }))
with(my.Data, aggregate(weight ~ age + sex, FUN = mean))
with(my.Data, aggregate(weight ~ age + sex, FUN = sd))
# this does not work:
with(my.Data, tapply(weight, list(age, sex), function(x) {mean(x) ; sd(x)}))
# I would also prefer that the output be formatted similarly to what is
# shown below. `aggregate` formats the output perfectly. I just cannot figure
# out how to implement two functions in one statement.
age sex mean sd
adult female 97.5 3.535534
adult male 90 NA
young female 80.0 NA
young male 75 NA
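I can always run two separate statements and merge the output. I was just hoping there might be a slightly more convenient solution.
I found the following answer posted here: Apply multiple functions to column using tapply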
f <- function(x) c(mean(x), sd(x))
do.call( rbind, with(my.Data, tapply(weight, list(age, sex), f)) )
However, neither the rows nor the columns are labeled.
[,1] [,2]
[1,] 97.5 3.535534
[2,] 80.0 NA
[3,] 90.0 NA
[4,] 75.0 NA
I would prefer a solution in base R. A solution from the plyr package was posted at the link above. If I could add the correct row and column headings to the output above, it would be perfect.
42-*_*42- 17
But these should work:
with(my.Data, aggregate(weight, list(age, sex), function(x) { c(MEAN=mean(x), SD=sd(x) )}))
with(my.Data, tapply(weight, list(age, sex), function(x) { c(mean(x) , sd(x) )} ))
# Not a nice structure but the results are in there
with(my.Data, aggregate(weight ~ age + sex, FUN = function(x) c( SD = sd(x), MN= mean(x) ) ) )
age sex weight.SD weight.MN
1 adult female 3.535534 97.500000
2 young female NA 80.000000
3 adult male NA 90.000000
4 young male NA 75.000000
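Although the printout above looks like two separate columns, `aggregate` stores both statistics in a single matrix column named `weight`. A minimal base R sketch of flattening it into ordinary columns, assuming the question's `my.Data` (the names `res` and `flat` and the final column labels are my own choices for illustration):

```r
my.Data <- read.table(text = "
animal age sex weight
1 adult female 100
2 young male 75
3 adult male 90
4 adult female 95
5 young female 80
", header = TRUE)

# FUN returns one named vector, so aggregate keeps both statistics
res <- aggregate(weight ~ age + sex, data = my.Data,
                 FUN = function(x) c(mean = mean(x), sd = sd(x)))

# res has 3 columns: age, sex, and a 4 x 2 matrix called weight.
# do.call(data.frame, ...) splits the matrix into weight.mean and weight.sd.
flat <- do.call(data.frame, res)

# Rename to match the layout asked for in the question
names(flat)[3:4] <- c("mean", "sd")
flat
```

Groups with a single animal still get `NA` for `sd`, as in the question's desired output.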
The principle to follow is to have your function return "one thing", which can be either a vector or a list, but cannot be two successive function calls.
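The same principle can also give labelled output from `tapply`. The following is only a sketch, assuming the question's `my.Data` and assuming every age/sex combination is present (an empty cell would be `NULL` in the list-matrix and would be dropped by `rbind`, misaligning the labels). The names `tab`, `labels` and `out` are hypothetical:

```r
my.Data <- read.table(text = "
animal age sex weight
1 adult female 100
2 young male 75
3 adult male 90
4 adult female 95
5 young female 80
", header = TRUE)

# One function returning one named vector; tapply yields a 2 x 2 list-matrix
tab <- with(my.Data, tapply(weight, list(age, sex),
                            function(x) c(mean = mean(x), sd = sd(x))))

# expand.grid varies its first factor fastest, which matches the
# column-major order in which the cells of tab are stored
labels <- expand.grid(age = rownames(tab), sex = colnames(tab))

# Stack the per-cell vectors into a matrix and attach the labels
out <- cbind(labels, do.call(rbind, tab))
out
```

The column names `mean` and `sd` come from the names of the vector returned by the anonymous function.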
If you want to use data.table, it has `with` and `by` built right into it:
library(data.table)
myDT <- data.table(my.Data, key="animal")
myDT[, c("mean", "sd") := list(mean(weight), sd(weight)), by=list(age, sex)]
myDT[, list(mean_Aggr=sum(mean(weight)), sd_Aggr=sum(sd(weight))), by=list(age, sex)]
age sex mean_Aggr sd_Aggr
1: adult female 96.0 3.6055513
2: young male 76.5 2.1213203
3: adult male 91.0 1.4142136
4: young female 84.5 0.7071068
I used a slightly different data set so as to not have NA values for the sd.
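In the spirit of sharing, if you are familiar with SQL, you might also consider the "sqldf" package. (The emphasis on familiarity with SQL is because you need to know, for example, that `mean` is `avg` in SQL in order to get the results you want.)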
library(sqldf)
sqldf("select age, sex,
avg(weight) `Wt.Mean`,
stdev(weight) `Wt.SD`
from `my.Data`
group by age, sex")
age sex Wt.Mean Wt.SD
1 adult female 97.5 3.535534
2 adult male 90.0 0.000000
3 young female 80.0 0.000000
4 young male 75.0 0.000000
reshape lets you pass two functions; reshape2 does not.
library(reshape)
my.Data = read.table(text = "
animal age sex weight
1 adult female 100
2 young male 75
3 adult male 90
4 adult female 95
5 young female 80
", sep = "", header = TRUE)
my.Data[,1]<- NULL
(a1<- melt(my.Data, id=c("age", "sex"), measured=c("weight")))
(cast(a1, age + sex ~ variable, c(mean, sd), fill=NA))
# age sex weight_mean weight_sd
# 1 adult female 97.5 3.535534
# 2 adult male 90.0 NA
# 3 young female 80.0 NA
# 4 young male 75.0 NA
I owe this one to @Ramnath, who pointed it out just yesterday.