Multiple functions in a single tapply or aggregate statement

Mar*_*ler 14 aggregate r tapply

Is it possible to include two functions in a single tapply or aggregate statement?

Below I use two tapply statements and two aggregate statements: one for the mean and one for the SD.
I would prefer to combine the statements.

my.Data = read.table(text = "
  animal    age     sex  weight
       1  adult  female     100
       2  young    male      75
       3  adult    male      90
       4  adult  female      95
       5  young  female      80
", sep = "", header = TRUE)

with(my.Data, tapply(weight, list(age, sex), function(x) {mean(x)}))
with(my.Data, tapply(weight, list(age, sex), function(x) {sd(x)  }))

with(my.Data, aggregate(weight ~ age + sex, FUN = mean))
with(my.Data, aggregate(weight ~ age + sex, FUN =   sd))

# this does not work:

with(my.Data, tapply(weight, list(age, sex), function(x) {mean(x) ; sd(x)}))

# I would also prefer that the output be formatted similarly to what is
# shown below.  `aggregate` formats the output perfectly.  I just cannot figure 
# out how to implement two functions in one statement.

  age    sex   mean        sd
adult female   97.5  3.535534
adult   male     90        NA
young female   80.0        NA
young   male     75        NA

I can always run two separate statements and merge the output. I was just hoping there might be a slightly more convenient solution.

I found the following answer posted here: Applying multiple functions to a column using tapply

f <- function(x) c(mean(x), sd(x))
do.call( rbind, with(my.Data, tapply(weight, list(age, sex), f)) )

However, neither the rows nor the columns are labeled.

     [,1]     [,2]
[1,] 97.5 3.535534
[2,] 80.0       NA
[3,] 90.0       NA
[4,] 75.0       NA

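I would prefer a solution in base R. A solution using the plyr package is posted at the link above. If I could add the correct row and column headings to the output above, that would be perfect.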

42-*_*42- 17

These, however, should work:

with(my.Data, aggregate(weight, list(age, sex), function(x) { c(MEAN=mean(x), SD=sd(x) )}))

with(my.Data, tapply(weight, list(age, sex), function(x) { c(mean(x) , sd(x) )} ))
# Not a nice structure but the results are in there

with(my.Data, aggregate(weight ~ age + sex, FUN =  function(x) c( SD = sd(x), MN= mean(x) ) ) )
    age    sex weight.SD weight.MN
1 adult female  3.535534 97.500000
2 young female        NA 80.000000
3 adult   male        NA 90.000000
4 young   male        NA 75.000000

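The principle to follow is to have your function return "one thing", which can be either a vector or a list, but it cannot be two function calls made one after the other.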
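To make the principle concrete, here is a minimal base R sketch. The helper names `last_only` and `both` are made up for this illustration and do not appear in the original posts. A braced body with two statements returns only the value of its last statement, whereas `c()` bundles both statistics into a single named vector.

```r
x <- c(100, 95)  # weights of the two adult females from the question

# Two statements in sequence: mean(x) is computed and discarded,
# only the value of the last statement, sd(x), is returned
last_only <- function(x) { mean(x); sd(x) }
last_only(x)

# One return value that carries both statistics as a named vector
both <- function(x) c(mean = mean(x), sd = sd(x))
both(x)
```

This is why the `{mean(x) ; sd(x)}` attempt in the question only ever reports the standard deviation.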

  • The result here is a data frame whose third column is a matrix. Easily fixed by wrapping the whole thing in `do.call(data.frame, ...)`. +1 (2 upvotes)
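The fix suggested in this comment could look roughly like the sketch below. It assumes `my.Data` as defined in the question (repeated here so the snippet runs on its own); the names `agg` and `flat` are introduced only for illustration.

```r
my.Data <- read.table(text = "
  animal    age     sex  weight
       1  adult  female     100
       2  young    male      75
       3  adult    male      90
       4  adult  female      95
       5  young  female      80
", header = TRUE)

# The function returns one named vector, so aggregate() stores the
# results as a two-column matrix inside a single column called weight
agg <- aggregate(weight ~ age + sex, data = my.Data,
                 FUN = function(x) c(mean = mean(x), sd = sd(x)))
ncol(agg)

# Flatten the matrix column into ordinary data frame columns
flat <- do.call(data.frame, agg)
names(flat)
flat
```

After flattening, each statistic is a regular column that can be selected or renamed like any other.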

Ric*_*rta 9

If you would like to use data.table, it has `with` and `by` built right into it:

library(data.table)
myDT <- data.table(my.Data, key="animal")


myDT[, c("mean", "sd") := list(mean(weight), sd(weight)), by=list(age, sex)]


myDT[, list(mean_Aggr=sum(mean(weight)), sd_Aggr=sum(sd(weight))), by=list(age, sex)]
     age    sex mean_Aggr   sd_Aggr
1: adult female     96.0  3.6055513
2: young   male     76.5  2.1213203
3: adult   male     91.0  1.4142136
4: young female     84.5  0.7071068

I used a slightly different data set so that there are no `NA` values for `sd`.


A5C*_*2T1 7

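In the spirit of sharing, if you are familiar with SQL, you might also consider using the "sqldf" package. (Emphasis added because you do need to know, for instance, that `mean` is `avg` in order to get the results you want.)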

library(sqldf)  # provides sqldf()
sqldf("select age, sex, 
      avg(weight) `Wt.Mean`, 
      stdev(weight) `Wt.SD` 
      from `my.Data` 
      group by age, sex")
    age    sex Wt.Mean    Wt.SD
1 adult female    97.5 3.535534
2 adult   male    90.0 0.000000
3 young female    80.0 0.000000
4 young   male    75.0 0.000000


Jac*_*yan 5

reshape lets you pass two functions; reshape2 does not.

library(reshape)
my.Data = read.table(text = "
  animal    age     sex  weight
       1  adult  female     100
       2  young    male      75
       3  adult    male      90
       4  adult  female      95
       5  young  female      80
", sep = "", header = TRUE)
my.Data[,1]<- NULL
(a1 <- melt(my.Data, id.vars = c("age", "sex"), measure.vars = "weight"))
(cast(a1, age + sex ~ variable, c(mean, sd), fill=NA))

#     age    sex weight_mean weight_sd
# 1 adult female        97.5  3.535534
# 2 adult   male        90.0        NA
# 3 young female        80.0        NA
# 4 young   male        75.0        NA

I owe this one to @Ramnath, who pointed it out just yesterday.