Calculating the mean by group while skipping the first value in each group in R

SHR*_*ram 2 r function mean plyr

I have a large data frame like this:

groupvar <- c("A", "A", "A", "A",  "B", "B", "B", "C",  "C", "C", "C", "D", "D", "D", "E", "E")
valuevar <- c( 1,  0.5, 0.5, 0.5,  1, 0.75, 0.75, 1, 0.8, 0.8, 0.8,    1, 0.9, 0.9,  1, 1.5)
myd <- data.frame (groupvar, valuevar)

   groupvar valuevar
1         A     1.00
2         A     0.50
3         A     0.50
4         A     0.50
5         B     1.00
6         B     0.75
7         B     0.75
8         C     1.00
9         C     0.80
10        C     0.80
11        C     0.80
12        D     1.00
13        D     0.90
14        D     0.90
15        E     1.00
16        E     1.50

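I want to calculate the mean for each group, but leave out the first value within each groupvar. In this data the first value in every group is always 1. For example, for group "A" the mean should be based on 0.5, 0.5, 0.5, skipping the leading 1.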

This is what I tried:

meanfun <- function(x)sum(x)-x[1]/ length(x)
ddply (myd,"groupvar",meanfun) 

Error in FUN(X[[1L]], ...) : 
  only defined on a data frame with all numeric variables

Jil*_*ina 5

This may help:

> with(myd, tapply(valuevar, groupvar, function(x) mean(x[-1])))
   A    B    C    D    E 
0.50 0.75 0.80 0.90 1.50 
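The piece doing the work in that call is the negative index: x[-1] returns x with its first element dropped, so mean(x[-1]) averages everything after it. A minimal sketch of the idiom on its own follows; the vector names x and single are only illustrative. One thing to watch, assuming your real data could contain groups with a single row: dropping the first element then leaves an empty vector, and its mean is NaN.

```r
# Values for group "A" from the question
x <- c(1, 0.5, 0.5, 0.5)

rest <- x[-1]        # negative index drops the first element
print(rest)          # 0.5 0.5 0.5
print(mean(rest))    # 0.5

# Hypothetical group with only one row: nothing is left after dropping it
single <- c(1)
print(length(single[-1]))  # 0
print(mean(single[-1]))    # NaN
```

If single-row groups can occur, you would probably want to decide up front whether they should yield NA or be dropped from the result.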

Using aggregate:

> aggregate(valuevar ~ groupvar, FUN=function(x) mean(x[-1]), data=myd)
  groupvar valuevar
1        A     0.50
2        B     0.75
3        C     0.80
4        D     0.90
5        E     1.50

Using ddply:

> library(plyr)
> ddply (myd, "groupvar", summarize, MeanVar=mean(valuevar[-1]))
  groupvar MeanVar
1        A    0.50
2        B    0.75
3        C    0.80
4        D    0.90
5        E    1.50
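
As for why the attempt in the question fails: ddply hands each group to the function as a data frame, not as a numeric vector, so sum(x) is applied to a data frame that still contains the non-numeric groupvar column, which produces the error shown. Two further issues would appear once that is fixed: in sum(x)-x[1]/length(x) only x[1] is divided, because division binds tighter than subtraction, and the divisor should be length(x) - 1 since one value has been removed. Below is a rough sketch of a corrected meanfun. The argument name piece and the split/lapply driver are my own stand-ins so the example runs without plyr; they are not part of the original code.

```r
groupvar <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C",
              "D", "D", "D", "E", "E")
valuevar <- c(1, 0.5, 0.5, 0.5, 1, 0.75, 0.75, 1, 0.8, 0.8, 0.8,
              1, 0.9, 0.9, 1, 1.5)
myd <- data.frame(groupvar, valuevar)

# Receives one group's rows as a data frame, extracts the numeric column,
# and averages every value except the first.
meanfun <- function(piece) {
  x <- piece$valuevar
  data.frame(MeanVar = (sum(x) - x[1]) / (length(x) - 1))
}

# Base-R stand-in for the per-group call: split into one data frame per
# group, apply meanfun to each, and stack the one-row results.
res <- do.call(rbind, lapply(split(myd, myd$groupvar), meanfun))
print(res)

# With plyr loaded, the same function can be passed as:
#   ddply(myd, "groupvar", meanfun)
```

This should give the same per-group means as the calls above, although mean(valuevar[-1]) remains the shorter way to write it.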