How to extract per-group averages from a matrix with x possible groups

Ada*_*dam 1 r

Suppose I have the following example dataset:

df1 = 
ID    Group_Type    Units
 1       A           10
 2       A           12
 3       A           17
 4       B            6
 5       B            9
 6       D           23
 7       D           16
 8       D           21
 9       G           40
10       G           31

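The group type can be any letter of the English alphabet from A to Z. Is there a way to detect groups A, B, D, and G (or whichever groups exist) all at once, average the units within each group, and assign the whole result to a matrix? I imagine it would look like this: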

      [,1]
[1,]   13
[2,]   7.5
[3,]   20
[4,]   35.5

([1,] = A, and so on; [,1] = the mean of each group)

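I know how to do each of these tasks individually, but I don't know how to combine them into one manageable piece of code. I recently used table, unlist, and grep to pick out words in a data frame, but I can't think past that.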

akr*_*run 6

This assumes 'df1' is a 'data.frame'. If it is a 'matrix' (which I suspect), convert it first:

df1 <- as.data.frame(df1, stringsAsFactors=FALSE)
df1$Units <- as.numeric(df1$Units)
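To illustrate why the conversion above may be needed, here is a minimal sketch. It assumes the data really is stored as a matrix; the object `m` is a hypothetical reconstruction of the table in the question, not something the asker posted. A matrix that mixes letters and numbers holds everything as character strings, so 'Units' has to be converted back to numeric before averaging.

```r
# Hypothetical reconstruction of the question's data as a matrix (assumption).
# cbind() on mixed character/numeric vectors yields a character matrix.
m <- cbind(ID = 1:10,
           Group_Type = c("A", "A", "A", "B", "B", "D", "D", "D", "G", "G"),
           Units = c(10, 12, 17, 6, 9, 23, 16, 21, 40, 31))

is.character(m)   # every cell, including Units, is stored as a string

# Convert to a data.frame and restore the numeric column
df1 <- as.data.frame(m, stringsAsFactors = FALSE)
df1$Units <- as.numeric(df1$Units)
str(df1)
```

If 'df1' is already a data.frame with a numeric 'Units' column, this step can be skipped.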

Using dplyr

library(dplyr)
df1 %>% 
   group_by(Group_Type) %>%
   summarise(Units=mean(Units))
#    Group_Type Units
#1          A  13.0
#2          B   7.5
#3          D  20.0
#4          G  35.5

Or using base R

aggregate(Units~Group_Type, df1, FUN=mean, na.action=NULL)
#  Group_Type Units
#1          A  13.0
#2          B   7.5
#3          D  20.0
#4          G  35.5
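The question asks for the result as a one-column matrix rather than a data.frame. One possible base R sketch uses tapply, which is swapped in here and is not part of the original answer; the 'df1' below is rebuilt from the question's table so the snippet runs on its own.

```r
# Example data rebuilt from the question (assumption: df1 is a data.frame)
df1 <- data.frame(
  ID = 1:10,
  Group_Type = c("A", "A", "A", "B", "B", "D", "D", "D", "G", "G"),
  Units = c(10, 12, 17, 6, 9, 23, 16, 21, 40, 31),
  stringsAsFactors = FALSE
)

# Mean of Units for every group that actually occurs in the data
group_means <- tapply(df1$Units, df1$Group_Type, mean)

# Reshape into a one-column matrix, keeping the group letters as row names
res <- matrix(group_means, ncol = 1,
              dimnames = list(names(group_means), NULL))
res
```

The row names record which group each mean belongs to; wrapping the result in unname() would give the bare [1,], [2,], ... layout sketched in the question.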

Or data.table

library(data.table)
setDT(df1)[, list(Units=mean(Units)), Group_Type]
#    Group_Type Units
#1:          A  13.0
#2:          B   7.5
#3:          D  20.0
#4:          G  35.5

For dplyr, data.table, and aggregate, you can use the option na.rm=TRUE to drop NA values from the mean calculation, i.e. mean(Units, na.rm=TRUE) for dplyr/data.table, and ..., FUN=mean, na.rm=TRUE, na.action=NULL) for aggregate.
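As a small sketch of the na.rm behaviour with aggregate, the snippet below rebuilds 'df1' from the question's table and sets one value to NA purely for illustration (an assumption, not part of the original data). It compares the result with and without na.rm=TRUE.

```r
# Example data rebuilt from the question, with one missing value introduced
df1 <- data.frame(
  ID = 1:10,
  Group_Type = c("A", "A", "A", "B", "B", "D", "D", "D", "G", "G"),
  Units = c(10, 12, 17, 6, 9, 23, 16, 21, 40, 31),
  stringsAsFactors = FALSE
)
df1$Units[3] <- NA

# na.action=NULL keeps the NA row, so the mean for group A becomes NA
with_na <- aggregate(Units ~ Group_Type, df1, FUN = mean, na.action = NULL)

# na.rm=TRUE is passed on to mean(), which then ignores the NA within group A
no_na <- aggregate(Units ~ Group_Type, df1, FUN = mean,
                   na.rm = TRUE, na.action = NULL)

with_na
no_na
```

With the NA ignored, group A is averaged over its two remaining values (10 and 12).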

Or sqldf. Here avg drops NA/null values by default.

library(sqldf)
sqldf('select Group_Type,
        avg(Units) as Units 
        from df1 
        group by Group_Type',
        method = "raw")
 #   Group_Type Units
 #1          A  13.0
 #2          B   7.5
 #3          D  20.0
 #4          G  35.5

Suppose there is a single missing value in 'Units' for one 'Group_Type', and you want the output for that group to be NA.

 df1$Units[3] <- NA
 sqldf('select Group_Type,
           case when count(Units) = count(*) 
                then avg(Units) 
                else null 
                end as Units
           from df1 
           group by Group_Type',
           method="raw")
 #   Group_Type Units
 #1          A  <NA>
 #2          B   7.5
 #3          D  20.0
 #4          G  35.5