Calculate the mean of a column grouped by the values of two other columns

Sea*_*ess 0 group-by r

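I have a data frame with 5 columns. I know how to calculate the mean of one column grouped by another column, but I need to group by two columns. For example, I want to calculate the mean of column 5 grouped by column 1 and column 2.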

df <- structure(list(Country = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L), .Label = c("AT", "CH", "DE"), class = "factor"), 
    Occupation = c(1L, 3L, 5L, 3L, 1L, 2L, 5L, 3L, 5L, 3L, 1L, 
    2L, 1L, 5L, 3L, 3L, 1L, 3L, 2L, 5L, 5L, 1L, 2L, 1L, 3L), 
    Age = c(20L, 46L, 30L, 12L, 73L, 53L, 19L, 43L, 65L, 53L, 
    19L, 34L, 76L, 25L, 45L, 39L, 18L, 59L, 37L, 24L, 19L, 60L, 
    51L, 32L, 29L), Gender = structure(c(1L, 1L, 2L, 2L, 2L, 
    1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 
    2L, 2L, 1L, 1L, 2L), .Label = c("female", "male"), class = "factor"), 
    Income = c(100L, 80L, 78L, 29L, 156L, 56L, 95L, 104L, 87L, 
    56L, 203L, 45L, 112L, 78L, 56L, 140L, 99L, 67L, 89L, 109L, 
    43L, 145L, 30L, 101L, 77L)), class = "data.frame", row.names = c(NA, 
-25L))

head(df)

  Country Occupation Age Gender Income
1      AT          1  20 female    100
2      AT          3  46 female     80
3      AT          5  30   male     78
4      AT          3  12   male     29
5      AT          1  73   male    156
6      AT          2  53 female     56

So what I want is to calculate the mean of the column 'Income', grouped by Country and Occupation. For example, I want the mean of 'Income' for all people living in country 'AT' with occupation '3', the mean of 'Income' for all people living in country 'CH' with occupation '1', and so on.


Dar*_*sai 6

(1) base approach (aggregate)

mean.df <- aggregate(Income ~ Country + Occupation, df, mean)
names(mean.df)[3] <- "Income_Mean"
merge(df, mean.df)

(2) base approach (tapply)

mean.df1 <- tapply(df$Income, list(df$Country, df$Occupation), mean)
mean.df2 <- as.data.frame(as.table(mean.df1))
names(mean.df2) <- c("Country", "Occupation", "Income_Mean")
merge(df, mean.df2)

(3) stats approach (ave)

df2 <- df
df2$Income_Mean <- ave(df$Income, df$Country, df$Occupation)
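A note on why (3) needs no `merge()`: `ave()` returns a vector as long as its input, with every element replaced by its group's summary, and its default `FUN` is `mean`. Below is a minimal sketch of that behaviour. It rebuilds only the three relevant columns of the question's data as plain numeric/character vectors, and the names `d` and `Income_Median` are introduced here purely for illustration.

```r
# Rebuild only the three relevant columns of the question's data
d <- data.frame(
  Country    = rep(c("AT", "CH", "DE"), times = c(8, 9, 8)),
  Occupation = c(1, 3, 5, 3, 1, 2, 5, 3,
                 5, 3, 1, 2, 1, 5, 3, 3, 1,
                 3, 2, 5, 5, 1, 2, 1, 3),
  Income     = c(100, 80, 78, 29, 156, 56, 95, 104,
                 87, 56, 203, 45, 112, 78, 56, 140, 99,
                 67, 89, 109, 43, 145, 30, 101, 77)
)

# ave() defaults to FUN = mean; the result has one value per input row
d$Income_Mean <- ave(d$Income, d$Country, d$Occupation)

# Any other summary must be passed by name (FUN = ...), because
# unnamed arguments after the first are treated as grouping variables
d$Income_Median <- ave(d$Income, d$Country, d$Occupation, FUN = median)

head(d)
```

Because the result lines up row by row with the input, it can be assigned straight into a new column.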

(4) dplyr approach

library(dplyr)

df %>% group_by(Country, Occupation) %>%
       mutate(Income_Mean = mean(Income))

Output:

   Country Occupation   Age Gender Income Income_Mean
   <fct>        <int> <int> <fct>   <int>       <dbl>
 1 AT               1    20 female    100       128  
 2 AT               3    46 female     80        71  
 3 AT               5    30 male       78        86.5
 4 AT               3    12 male       29        71  
 5 AT               1    73 male      156       128  
 6 AT               2    53 female     56        56  
 7 AT               5    19 male       95        86.5
 8 AT               3    43 male      104        71  
 9 CH               5    65 male       87        82.5
10 CH               3    53 female     56        84
# ... with 15 more rows
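If a table with one row per group is enough, rather than a new column repeated on every row, the `aggregate()` call from (1) can be used on its own without the `merge()`. The sketch below assumes that reading of the question. It rebuilds only the three relevant columns of the data, and `d` and `group_means` are illustrative names.

```r
# Rebuild only the three relevant columns of the question's data
d <- data.frame(
  Country    = rep(c("AT", "CH", "DE"), times = c(8, 9, 8)),
  Occupation = c(1, 3, 5, 3, 1, 2, 5, 3,
                 5, 3, 1, 2, 1, 5, 3, 3, 1,
                 3, 2, 5, 5, 1, 2, 1, 3),
  Income     = c(100, 80, 78, 29, 156, 56, 95, 104,
                 87, 56, 203, 45, 112, 78, 56, 140, 99,
                 67, 89, 109, 43, 145, 30, 101, 77)
)

# One row per (Country, Occupation) pair; nothing is merged back onto the rows
group_means <- aggregate(Income ~ Country + Occupation, data = d, FUN = mean)

# Look up the two groups named in the question
subset(group_means, Country == "AT" & Occupation == 3)  # Income is 71
subset(group_means, Country == "CH" & Occupation == 1)  # Income is 138
```

The AT/3 value matches the `Income_Mean` shown for that group in the output above (rows 2, 4 and 8).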