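I have a data frame with 5 columns. I know how to calculate the mean of one column grouped by another column. However, I need to group by two columns. For example, I want to calculate the mean of column 5 grouped by columns 1 and 2.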

df <- structure(list(Country = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L), .Label = c("AT", "CH", "DE"), class = "factor"),
    Occupation = c(1L, 3L, 5L, 3L, 1L, 2L, 5L, 3L, 5L, 3L, 1L,
    2L, 1L, 5L, 3L, 3L, 1L, 3L, 2L, 5L, 5L, 1L, 2L, 1L, 3L),
    Age = c(20L, 46L, 30L, 12L, 73L, 53L, 19L, 43L, 65L, 53L,
    19L, 34L, 76L, 25L, 45L, 39L, 18L, 59L, 37L, 24L, 19L, 60L,
    51L, 32L, 29L), Gender = structure(c(1L, 1L, 2L, 2L, 2L,
    1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L,
    2L, 2L, 1L, 1L, 2L), .Label = c("female", "male"), class = "factor"),
    Income = c(100L, 80L, 78L, 29L, 156L, 56L, 95L, 104L, 87L,
    56L, 203L, 45L, 112L, 78L, 56L, 140L, 99L, 67L, 89L, 109L,
    43L, 145L, 30L, 101L, 77L)), class = "data.frame", row.names = c(NA,
-25L))

head(df)

  Country Occupation Age Gender Income
1      AT          1  20 female    100
2      AT          3  46 female     80
3      AT          5  30   male     78
4      AT          3  12   male     29
5      AT          1  73   male    156
6      AT          2  53 female     56

So what I want is to calculate the mean of the column ‘Income’, grouped by Country and Occupation. For example, I want the mean ‘Income’ of all people living in country ‘AT’ with occupation ‘3’, the mean ‘Income’ of all people living in country ‘CH’ with occupation ‘1’, and so on.
\n(1) 基础方法(聚合)
mean.df <- aggregate(Income ~ Country + Occupation, df, mean)
names(mean.df)[3] <- "Income_Mean"
merge(df, mean.df)
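The aggregate() call above produces one row per Country/Occupation combination, and merge() then copies each mean back onto the matching rows of df. A minimal sketch of the same summary table using aggregate()'s non-formula interface (a by = list of grouping columns) is below; to keep it short it rebuilds only the Country, Occupation and Income columns of the question's data, and the name mean.by is just for illustration.

```r
# Trimmed copy of the question's data (only the columns used here)
df <- data.frame(
  Country    = factor(rep(c("AT", "CH", "DE"), times = c(8, 9, 8))),
  Occupation = c(1, 3, 5, 3, 1, 2, 5, 3,
                 5, 3, 1, 2, 1, 5, 3, 3, 1,
                 3, 2, 5, 5, 1, 2, 1, 3),
  Income     = c(100, 80, 78, 29, 156, 56, 95, 104,
                 87, 56, 203, 45, 112, 78, 56, 140, 99,
                 67, 89, 109, 43, 145, 30, 101, 77)
)

# Non-formula interface: summarise Income within each Country x Occupation group
mean.by <- aggregate(df["Income"],
                     by  = df[c("Country", "Occupation")],
                     FUN = mean)
names(mean.by)[3] <- "Income_Mean"

# One row per combination that occurs in the data
print(mean.by)
```

aggregate() only returns combinations that actually occur, so here the table has 12 rows (all three countries have occupations 1, 2, 3 and 5). This is the table that merge(df, mean.df) joins back onto df by the shared Country and Occupation columns.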
(2) Base R approach (tapply)
mean.df1 <- tapply(df$Income, list(df$Country, df$Occupation), mean)
mean.df2 <- as.data.frame(as.table(mean.df1))
names(mean.df2) <- c("Country", "Occupation", "Income_Mean")
merge(df, mean.df2)
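Before it is reshaped with as.table(), the tapply() result is a Country-by-Occupation matrix whose dimnames are the group levels, so a single group's mean can be read off directly. The sketch below looks up the two example groups from the question; it uses a trimmed copy of the question's data (Country, Occupation and Income only).

```r
# Trimmed copy of the question's data (only the columns used here)
df <- data.frame(
  Country    = factor(rep(c("AT", "CH", "DE"), times = c(8, 9, 8))),
  Occupation = c(1, 3, 5, 3, 1, 2, 5, 3,
                 5, 3, 1, 2, 1, 5, 3, 3, 1,
                 3, 2, 5, 5, 1, 2, 1, 3),
  Income     = c(100, 80, 78, 29, 156, 56, 95, 104,
                 87, 56, 203, 45, 112, 78, 56, 140, 99,
                 67, 89, 109, 43, 145, 30, 101, 77)
)

# Rows are countries, columns are occupations; an empty combination would be NA
mean.df1 <- tapply(df$Income, list(df$Country, df$Occupation), mean)

dim(mean.df1)        # 3 4
mean.df1["AT", "3"]  # 71: mean Income for Country AT, Occupation 3
mean.df1["CH", "1"]  # 138: mean Income for Country CH, Occupation 1
```

The column names are the character strings "1", "2", "3" and "5", so index by name rather than by number: mean.df1["AT", 5] would fail, because occupation 5 is the fourth column.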
(3) stats approach (ave)
df2 <- df
df2$Income_Mean <- ave(df$Income, df$Country, df$Occupation)
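ave() applies its FUN argument (mean by default) within each combination of the grouping vectors and returns a vector in the original row order, which is why it can be assigned straight to a new column without a merge. A sketch on a trimmed copy of the question's data that also passes FUN = length to count the rows in each group (the Group_N column name is only illustrative):

```r
# Trimmed copy of the question's data (only the columns used here)
df <- data.frame(
  Country    = factor(rep(c("AT", "CH", "DE"), times = c(8, 9, 8))),
  Occupation = c(1, 3, 5, 3, 1, 2, 5, 3,
                 5, 3, 1, 2, 1, 5, 3, 3, 1,
                 3, 2, 5, 5, 1, 2, 1, 3),
  Income     = c(100, 80, 78, 29, 156, 56, 95, 104,
                 87, 56, 203, 45, 112, 78, 56, 140, 99,
                 67, 89, 109, 43, 145, 30, 101, 77)
)

df2 <- df
# Group mean repeated on every row; rows keep their original order
df2$Income_Mean <- ave(df$Income, df$Country, df$Occupation)
# Same grouping with a different FUN: number of rows in each group
df2$Group_N <- ave(df$Income, df$Country, df$Occupation, FUN = length)

head(df2, 3)
```

The first three rows (AT with occupations 1, 3 and 5) get means 128, 71 and 86.5 and group sizes 2, 3 and 2, matching the output shown below.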
(4) dplyr approach
library(dplyr)
df %>% group_by(Country, Occupation) %>%
  mutate(Income_Mean = mean(Income))
Output of the dplyr approach:
Country Occupation Age Gender Income Income_Mean
<fct> <int> <int> <fct> <int> <dbl>
1 AT 1 20 female 100 128
2 AT 3 46 female 80 71
3 AT 5 30 male 78 86.5
4 AT 3 12 male 29 71
5 AT 1 73 male 156 128
6 AT 2 53 female 56 56
7 AT 5 19 male 95 86.5
8 AT 3 43 male 104 71
9 CH 5 65 male 87 82.5
10 CH 3 53 female 56 84
# ... with 15 more rows
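mutate() keeps all 25 rows and repeats each group's mean on every row, as in the output above. If only one row per Country/Occupation combination is wanted (like the aggregate() table before merging), summarise() can be used instead. A sketch, assuming dplyr 1.0.0 or later for the .groups argument and a trimmed copy of the question's data; income_means is just an illustrative name.

```r
library(dplyr)

# Trimmed copy of the question's data (only the columns used here)
df <- data.frame(
  Country    = factor(rep(c("AT", "CH", "DE"), times = c(8, 9, 8))),
  Occupation = c(1, 3, 5, 3, 1, 2, 5, 3,
                 5, 3, 1, 2, 1, 5, 3, 3, 1,
                 3, 2, 5, 5, 1, 2, 1, 3),
  Income     = c(100, 80, 78, 29, 156, 56, 95, 104,
                 87, 56, 203, 45, 112, 78, 56, 140, 99,
                 67, 89, 109, 43, 145, 30, 101, 77)
)

# One row per Country/Occupation combination;
# .groups = "drop" returns an ungrouped tibble
income_means <- df %>%
  group_by(Country, Occupation) %>%
  summarise(Income_Mean = mean(Income), .groups = "drop")

print(income_means)
```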