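I am using the dplyr package to generate some tables, together with the adorn_totals("row") function (which comes from the janitor package).
This works well when I want to sum the values within groups, but in some cases I want the overall mean rather than the sum. Is there an adorn_means function?
Example code: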
Regions2 <- Data %>%
  filter(!is.na(REGION)) %>%
  group_by(REGION) %>%
  summarise(Numberofpeople = length(Names)) %>%
  adorn_totals("row")
Here my "Total" row is simply the sum of all the people across the regions. This gives me:
REGION            NumberofPeople
East Midlands            578,943
East of England          682,917
London                 1,247,540
North East               245,830
North West               742,886
South East               963,040
South West               623,684
West Midlands            653,335
Yorkshire                553,853
TOTAL                  6,292,028
My next piece of code produces the average salary for each region, but I want to add a row with the overall average salary:
Regions3 <- Data %>%
  filter(!is.na(REGION)) %>%
  filter(!is.na(AVGSalary)) %>%
  group_by(REGION) %>%
  summarise(AverageSalary = mean(AVGSalary))
If I use adorn_totals("row") as before, I only get the sum of the averages, not the overall mean of the dataset.
How do I get the overall mean?
Update, with some toy data:
Data
people   region      salary
person1  London        1000
person2  South West    1050
person3  South East     900
person4  London         800
person5  Scotland      1020
person6  South West     750
person7  East           600
person8  London        1200
person9  South West    1150
So the group means are:
London 1000
South West 983.33
South East 900
Scotland 1020
East 600
I would like to add the total at the bottom:
Total 941.11
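1) The overall mean is the weighted mean of the group means, not the simple mean of the means (that is, it is 941 rather than 901), so we carry along a column n in order to compute the overall mean correctly at the end. Although the data shown has no NAs, we use drop_na so that the code also works with data that does. This removes every row that contains an NA.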
library(dplyr)
library(tidyr)

Region %>%
  drop_na %>%
  group_by(region) %>%
  # keep the group size n alongside each group mean
  summarize(avg = mean(salary), n = n()) %>%
  ungroup %>%
  # append one row holding the n-weighted mean of the group means
  bind_rows(summarize(., region = "Overall Avg",
                      avg = sum(avg * n) / sum(n),
                      n = sum(n))) %>%
  select(-n)
giving:
# A tibble: 6 x 2
region avg
<chr> <dbl>
1 East 600
2 London 1000
3 Scotland 1020
4 South East 900
5 South West 983.
6 Overall Avg 941.
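To make the 941 versus 901 distinction concrete, here is a small base R check on the toy data from the question. It is only a sketch: the vector names (salary, region, group_avg, group_n) are chosen here for illustration and are not part of the code above.

```r
# Toy data from the question.
salary <- c(1000, 1050, 900, 800, 1020, 750, 600, 1200, 1150)
region <- c("London", "South West", "South East", "London", "Scotland",
            "South West", "East", "London", "South West")

group_avg <- tapply(salary, region, mean)    # mean salary per region
group_n   <- tapply(salary, region, length)  # number of people per region

simple_mean   <- mean(group_avg)                        # unweighted mean of the means
weighted_mean <- weighted.mean(group_avg, w = group_n)  # weighted by group size
overall_mean  <- mean(salary)                           # computed from the raw data

round(c(simple = simple_mean, weighted = weighted_mean, overall = overall_mean), 2)
```

The weighted mean of the group means agrees with the mean of the raw salaries, while the simple mean of the means does not, because the groups have different sizes.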
2) An alternative is to build the overall-average row by going back to the original data:
Region %>%
  drop_na %>%
  group_by(region) %>%
  summarize(avg = mean(salary)) %>%
  ungroup %>%
  bind_rows(summarize(Region %>% drop_na, region = "Overall Avg", avg = mean(salary)))
giving:
# A tibble: 6 x 2
region avg
<chr> <dbl>
1 East 600
2 London 1000
3 Scotland 1020
4 South East 900
5 South West 983.
6 Overall Avg 941.
2a) If you object to mentioning Region twice, try this:
Region_ <- Region %>%
  drop_na

Region_ %>%
  group_by(region) %>%
  summarize(avg = mean(salary)) %>%
  ungroup %>%
  bind_rows(summarize(Region_, region = "Overall Avg", avg = mean(salary)))
2b) Or as a single pipeline, in which Region_ is now local to the pipeline and will be removed automatically once the pipeline finishes:
Region %>%
  drop_na %>%
  { Region_ <- .
    Region_ %>%
      group_by(region) %>%
      summarize(avg = mean(salary)) %>%
      ungroup %>%
      bind_rows(summarize(Region_, region = "Overall Avg", avg = mean(salary)))
  }
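For comparison, the same idea as (2) can be sketched without dplyr or tidyr. This is a base R substitute rather than the approach used above: aggregate() and rbind() stand in for group_by/summarize and bind_rows, na.omit() stands in for drop_na, and the names Region_, by_region and result are illustrative only.

```r
# Toy data from the question, built inline so the sketch is self-contained.
Region <- data.frame(
  people = paste0("person", 1:9),
  region = c("London", "South West", "South East", "London", "Scotland",
             "South West", "East", "London", "South West"),
  salary = c(1000, 1050, 900, 800, 1020, 750, 600, 1200, 1150),
  stringsAsFactors = FALSE
)

Region_ <- na.omit(Region)  # drop rows containing NA

# Per-region means; aggregate() names the value column "salary", so rename it.
by_region <- aggregate(salary ~ region, data = Region_, FUN = mean)
names(by_region)[2] <- "avg"

# Overall row computed from the raw data, then appended at the bottom.
overall <- data.frame(region = "Overall Avg", avg = mean(Region_$salary),
                      stringsAsFactors = FALSE)
result <- rbind(by_region, overall)
result
```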
We used this as the input:
Lines <- "people region salary
person1 London 1000
person2 South West 1050
person3 South East 900
person4 London 800
person5 Scotland 1020
person6 South West 750
person7 East 600
person8 London 1200
person9 South West 1150"
library(gsubfn)
Region <- read.pattern(text = Lines, pattern = "^(\\S+) +(.*) (\\d+)$",
as.is = TRUE, skip = 1, strip.white = TRUE,
col.names = read.table(text = Lines, nrow = 1, as.is = TRUE))
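If gsubfn is not installed, an equivalent Region data frame can be built directly. This is a sketch that assumes only base R and simply retypes the values shown above, with the same column names people, region and salary.

```r
# Same toy data as in Lines, entered directly as a data frame.
Region <- data.frame(
  people = paste0("person", 1:9),
  region = c("London", "South West", "South East", "London", "Scotland",
             "South West", "East", "London", "South West"),
  salary = c(1000, 1050, 900, 800, 1020, 750, 600, 1200, 1150),
  stringsAsFactors = FALSE  # keep region as character, as as.is = TRUE does
)
str(Region)
```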