I am trying to calculate how often a variable (in this case, country) occurs in any given year. For example:
name <- c('AJ Griffin','Steve Bacon','Kevin Potatoe','Jose Hernandez','Kent Brockman',
'Sal Fasno','Kirk Kelly','Wes United','Livan Domingo','Mike Fast')
country <- c('USA', 'USA', 'Canada', 'Dominican Republic', 'Panama', 'Dominican Republic', 'Canada', 'USA', 'Dominican Republic', 'Mexico')
year <- c('2016', '2016', '2016', '2016', '2016', '2015', '2015', '2015', '2015', '2015')
country_analysis <-data.frame(name, country, year)
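When I use the code below, I get the proportion of each country across the whole dataset, but I would like to break it down further by year.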
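library(dplyr)  # provides %>%, group_by(), summarise(), mutate()

# Proportion of each country over the whole dataset (year is ignored)
P <- country_analysis %>%
  group_by(country) %>%
  summarise(n = n()) %>%
  mutate(freq = round(n / sum(n), 1))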
Ideally, the end result would have columns for country, year, and frequency (e.g. 2016, USA, 0.4). Any input would be appreciated.
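First collapse by year and country, then by year alone. After summarize(), dplyr drops the last grouping variable (country), so the data is still grouped by year and sum(count) in the mutate() step is the total for that year. For example: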
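library(dplyr)

country_analysis %>%
  group_by(year, country) %>%                # one group per year/country pair
  summarize(count = n()) %>%                 # rows per pair; result stays grouped by year
  mutate(proportion = count / sum(count))    # share within each year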
#     year            country count proportion
#   <fctr>             <fctr> <int>      <dbl>
# 1   2015             Canada     1        0.2
# 2   2015 Dominican Republic     2        0.4
# 3   2015             Mexico     1        0.2
# 4   2015                USA     1        0.2
# 5   2016             Canada     1        0.2
# 6   2016 Dominican Republic     1        0.2
# 7   2016             Panama     1        0.2
# 8   2016                USA     2        0.4
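If you want exactly the year, country, frequency layout from the question, one possible variation is sketched below. It uses count() plus an explicit group_by(year), so it does not depend on how summarize() handles leftover groups. The names freq_by_year and usa_2016 are made up for illustration, the name column is left out because it is not needed for the counts, and stringsAsFactors = FALSE is my own choice to keep year and country as plain character columns.

```r
library(dplyr)

# Same country/year data as in the question (name column omitted)
country_analysis <- data.frame(
  country = c("USA", "USA", "Canada", "Dominican Republic", "Panama",
              "Dominican Republic", "Canada", "USA", "Dominican Republic", "Mexico"),
  year = c("2016", "2016", "2016", "2016", "2016",
           "2015", "2015", "2015", "2015", "2015"),
  stringsAsFactors = FALSE
)

# freq_by_year is a hypothetical name for the result
freq_by_year <- country_analysis %>%
  count(year, country) %>%                  # one row per year/country pair, count in column n
  group_by(year) %>%                        # regroup explicitly so sum(n) is the yearly total
  mutate(freq = round(n / sum(n), 1)) %>%   # share of each country within its year
  ungroup() %>%
  select(year, country, freq)

# Pull out a single year/country combination
usa_2016 <- filter(freq_by_year, year == "2016", country == "USA")
print(freq_by_year)
```

Rounding inside mutate() mirrors the round(n / sum(n), 1) from the question; drop it if you need the unrounded proportions for further calculations.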