I have a data frame with a structure similar to this:
Year <- c("2000", "2001", "2002" ,"2003", "2004", "2005" ,"2006", "2007", "2008", "2009", "2010", "2011" ,"2012", "2013", "2014", "2015")
Sales <- c(2000,4800,6700,5000,7000,8000,3070,2000,1800,7100,6600,5000,6000,4200,1200,5700)
salesDF <- data.frame(Year,Sales)
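The Year column is a factor variable. I want to mutate a new column that groups the observations in the Year column into 5-year intervals, so that in the end the sales trend is shown in multiples of 5-year intervals.
I would like my legend to have the intervals "2000", "2005", "2010", "2015".
How can I achieve this?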
Here is a simple way to do the grouping with cumsum and the modulo operator (%%):
library(dplyr)

salesDF %>%
  # Start a new group at every year divisible by 5
  mutate(Group = cumsum(as.numeric(as.character(Year)) %% 5 == 0)) %>%
  group_by(Group) %>%
  summarize(Year = first(Year), Mean = mean(Sales), Sum = sum(Sales))
# A tibble: 4 x 4
Group Year Mean Sum
<int> <fct> <dbl> <dbl>
1 1 2000 5100 25500
2 2 2005 4394 21970
3 3 2010 4600 23000
4 4 2015 5700 5700
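To make the grouping step easier to follow, here is a minimal base R sketch of what the cumsum and %% expression computes one step at a time. The intermediate names yr, starts and grp are only illustrative and do not appear in the answer above. It assumes the rows are sorted by year and that the first year is a multiple of 5; otherwise the leading rows would land in a group numbered 0.

```r
# Years as a factor, matching the setup described in the question
Year <- factor(2000:2015)

# Convert factor -> character -> numeric; as.numeric() on the factor
# alone would return the internal level codes (1, 2, 3, ...) instead.
yr <- as.numeric(as.character(Year))

# TRUE wherever a new 5-year block begins (2000, 2005, 2010, 2015)
starts <- yr %% 5 == 0

# Running count of TRUE values, which serves as the group id
grp <- cumsum(starts)

print(data.frame(yr, starts, grp))
```

Each TRUE in starts bumps the running total by one, so every year keeps the id of the most recent block start.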
Or as new columns, without summarizing:
salesDF %>%
  mutate(Group = cumsum(as.numeric(as.character(Year)) %% 5 == 0)) %>%
  group_by(Group) %>%
  # mutate() on grouped data keeps all 16 rows and repeats each group's statistics
  mutate(Mean = mean(Sales), Sum = sum(Sales))
# A tibble: 16 x 5
# Groups: Group [4]
Year Sales Group Mean Sum
<fct> <dbl> <int> <dbl> <dbl>
1 2000 2000 1 5100 25500
2 2001 4800 1 5100 25500
3 2002 6700 1 5100 25500
...
14 2013 4200 3 4600 23000
15 2014 1200 3 4600 23000
16 2015 5700 4 5700 5700
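Since the question asks for legend entries "2000", "2005", "2010", "2015", a label column may be more convenient than a numeric group id. The following is a sketch of a different technique from the cumsum approach above: it rounds each year down to the nearest multiple of 5. The column name Interval is an assumption chosen for illustration, and base R is used so the example runs without extra packages.

```r
# Data from the question, with Year stored as a factor
Year  <- factor(as.character(2000:2015))
Sales <- c(2000, 4800, 6700, 5000, 7000, 8000, 3070, 2000,
           1800, 7100, 6600, 5000, 6000, 4200, 1200, 5700)
salesDF <- data.frame(Year, Sales)

# Round each year down to the start of its 5-year interval
yr <- as.numeric(as.character(salesDF$Year))
salesDF$Interval <- factor(yr - yr %% 5)

levels(salesDF$Interval)
# "2000" "2005" "2010" "2015"

# Total sales per interval
totals <- aggregate(Sales ~ Interval, data = salesDF, FUN = sum)
print(totals)
```

Unlike the cumsum version, this does not depend on row order or on the first year being a multiple of 5, because each label is computed from the year itself.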