dplyr, lubridate: how to aggregate a data frame by week?

ℕʘʘ*_*ḆḽḘ 11 r xts lubridate dplyr

Consider the following example:

library(tidyverse)
library(lubridate)
# one row per day from Monday 2014-02-24 to Thursday 2014-03-20
time <- seq(from = ymd("2014-02-24"), to = ymd("2014-03-20"), by = "days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- tibble(time, values)  # tibble() replaces the deprecated data_frame()
df2 <- df2 %>% mutate(day_of_week = wday(time, label = TRUE))

Source: local data frame [25 x 3]

         time values day_of_week
       <date>  <dbl>      <fctr>
1  2014-02-24     30         Mon
2  2014-02-25     45        Tues
3  2014-02-26     30         Wed
4  2014-02-27     50       Thurs
5  2014-02-28     50         Fri
6  2014-03-01     20         Sat
7  2014-03-02     35         Sun
8  2014-03-03     50         Mon
9  2014-03-04     35        Tues
10 2014-03-05     35         Wed

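I would like to aggregate this data frame by week.

That is, suppose I define a week as starting on Monday morning and ending on Sunday night; call this the Monday-to-Monday cycle. (Importantly, I want to be able to choose other conventions, such as Friday to Friday.)

Then I simply want to compute the mean of values for each week.

For example, in the data above, one would compute the mean of values between Monday, February 24 and Sunday, March 2, and so on.

How can I do that?

Thanks!

Edit: thanks to everyone who offered ideas. Somewhat unusually, I think the solution I posted later may be a better fit. Thanks again!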

ali*_*ire 23

In the tidyverse,

df2 %>% group_by(week = week(time)) %>% summarise(value = mean(values))

## # A tibble: 5 × 2
##    week    value
##   <dbl>    <dbl>
## 1     8 37.50000
## 2     9 38.57143
## 3    10 38.57143
## 4    11 36.42857
## 5    12 45.00000

Or use isoweek instead:

df2 %>% group_by(week = isoweek(time)) %>% summarise(value = mean(values))

## # A tibble: 4 × 2
##    week    value
##   <int>    <dbl>
## 1     9 37.14286
## 2    10 40.71429
## 3    11 35.00000
## 4    12 42.50000
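
The two results differ because week() and isoweek() number weeks differently: week() counts seven-day blocks from January 1, so its boundaries fall on whatever weekday the year began on, whereas isoweek() follows ISO 8601 with weeks running Monday to Sunday. A small check on two dates from the example, offered as a sketch:

```r
library(lubridate)

d <- ymd(c("2014-02-24", "2014-02-26"))  # a Monday and the following Wednesday

# week(): 7-day blocks counted from January 1 (2014-01-01 was a Wednesday),
# so the week number changes between these two dates
week_num <- week(d)      # 8 9

# isoweek(): ISO 8601 weeks run Monday to Sunday, so both dates share a week
iso_num <- isoweek(d)    # 9 9

print(week_num)
print(iso_num)
```

This is why the week() grouping splits the first Monday and Tuesday into a group of their own, while isoweek() gives the Monday-to-Sunday grouping asked for in the question.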

Or cut.Date:

df2 %>% group_by(week = cut(time, "week")) %>% summarise(value = mean(values))

## # A tibble: 4 × 2
##         week    value
##       <fctr>    <dbl>
## 1 2014-02-24 37.14286
## 2 2014-03-03 40.71429
## 3 2014-03-10 35.00000
## 4 2014-03-17 42.50000

If you prefer, you can tell it to start weeks on Sunday:

df2 %>% group_by(week = cut(time, "week", start.on.monday = FALSE)) %>% 
    summarise(value = mean(values))

## # A tibble: 4 × 2
##         week    value
##       <fctr>    <dbl>
## 1 2014-02-23 37.50000
## 2 2014-03-02 40.00000
## 3 2014-03-09 33.57143
## 4 2014-03-16 44.00000

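If you want to shift the week start to another day, offset your dates before cutting. Adding one day, as here, groups Sunday through Saturday (the means match the Sunday-start result above); subtracting one day would give a Tuesday start: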

df2 %>% group_by(week = cut(time + 1, "week")) %>% summarise(value = mean(values))

## # A tibble: 4 × 2
##         week    value
##       <fctr>    <dbl>
## 1 2014-02-24 37.50000
## 2 2014-03-03 40.00000
## 3 2014-03-10 33.57143
## 4 2014-03-17 44.00000

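The labels will be off, though: each label is the Monday of the shifted dates, not the actual first day of the group. If you use cut, also consider the implications of its include.lowest and right parameters, documented in ?cut.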
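
One way to repair the labels when offsetting cut is to shift them back by the same number of days after cutting. The sketch below does this for a Tuesday start; offset is a hypothetical helper variable, and the values are copied from the printed data so the result does not depend on the random number generator.

```r
library(dplyr)

# Values copied from the printed example (not re-sampled)
time <- seq(as.Date("2014-02-24"), as.Date("2014-03-20"), by = "day")
values <- c(30, 45, 30, 50, 50, 20, 35, 50, 35, 35, 50, 35, 40,
            40, 20, 50, 25, 20, 30, 50, 50, 40, 40, 50, 40)
df2 <- tibble(time, values)

offset <- 1  # hypothetical: days after Monday on which a week starts (1 = Tuesday)

weekly <- df2 %>%
  mutate(
    # shift dates back so the desired start day lands on a Monday, cut into
    # Monday-based weeks, then move the label forward by the same amount
    week = as.Date(as.character(cut(time - offset, "week"))) + offset
  ) %>%
  group_by(week) %>%
  summarise(value = mean(values))

print(weekly)
```

With offset set to 1, every label is a Tuesday and is the first day of its group. The first and last groups are partial weeks, because the data begin on a Monday and end on a Thursday.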


vag*_*ond 5

Why not just use floor_date with an integer offset to adjust the start of the week?

library(dplyr)      # provides %>%, tibble(), mutate(), group_by(), summarize()
library(lubridate)
time <- seq(from = ymd("2014-02-24"), to = ymd("2014-03-20"), by = "days")

set.seed(123)

values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- tibble(time, values)  # tibble() replaces the deprecated data_frame()
df2 <- df2 %>% mutate(day_of_week = weekdays(time))

# Week Wednesday to Tuesday: floor_date() weeks start on Sunday by default,
# so subtracting 3 days maps each Wednesday onto a Sunday
df2 %>% group_by(Week = floor_date(time - 3, unit = "week")) %>% 
  summarize(WeeklyAveDist = mean(values), mean(values), min_date = min(time), max_date = max(time)) %>%
  mutate(weekdays(min_date), weekdays(max_date))

        Week WeeklyAveDist mean.values.   min_date   max_date
1 2014-02-16      37.50000     37.50000 2014-02-24 2014-02-25
2 2014-02-23      38.57143     38.57143 2014-02-26 2014-03-04
3 2014-03-02      38.57143     38.57143 2014-03-05 2014-03-11
4 2014-03-09      36.42857     36.42857 2014-03-12 2014-03-18
5 2014-03-16      45.00000     45.00000 2014-03-19 2014-03-20
  weekdays.min_date. weekdays.max_date.
1             Monday            Tuesday
2          Wednesday            Tuesday
3          Wednesday            Tuesday
4          Wednesday            Tuesday
5          Wednesday           Thursday


# Week Thursday to Wednesday: subtracting 4 days maps each Thursday onto a Sunday
df2 %>% group_by(Week = floor_date(time - 4, unit = "week")) %>% 
  summarize(WeeklyAveDist = mean(values), mean(values), min_date = min(time), max_date = max(time)) %>%
  mutate(weekdays(min_date), weekdays(max_date))

        Week WeeklyAveDist mean.values.   min_date   max_date
1 2014-02-16      35.00000     35.00000 2014-02-24 2014-02-26
2 2014-02-23      39.28571     39.28571 2014-02-27 2014-03-05
3 2014-03-02      37.14286     37.14286 2014-03-06 2014-03-12
4 2014-03-09      40.00000     40.00000 2014-03-13 2014-03-19
5 2014-03-16      40.00000     40.00000 2014-03-20 2014-03-20
  weekdays.min_date. weekdays.max_date.
1             Monday          Wednesday
2           Thursday          Wednesday
3           Thursday          Wednesday
4           Thursday          Wednesday
5           Thursday           Thursday
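
Subtracting days gives the right groups, but the Week column then shows the shifted Sunday rather than the real first day of each group. Recent lubridate versions accept a week_start argument in floor_date() (1 = Monday through 7 = Sunday), which should keep the labels aligned with the data. A minimal sketch, assuming such a version is installed and using the values printed in the question rather than a fresh random draw:

```r
library(dplyr)
library(lubridate)

# Values copied from the printed example (not re-sampled)
time <- seq(ymd("2014-02-24"), ymd("2014-03-20"), by = "days")
values <- c(30, 45, 30, 50, 50, 20, 35, 50, 35, 35, 50, 35, 40,
            40, 20, 50, 25, 20, 30, 50, 50, 40, 40, 50, 40)
df2 <- tibble(time, values)

# week_start = 4 asks for weeks that begin on Thursday
weekly <- df2 %>%
  group_by(week = floor_date(time, unit = "week", week_start = 4)) %>%
  summarise(mean_value = mean(values),
            min_date = min(time),
            max_date = max(time))

print(weekly)
```

The means should match the Thursday-to-Wednesday output above, with each group now labeled by its own Thursday (2014-02-20, 2014-02-27, and so on).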


ℕʘʘ*_*ḆḽḘ 1

For once, after some research, I actually think I came up with a better solution that

  • gives the correct aggregation
  • gives the correct labels

The example below uses weeks that start on Thursday. Each week is labeled with the first day of its cycle.

library(tidyverse)
library(lubridate)
options(tibble.print_min = 30)

time <- seq(from = ymd("2014-02-24"), to = ymd("2014-03-20"), by = "days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- tibble(time, values)  # tibble() replaces the deprecated data_frame()

# wday() numbers days 1 (Sunday) to 7 (Saturday) by default, so Thursday is 5
df2 <- df2 %>% mutate(day_of_week_label = wday(time, label = TRUE),
                      day_of_week = wday(time, label = FALSE))

# thursday_cycle: the most recent Thursday on or before each date;
# tmp_1 and tmp_2 expose the intermediate steps of the modular arithmetic
df2 <- df2 %>% mutate(thursday_cycle = time - ((as.integer(day_of_week) - 5) %% 7),
                      tmp_1 = (as.integer(day_of_week) - 5),
                      tmp_2 = ((as.integer(day_of_week) - 5) %% 7))

This gives:

> df2
# A tibble: 25 × 7
         time values day_of_week_label day_of_week thursday_cycle tmp_1 tmp_2
       <date>  <dbl>             <ord>       <dbl>         <date> <dbl> <dbl>
1  2014-02-24     30               Mon           2     2014-02-20    -3     4
2  2014-02-25     45              Tues           3     2014-02-20    -2     5
3  2014-02-26     30               Wed           4     2014-02-20    -1     6
4  2014-02-27     50             Thurs           5     2014-02-27     0     0
5  2014-02-28     50               Fri           6     2014-02-27     1     1
6  2014-03-01     20               Sat           7     2014-02-27     2     2
7  2014-03-02     35               Sun           1     2014-02-27    -4     3
8  2014-03-03     50               Mon           2     2014-02-27    -3     4
9  2014-03-04     35              Tues           3     2014-02-27    -2     5
10 2014-03-05     35               Wed           4     2014-02-27    -1     6
11 2014-03-06     50             Thurs           5     2014-03-06     0     0
12 2014-03-07     35               Fri           6     2014-03-06     1     1
13 2014-03-08     40               Sat           7     2014-03-06     2     2
14 2014-03-09     40               Sun           1     2014-03-06    -4     3
15 2014-03-10     20               Mon           2     2014-03-06    -3     4
16 2014-03-11     50              Tues           3     2014-03-06    -2     5
17 2014-03-12     25               Wed           4     2014-03-06    -1     6
18 2014-03-13     20             Thurs           5     2014-03-13     0     0
19 2014-03-14     30               Fri           6     2014-03-13     1     1
20 2014-03-15     50               Sat           7     2014-03-13     2     2
21 2014-03-16     50               Sun           1     2014-03-13    -4     3
22 2014-03-17     40               Mon           2     2014-03-13    -3     4
23 2014-03-18     40              Tues           3     2014-03-13    -2     5
24 2014-03-19     50               Wed           4     2014-03-13    -1     6
25 2014-03-20     40             Thurs           5     2014-03-20     0     0
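
With the cycle column in place, the weekly means follow from an ordinary group_by and summarise. The sketch below is one way to finish the calculation; start_wday is a hypothetical parameter (in wday()'s default numbering, 1 = Sunday through 7 = Saturday, so 5 is Thursday and 6 would be Friday), and the values are copied from the table above rather than re-sampled.

```r
library(dplyr)
library(lubridate)

# Values copied from the printed table (not re-sampled)
time <- seq(ymd("2014-02-24"), ymd("2014-03-20"), by = "days")
values <- c(30, 45, 30, 50, 50, 20, 35, 50, 35, 35, 50, 35, 40,
            40, 20, 50, 25, 20, 30, 50, 50, 40, 40, 50, 40)
df2 <- tibble(time, values)

start_wday <- 5  # hypothetical: weekday on which a cycle starts (5 = Thursday)

weekly <- df2 %>%
  # step each date back to the most recent start day (0 to 6 days earlier);
  # this assumes wday()'s default numbering with Sunday = 1
  mutate(cycle_start = time - ((wday(time) - start_wday) %% 7)) %>%
  group_by(cycle_start) %>%
  summarise(mean_value = mean(values), n_days = n())

print(weekly)
```

The n_days column makes partial weeks at either end of the data easy to spot: here the first cycle holds only three days and the last only one.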