I have a dataset, df (it contains over 4,000 rows):
DATEB
9/9/2019 7:51:58 PM
9/9/2019 7:51:59 PM
9/9/2019 7:51:59 PM
9/9/2019 7:52:00 PM
9/9/2019 7:52:01 PM
9/9/2019 7:52:01 PM
9/9/2019 7:52:02 PM
9/9/2019 7:52:03 PM
9/9/2019 7:54:00 PM
9/9/2019 7:54:02 PM
9/10/2019 8:00:00 PM
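If the time between consecutive datetimes exceeds 120 seconds, I would like to place them in different groups and get the duration of each group.
Desired output: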
Group  Duration
a      5 sec
b      2 sec
c      0 sec
dput:
structure(list(DATEB = structure(c(2L, 3L, 3L, 4L, 5L, 5L, 6L,
7L, 8L, 9L, 1L), .Label = c(" 9/10/2019 8:00:00 PM", " 9/9/2019 7:51:58 PM",
" 9/9/2019 7:51:59 PM", " 9/9/2019 7:52:00 PM", " 9/9/2019 7:52:01 PM",
" 9/9/2019 7:52:02 PM", " 9/9/2019 7:52:03 PM", " 9/9/2019 7:54:00 PM",
" 9/9/2019 7:54:02 PM"), class = "factor")), class = "data.frame", row.names = c(NA,
-11L))
I have tried the code below, which works, but I want 7:51:59 and 7:52:00 to be in the same group. The only time the duration should break and a new group be created is when the time between datetimes exceeds 120 seconds.
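library(dplyr)
library(lubridate)

# Floors each timestamp to a fixed 120-second boundary, then summarises per bin
df %>%
  mutate(DATEB = mdy_hms(DATEB),
         temp = floor_date(DATEB, "120 secs")) %>%
  group_by(temp) %>%
  summarise(duration = difftime(max(DATEB), min(DATEB), units = "secs"))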
Any suggestions are appreciated.
We can use cut here to bin the timestamps into 2-minute intervals:
library(dplyr)
df %>%
  mutate(DATEB = lubridate::mdy_hms(DATEB),
         temp = cut(DATEB, breaks = "2 mins")) %>%
  group_by(temp) %>%
  summarise(duration = difftime(max(DATEB), min(DATEB), units = "secs"))
# A tibble: 3 x 2
#   temp                duration
#   <fct>               <drtn>
# 1 2019-09-09 19:51:00 5 secs
# 2 2019-09-09 19:53:00 2 secs
# 3 2019-09-10 19:59:00 0 secs
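Note that cut (like floor_date) uses fixed 2-minute bins, so two timestamps a second apart can still land in different groups if they straddle a bin edge. A possible alternative, sketched here rather than taken from the original answer, is to start a new group only when the gap to the previous timestamp exceeds a threshold. The helper name group_by_gap and its max_gap argument are made up for illustration. One caveat: in the sample data the gap between 7:52:03 and 7:54:00 is 117 seconds, so a strict 120-second threshold gives two groups (124 and 0 seconds), not the three shown in the desired output; a smaller threshold such as 60 seconds reproduces the 5, 2, 0 result.

```r
library(dplyr)
library(lubridate)

# Sample data mirroring the question (timestamps stored as character strings)
df <- data.frame(DATEB = c(
  "9/9/2019 7:51:58 PM", "9/9/2019 7:51:59 PM", "9/9/2019 7:51:59 PM",
  "9/9/2019 7:52:00 PM", "9/9/2019 7:52:01 PM", "9/9/2019 7:52:01 PM",
  "9/9/2019 7:52:02 PM", "9/9/2019 7:52:03 PM", "9/9/2019 7:54:00 PM",
  "9/9/2019 7:54:02 PM", "9/10/2019 8:00:00 PM"
))

# Hypothetical helper: a new group starts whenever the gap to the
# previous timestamp is larger than max_gap seconds.
group_by_gap <- function(data, max_gap = 120) {
  data %>%
    mutate(DATEB = mdy_hms(DATEB)) %>%
    arrange(DATEB) %>%
    mutate(
      # Seconds since the previous row; NA for the first row
      gap = as.numeric(difftime(DATEB, lag(DATEB), units = "secs")),
      # The first row and every row after a large gap open a new group
      grp = cumsum(is.na(gap) | gap > max_gap)
    ) %>%
    group_by(grp) %>%
    summarise(duration = difftime(max(DATEB), min(DATEB), units = "secs"))
}

res_120 <- group_by_gap(df, max_gap = 120)  # strict rule from the question
res_60  <- group_by_gap(df, max_gap = 60)   # smaller threshold, for comparison
print(res_120)
print(res_60)
```

Because groups are defined by the gaps themselves, 7:51:59 and 7:52:00 always stay together regardless of where a clock-aligned bin boundary would fall.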