I have a dataset that lists the times at which events occurred at specific facilities:
> head(facility_events);
facility_id event_time
1 20248 2018-01-01 00:00:01
2 12445 2018-01-01 00:00:04
3 20248 2018-01-01 00:00:08
4 17567 2018-01-01 00:00:47
5 17567 2018-01-01 00:03:50
6 10459 2018-01-01 00:04:01
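I want to produce a data frame of event counts by grouping the data by facility and binning the events into 3-minute intervals (interval 0 covers 00:00:00 to 00:02:59, interval 1 covers 00:03:00 to 00:05:59, and so on). The output should look like this: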
count facility interval
2 20248 0
1 12445 0
1 17567 0
1 17567 1
1 10459 1
How can I do this in R?
You can do this with the tidyverse and lubridate:
# Sample data matching head(facility_events)
df <- data.frame(facility_id = c(20248, 12445, 20248, 17567, 17567, 10459),
                 event_time = as.POSIXct(c("2018-01-01 00:00:01", "2018-01-01 00:00:04", "2018-01-01 00:00:08", "2018-01-01 00:00:47", "2018-01-01 00:03:50", "2018-01-01 00:04:01")))
library(tidyverse)
df %>%
  # minute of the hour, integer-divided by 3: minutes 0-2 -> 0, 3-5 -> 1, ...
  mutate(interval = lubridate::minute(event_time) %/% 3) %>%
  group_by(facility_id, interval) %>%
  # number of events for each facility/interval pair
  summarise(count = n())
# A tibble: 5 x 3
# Groups: facility_id [?]
facility_id interval count
<dbl> <int> <int>
1 10459 1 1
2 12445 0 1
3 17567 0 1
4 17567 1 1
5 20248 0 2
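One caveat: `lubridate::minute()` returns the minute of the hour, so `minute(event_time) %/% 3` wraps around every hour (an event at 01:00:30 gets interval 0 again). That is fine for the sample, which spans a single hour. If the data cover more than an hour, a sketch like the one below counts elapsed seconds from a fixed origin instead. The `origin` value and the `tz = "UTC"` setting are assumptions for illustration; choose whatever start time and time zone fit your data. It uses `dplyr::count()` with `name =`, which needs dplyr 0.8.0 or later.

```r
library(dplyr)

df <- data.frame(
  facility_id = c(20248, 12445, 20248, 17567, 17567, 10459),
  event_time = as.POSIXct(c("2018-01-01 00:00:01", "2018-01-01 00:00:04",
                            "2018-01-01 00:00:08", "2018-01-01 00:00:47",
                            "2018-01-01 00:03:50", "2018-01-01 00:04:01"),
                          tz = "UTC")
)

# Assumed origin: interval 0 starts here (hypothetical; adjust to your data)
origin <- as.POSIXct("2018-01-01 00:00:00", tz = "UTC")

result <- df %>%
  # seconds elapsed since the origin, integer-divided into 180-second (3-minute) bins
  mutate(interval = as.integer(
    as.numeric(difftime(event_time, origin, units = "secs")) %/% 180)) %>%
  # one row per facility/interval pair, with the event count in "count"
  count(facility_id, interval, name = "count")

print(result)
```

Because the index keeps growing instead of resetting, an event at 01:00:30 lands in interval 20 (3630 %/% 180) rather than interval 0.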
Here is a data.table solution using the same logic, written as a one-liner with data.table's concise syntax:
df <- data.frame(facility_id = c(20248, 12445, 20248, 17567, 17567, 10459),
                 event_time = as.POSIXct(c("2018-01-01 00:00:01", "2018-01-01 00:00:04", "2018-01-01 00:00:08", "2018-01-01 00:00:47", "2018-01-01 00:03:50", "2018-01-01 00:04:01")))
library(data.table)
setDT(df)  # convert df to a data.table in place
# .N is the number of rows in each facility_id/interval group
df[, .(count = .N), by = .(facility_id, interval = minute(event_time) %/% 3)]
#> facility_id interval count
#> 1: 20248 0 2
#> 2: 12445 0 1
#> 3: 17567 0 1
#> 4: 17567 1 1
#> 5: 10459 1 1
Created on 2018-01-14 by the reprex package (v0.1.1.9000).
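The data.table one-liner relies on `minute()`, which gives the minute of the hour, so intervals repeat every hour. If the events span several hours, a sketch like this bins on elapsed seconds from a fixed origin instead. The `origin` value and `tz = "UTC"` are assumptions for illustration, not part of the original answer.

```r
library(data.table)

dt <- data.table(
  facility_id = c(20248, 12445, 20248, 17567, 17567, 10459),
  event_time = as.POSIXct(c("2018-01-01 00:00:01", "2018-01-01 00:00:04",
                            "2018-01-01 00:00:08", "2018-01-01 00:00:47",
                            "2018-01-01 00:03:50", "2018-01-01 00:04:01"),
                          tz = "UTC")
)

# Assumed origin: interval 0 starts here (hypothetical; adjust to your data)
origin <- as.POSIXct("2018-01-01 00:00:00", tz = "UTC")

# Group by facility and by 180-second bins of elapsed time; .N counts rows per group
result <- dt[, .(count = .N),
             by = .(facility_id,
                    interval = as.integer(
                      as.numeric(difftime(event_time, origin, units = "secs")) %/% 180))]

print(result)
```

Groups appear in order of first occurrence, as in the one-liner's output.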