R: group-by variable calculated from the interval between start and end times

Jam*_*ien 5 r lubridate dplyr data.table

I have a data frame as follows:

tmpdf <- data.frame(licensePlate = c("Y80901", "Y80901", "Y80901", "AMG-999", "AMG-999", "W3188", "W3188"),  
starttime= c("2015-09-18 09:55", "2015-09-18 23:00", "2015-09-20 15:00", "2015-09-17 15:42", "2015-09-21 09:22", "2015-09-17 09:00", "2015-09-21 14:00"),
endtime = c("2015-09-18 17:55", "2015-09-20 11:00", "2015-09-21 12:00",  "2015-09-18 13:00",  "2015-09-21 14:22", "2015-09-21 12:00", "2015-09-21 16:00"))
    tmpdf
      licensePlate        starttime          endtime
    1       Y80901 2015-09-18 09:55 2015-09-18 17:55
    2       Y80901 2015-09-18 23:00 2015-09-20 11:00
    3       Y80901 2015-09-20 15:00 2015-09-21 12:00
    4      AMG-999 2015-09-17 15:42 2015-09-18 13:00
    5      AMG-999 2015-09-21 09:22 2015-09-21 14:22
    6        W3188 2015-09-17 09:00 2015-09-21 12:00
    7        W3188 2015-09-21 14:00 2015-09-21 16:00

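I want to calculate how many hours each licensePlate was in use on each day over the last n days (for example, the last 5 days, from September 17 to September 21). My expected result is as follows: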

   Period            LicensePlate        Used Time   

1 2015-09-17         Y80901              0
2 2015-09-17         AMG-999             8.3     
3 2015-09-17         W3188               15
4 2015-09-18         Y80901              9
5 2015-09-18         AMG-999             13
6 2015-09-18         W3188               24
7 2015-09-19         Y80901              24
8 2015-09-19         AMG-999             0
9 2015-09-19         W3188               24
10 2015-09-20        Y80901              20
11 2015-09-20        AMG-999             0
12 2015-09-20        W3188               24
13 2015-09-21        Y80901              12
14 2015-09-21        AMG-999             5
15 2015-09-21        W3188               14

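I think dplyr/data.table and lubridate can be used to get this result, and I probably need to measure the periods in days, but I don't know how to split a row at the day boundaries that fall between its start and end times.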

Dav*_*urg 3

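Here is something that could get you started. It is almost your desired output; the difference is that it doesn't show the missing (zero-usage) rows for each licensePlate per Period.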

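The first step is to convert your dates to a valid POSIXct class, then expand the data to a per-minute level (probably the most costly part of this solution), and finally aggregate by licensePlate and Period while summarizing the result (I didn't use as.Date here because it handles POSIXct values between midnight and 1 a.m. poorly; substr is used instead).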
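
As an aside on the as.Date remark: the answer does not spell out the cause, so the following is only an assumption. One plausible explanation is that as.Date() on a POSIXct value converts in UTC (historically its default), so a local time shortly after midnight in a zone ahead of UTC lands on the previous calendar day. The time zone "Europe/Berlin" and the variable name x below are picked purely for illustration; the explicit tz = "UTC" keeps the sketch independent of the R version's default.

```r
# Illustration only: a local timestamp shortly after midnight, in a zone ahead of UTC
x <- as.POSIXct("2015-09-18 00:30", tz = "Europe/Berlin")

# Converting in UTC shifts the value to the previous calendar day
utc_day <- as.character(as.Date(x, tz = "UTC"))

# Formatting the timestamp as text keeps the local calendar day
local_day <- format(x, "%Y-%m-%d")

print(c(utc = utc_day, local = local_day))
```

Taking the first 10 characters of the printed timestamp, as the code in this answer does with substr, keeps the local day in the same way.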

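library(data.table)
# Convert both columns to POSIXct by reference
setDT(tmpdf)[, `:=`(starttime = as.POSIXct(starttime), endtime = as.POSIXct(endtime))]
# Expand every row into one timestamp per minute
# (seq() includes both endpoints, so each interval yields one extra minute; this disappears after rounding)
res <- tmpdf[, .(licensePlate, Period = seq(starttime, endtime, by = "1 min")), by = 1:nrow(tmpdf)]
# Count minutes per day (first 10 characters of the timestamp) and plate, then convert to hours
res[, .(Used_Time = round(.N/60L, 1L)), keyby = .(Period = substr(Period, 1L, 10L), licensePlate)]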
#         Period licensePlate Used_Time
#  1: 2015-09-17      AMG-999       8.3
#  2: 2015-09-17        W3188      15.0
#  3: 2015-09-18      AMG-999      13.0
#  4: 2015-09-18        W3188      24.0
#  5: 2015-09-18       Y80901       9.0
#  6: 2015-09-19        W3188      24.0
#  7: 2015-09-19       Y80901      24.0
#  8: 2015-09-20        W3188      24.0
#  9: 2015-09-20       Y80901      20.0
# 10: 2015-09-21      AMG-999       5.0
# 11: 2015-09-21        W3188      14.0
# 12: 2015-09-21       Y80901      12.0
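
The output above omits plate/day combinations with no usage (for example Y80901 on 2015-09-17), and the per-minute expansion can get expensive for long intervals. As a possible alternative, not the method used above, here is a minimal sketch that clips each interval to each calendar day and sums the overlaps. It assumes the timestamps can be treated as UTC (so every day is exactly 86400 seconds) and that the reporting window is the 5 days starting 2015-09-17; the names dt, days, idx and res2 are introduced here for illustration.

```r
library(data.table)

# Sample data from the question, parsed as UTC (assumption: no DST handling needed)
dt <- data.table(
  licensePlate = c("Y80901", "Y80901", "Y80901", "AMG-999", "AMG-999", "W3188", "W3188"),
  starttime = as.POSIXct(c("2015-09-18 09:55", "2015-09-18 23:00", "2015-09-20 15:00",
                           "2015-09-17 15:42", "2015-09-21 09:22", "2015-09-17 09:00",
                           "2015-09-21 14:00"), tz = "UTC"),
  endtime = as.POSIXct(c("2015-09-18 17:55", "2015-09-20 11:00", "2015-09-21 12:00",
                         "2015-09-18 13:00", "2015-09-21 14:22", "2015-09-21 12:00",
                         "2015-09-21 16:00"), tz = "UTC")
)

# Start of each day in the reporting window
days <- seq(as.POSIXct("2015-09-17", tz = "UTC"), by = "1 day", length.out = 5)

# Work in plain seconds to keep the arithmetic simple
s <- as.numeric(dt$starttime)
e <- as.numeric(dt$endtime)
d <- as.numeric(days)

# Every (interval, day) pair, referenced by index
idx <- CJ(row = seq_len(nrow(dt)), day = seq_along(days))

# Overlap of interval [s, e] with day [d, d + 86400), floored at zero
idx[, `:=`(
  licensePlate = dt$licensePlate[row],
  Period = format(days[day], "%Y-%m-%d"),
  secs = pmax(0, pmin(e[row], d[day] + 86400) - pmax(s[row], d[day]))
)]

# Sum the overlaps per day and plate, in hours
res2 <- idx[, .(Used_Time = round(sum(secs) / 3600, 1)), keyby = .(Period, licensePlate)]
print(res2)
```

Because every plate is paired with every day before summing, days without any usage appear with Used_Time equal to 0 instead of being dropped. The cost grows with the number of intervals times the number of days rather than with the number of minutes covered.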