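I have data collected from a database at regular intervals. The metrics are counters that only ever increase. To get the value of a metric for a given interval, you have to subtract the previous snapshot of a row from the current snapshot of the same row.
Example: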
TS INST_ID EVENT WAIT_TIME_MILLI WAIT_COUNT
2014-01-29 17:20:36 1 log file sync 1 756873
2014-01-29 17:20:36 1 log file sync 2 15627
2014-01-29 17:20:36 1 log file sync 4 2925
2014-01-29 17:21:03 1 log file sync 1 761063
2014-01-29 17:21:03 1 log file sync 2 15659
2014-01-29 17:21:03 1 log file sync 4 2929
Desired output:
TS INST_ID EVENT WAIT_TIME_MILLI WAIT_COUNT
2014-01-29 17:21:03 1 log file sync 1 4190
2014-01-29 17:21:03 1 log file sync 2 32
2014-01-29 17:21:03 1 log file sync 4 4
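TS is the time at which the metrics were collected. INST_ID, EVENT, and WAIT_TIME_MILLI are static identifiers. I want to compute the change in WAIT_COUNT from one TS to the next.
I have simplified the data a bit, but in case it matters, there are many events and there can be multiple INST_IDs.
Here is a test data frame: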
structure(list(TS = structure(c(1391034063.541, 1391034063.541,
1391034063.541, 1391034036.136, 1391034036.136, 1391034036.136
), class = c("POSIXct", "POSIXt")), INST_ID = c(1, 1, 1, 1, 1,
1), EVENT = c("log file sync", "log file sync", "log file sync",
"log file sync", "log file sync", "log file sync"), WAIT_TIME_MILLI = c(1,
2, 4, 1, 2, 4), WAIT_COUNT = c(761063, 15659, 2929, 756873, 15627,
2925)), .Names = c("TS", "INST_ID", "EVENT", "WAIT_TIME_MILLI",
"WAIT_COUNT"), class = "data.frame", row.names = c(NA, 6L))
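If your data is a data.frame named dat:
library(dplyr)
# Sort so that each counter's snapshots are in time order
dat <- arrange(dat, INST_ID, EVENT, WAIT_TIME_MILLI, TS)
# One group per counter: all static identifiers, not only WAIT_TIME_MILLI
dat <- group_by(dat, INST_ID, EVENT, WAIT_TIME_MILLI)
# Delta = current count minus the count from the previous snapshot
dat <- mutate(dat, diff = WAIT_COUNT - lag(WAIT_COUNT))
# The first snapshot in each group has no predecessor, so drop it
filter(dat, !is.na(diff))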
Or, as a single pipeline:
library(dplyr)
dat %>%
  arrange(INST_ID, EVENT, WAIT_TIME_MILLI, TS) %>%
  group_by(INST_ID, EVENT, WAIT_TIME_MILLI) %>%
  mutate(diff = WAIT_COUNT - lag(WAIT_COUNT)) %>%
  filter(!is.na(diff))
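If you would rather not depend on dplyr, the same per-group difference can be sketched in base R with ave() and diff(). This is only a minimal sketch: the column name DELTA is chosen here for illustration, and the test data is rebuilt with timestamps rounded to whole seconds in UTC, which is an assumption and not part of the original data.

```r
# Test data from the question, rebuilt compactly.
# Assumption: timestamps rounded to whole seconds, time zone UTC.
dat <- data.frame(
  TS = as.POSIXct(rep(c("2014-01-29 17:21:03", "2014-01-29 17:20:36"),
                      each = 3), tz = "UTC"),
  INST_ID = 1,
  EVENT = "log file sync",
  WAIT_TIME_MILLI = rep(c(1, 2, 4), times = 2),
  WAIT_COUNT = c(761063, 15659, 2929, 756873, 15627, 2925),
  stringsAsFactors = FALSE
)

# Sort so that, within each identifier combination, rows are in time order.
dat <- dat[order(dat$INST_ID, dat$EVENT, dat$WAIT_TIME_MILLI, dat$TS), ]

# ave() applies FUN within each group and returns a vector aligned with
# the rows of dat; the first row of every group gets NA.
# DELTA is a hypothetical column name used for this sketch.
dat$DELTA <- ave(
  dat$WAIT_COUNT,
  dat$INST_ID, dat$EVENT, dat$WAIT_TIME_MILLI,
  FUN = function(x) c(NA, diff(x))
)

# Drop the first snapshot of each group, which has no predecessor.
result <- dat[!is.na(dat$DELTA), ]
print(result[, c("TS", "INST_ID", "EVENT", "WAIT_TIME_MILLI", "DELTA")])
# DELTA is 4190, 32 and 4 for WAIT_TIME_MILLI 1, 2 and 4
```

Grouping on all three identifiers keeps the deltas correct when the data contains several events or instances, since each counter is only ever compared with its own previous snapshot.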