Please consider the following input data:
pressures measured by two instruments (41 and 54) in two tanks (T1 and T2). Sample data:
library(data.table)

data <- data.table(
  time = as.POSIXct(paste("2017-01-01", c("11:59", "12:05", "12:02", "12:03", "14:00", "14:01", "14:02", "14:06")), tz = "GMT"),
  instrumentId = c(41, 54, 41, 54, 41, 54, 41, 54),
  tank = c("T1", "T1", "T2", "T2", "T1", "T1", "T2", "T2"),
  pressure = c(25, 24, 35, 37.5, 22, 22.2, 38, 39.4))
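For each tank, I would like to calculate the difference between the pressures measured by instrument 41 and instrument 54, assuming that values measured within 20 minutes of each other belong to the same sample.
Ideally, the timestamp of the difference would be the mean of the timestamps of the two compared values.
Here is the script currently in use: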
## Calculate difference of time between 2 consecutive lines
data <- data[, timeDiff := difftime(time, shift(time, type = "lag", fill = -Inf), tz = "GMT", units = "mins"),
by = tank]
# Assign the same timestamp to all the measures of a same sample
referenceTimes <- data[timeDiff > 20, .(time)]
data <- data[timeDiff < 20, time := referenceTimes]
# Calculate the difference between the values measured by both instruments
wideDt <- dcast.data.table(data,time + tank ~ instrumentId, value.var = c( "pressure"))
instruments <- as.character(unique(data$instrumentId))
wideDt <- wideDt[, difference := get(instruments[1]) - get(instruments[2])]
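It does the job, but its biggest problem is that the data must be sorted the right way, otherwise the time-shift calculation returns nonsense. It works with the sample input data, but try data <- data[order(pressure)], for example, to "unsort" them. In that case, data <- data[order(tank, time, instrumentId)] has to be added.
Also, I have the impression that it could be more concise, more efficient, and cleaner. In short, it could make better use of the power of data.table.
The expected result is: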
time tank 41 54 difference
-------------------------------------------------
2017-01-01 11:59:00 T1 25 24.0 1.0
2017-01-01 12:02:00 T2 35 37.5 -2.5
2017-01-01 14:00:00 T1 22 22.2 -0.2
2017-01-01 14:02:00 T2 38 39.4 -1.4
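Any idea how to perform this task properly?
You could easily perform a rolling self-join of the two subsets on tank and time, without any initial reordering, while specifying the maximum roll interval (20 minutes = 20*60 seconds):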
res <-
data[instrumentId == 54, .SD[data[instrumentId == 41], on = .(tank, time), roll = -20*60]]
res
# time instrumentId tank pressure i.instrumentId i.pressure
# 1: 2017-01-01 11:59:00 54 T1 24.0 41 25
# 2: 2017-01-01 12:02:00 54 T2 37.5 41 35
# 3: 2017-01-01 14:00:00 54 T1 22.2 41 22
# 4: 2017-01-01 14:02:00 54 T2 39.4 41 38
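As a quick check of the claim that no reordering is needed, the sketch below runs the same rolling join on a sorted copy and on a deliberately shuffled copy of the sample data, then compares the two results. pairReadings is a hypothetical helper name introduced only for this illustration, and it chains x[i] rather than using .SD, which should be equivalent for this purpose.

```r
library(data.table)

data <- data.table(
  time = as.POSIXct(paste("2017-01-01", c("11:59", "12:05", "12:02", "12:03",
                                          "14:00", "14:01", "14:02", "14:06")), tz = "GMT"),
  instrumentId = c(41, 54, 41, 54, 41, 54, 41, 54),
  tank = c("T1", "T1", "T2", "T2", "T1", "T1", "T2", "T2"),
  pressure = c(25, 24, 35, 37.5, 22, 22.2, 38, 39.4))

# Hypothetical helper: for each instrument 41 reading, pick the instrument 54
# reading of the same tank taken at most 20 minutes later (roll = -20 * 60).
pairReadings <- function(dt) {
  dt[instrumentId == 54][dt[instrumentId == 41], on = .(tank, time), roll = -20 * 60]
}

sorted   <- pairReadings(data[order(tank, time)])
shuffled <- pairReadings(data[order(pressure)])

# The rows of the result follow the order of the inner table, so sort both
# results the same way before comparing them.
setorder(sorted, tank, time)
setorder(shuffled, tank, time)
print(all.equal(sorted, shuffled, check.attributes = FALSE))
```

Only the row order of the result depends on the input order; the pairs themselves should not.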
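Then, calculating the difference is just a matter of res[, difference := i.pressure - pressure] (pressure comes from instrument 54 and i.pressure from instrument 41, so this order matches the sign of the expected result).
But if you want the exact format you asked for, I'm afraid some melting/dcasting is needed: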
res2 <-
dcast(
melt(res, c("time", "tank"),
measure = patterns("instrumentId", "pressure")),
time + tank ~ value1, value.var = "value2"
)[, difference := `41` - `54`]
res2
# time tank 41 54 difference
# 1: 2017-01-01 11:59:00 T1 25 24.0 1.0
# 2: 2017-01-01 12:02:00 T2 35 37.5 -2.5
# 3: 2017-01-01 14:00:00 T1 22 22.2 -0.2
# 4: 2017-01-01 14:02:00 T2 38 39.4 -1.4
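The question also asks for the timestamp of each difference to be the mean of the two compared timestamps. In a rolling join, the time column of the result comes from the inner (instrument 41) table, so one possible approach, sketched here under that assumption, is to copy the instrument 54 timestamp into a separate column before joining. The names time54, p54, p41 and meanTime are made up for this illustration. Like the join above, it only pairs an instrument 41 reading with an instrument 54 reading taken up to 20 minutes later.

```r
library(data.table)

data <- data.table(
  time = as.POSIXct(paste("2017-01-01", c("11:59", "12:05", "12:02", "12:03",
                                          "14:00", "14:01", "14:02", "14:06")), tz = "GMT"),
  instrumentId = c(41, 54, 41, 54, 41, 54, 41, 54),
  tank = c("T1", "T1", "T2", "T2", "T1", "T1", "T2", "T2"),
  pressure = c(25, 24, 35, 37.5, 22, 22.2, 38, 39.4))

# Keep a copy of the instrument 54 timestamp (time54), because the join
# column `time` of the result takes its values from the instrument 41 table.
x <- data[instrumentId == 54, .(tank, time, time54 = time, p54 = pressure)]
i <- data[instrumentId == 41, .(tank, time, p41 = pressure)]

res <- x[i, on = .(tank, time), roll = -20 * 60]

# Midpoint of the two timestamps, and the difference as 41 minus 54.
res[, meanTime := time + difftime(time54, time, units = "secs") / 2]
res[, difference := p41 - p54]
print(res[, .(meanTime, tank, p41, p54, difference)])
```

Computing the midpoint as an offset from time keeps the POSIXct class and time zone of the original column.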