Dav*_*e M 6 r time-series grouped-table
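I love R, but some problems are just plain hard.
The challenge is to find the first instance of a rolling sum that is less than 30 in an irregular time series, using a time-based window that is greater than or equal to 6 hours. Here is a sample of the series: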
Row Person DateTime Value
1 A 2014-01-01 08:15:00 5
2 A 2014-01-01 09:15:00 5
3 A 2014-01-01 10:00:00 5
4 A 2014-01-01 11:15:00 5
5 A 2014-01-01 14:15:00 5
6 B 2014-01-01 08:15:00 25
7 B 2014-01-01 10:15:00 25
8 B 2014-01-01 19:15:00 2
9 C 2014-01-01 08:00:00 20
10 C 2014-01-01 09:00:00 5
11 C 2014-01-01 13:45:00 1
12 D 2014-01-01 07:00:00 1
13 D 2014-01-01 08:15:00 13
14 D 2014-01-01 14:15:00 15
For Person A, Rows 1 & 5 create a minimum 6 hour interval with a running sum of 25 (which is less than 30).
For Person B, Rows 7 & 8 create a 9 hour interval with a running sum of 27 (again less than 30).
For Person C, using Rows 9 & 11, there is no minimum 6 hour interval (it is only 5.75 hours) although the running sum is 26 and is less than 30.
For Person D, using Rows 12 & 14, the interval is 7.25 hours but the running sum is 30 and is not less than 30.
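Given n observations, n*(n-1)/2 intervals must be compared. For example, with n = 2 there is only 1 interval to evaluate; with n = 3 there are 3 intervals, and so on.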
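As a quick check of that count, here is a small sketch in base R. It enumerates every pair of observation indices with `combn` and compares the number of pairs against n*(n-1)/2; the value n = 5 is only an illustration (it happens to match the number of rows for Person A).

```r
n <- 5                      # illustrative number of observations
pairs <- combn(n, 2)        # each column is one (start, end) pair of row indices
n_intervals <- ncol(pairs)  # number of candidate intervals

# Both expressions count the same thing: the number of unordered pairs
n_intervals == n * (n - 1) / 2
n_intervals == choose(n, 2)
```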
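I assume this is a variation of the subset sum problem (http://en.wikipedia.org/wiki/Subset_sum_problem).
Although the data can be sorted, I suspect this requires a brute-force solution that tests every interval.
Any help would be appreciated.
Edit: here is the data with the DateTime column formatted as POSIXct: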
df <- structure(list(Person = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("A", "B", "C", "D"), class = "factor"),
DateTime = structure(c(1388560500, 1388564100, 1388566800,
1388571300, 1388582100, 1388560500, 1388567700, 1388600100,
1388559600, 1388563200, 1388580300, 1388556000, 1388560500,
1388582100), class = c("POSIXct", "POSIXt"), tzone = ""),
Value = c(5L, 5L, 5L, 5L, 5L, 25L, 25L, 2L, 20L, 5L, 1L,
1L, 13L, 15L)), .Names = c("Person", "DateTime", "Value"), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14"), class = "data.frame")
I found this to be a hard problem in R as well, so I made a package for it!
library("devtools")
# current versions of devtools expect the repository as "user/repo"
install_github("mgahan/boRingTrees")
require(boRingTrees)
Of course, you have to work out the units for the upper bound correctly.
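To make the point about units concrete, here is a minimal base R sketch. It assumes the bound is expressed in seconds, which is how the 6-hour window is written in this answer (`6*60*60`); the variable names are only illustrative. POSIXct values are stored as seconds, so a time gap converts naturally to the same unit.

```r
t1 <- as.POSIXct("2014-01-01 08:15:00", tz = "UTC")
t2 <- as.POSIXct("2014-01-01 14:15:00", tz = "UTC")

# Express the gap explicitly in seconds so it is comparable to the bound
gap_secs <- as.numeric(difftime(t2, t1, units = "secs"))

upper <- 6 * 60 * 60   # 6 hours written in seconds
gap_secs >= upper      # TRUE when the two timestamps are at least 6 hours apart
```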
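If you are interested, there is some more documentation here: https://github.com/mgahan/boRingTrees
For the data df provided by @beginneR, you can use the following code to get the 6-hour rolling sum.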
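require(data.table)
setDT(df)  # convert df to a data.table by reference
# per-Person sum of Value over the trailing 6 hours (upper is written in seconds)
df[ , roll := rollingByCalcs(df,dates="DateTime",target="Value",
                    by="Person",stat=sum,lower=0,upper=6*60*60)]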
Person DateTime Value roll
1: A 2014-01-01 01:15:00 5 5
2: A 2014-01-01 02:15:00 5 10
3: A 2014-01-01 03:00:00 5 15
4: A 2014-01-01 04:15:00 5 20
5: A 2014-01-01 07:15:00 5 25
6: B 2014-01-01 01:15:00 25 25
7: B 2014-01-01 03:15:00 25 50
8: B 2014-01-01 12:15:00 2 2
9: C 2014-01-01 01:00:00 20 20
10: C 2014-01-01 02:00:00 5 25
11: C 2014-01-01 06:45:00 1 26
12: D 2014-01-01 00:00:00 1 1
13: D 2014-01-01 01:15:00 13 14
14: D 2014-01-01 07:15:00 15 28
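The original post was rather unclear to me, so this may not be exactly what he wants. If a column with the desired output were provided, I think I could be of more help.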
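One possible reading of the question is: for each person, find the first pair of rows that are at least 6 hours apart and whose values, summed from the first row through the second, total less than 30. The following is a minimal brute-force sketch in base R under that assumption. The helper `first_window` and its arguments are hypothetical names and are not part of boRingTrees; the data are the rows for Persons A, B and C from the question, with the time zone fixed to UTC for reproducibility.

```r
# Hypothetical helper (not from the original post or from boRingTrees):
# returns the first (start, end) pair of positions whose time span is
# >= min_secs and whose inclusive sum of values is < limit, or NULL if none.
first_window <- function(time, value, min_secs = 6 * 60 * 60, limit = 30) {
  ord <- order(time)
  time <- as.numeric(time[ord])   # POSIXct -> seconds
  value <- value[ord]
  n <- length(time)
  if (n < 2) return(NULL)
  cs <- c(0, cumsum(value))       # cs[k + 1] is sum(value[1:k])
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      span <- time[j] - time[i]
      total <- cs[j + 1] - cs[i]  # sum of value[i:j]
      if (span >= min_secs && total < limit) {
        return(data.frame(start = i, end = j, hours = span / 3600, sum = total))
      }
    }
  }
  NULL
}

times <- c("08:15:00", "09:15:00", "10:00:00", "11:15:00", "14:15:00",
           "08:15:00", "10:15:00", "19:15:00",
           "08:00:00", "09:00:00", "13:45:00")
obs <- data.frame(
  Person   = rep(c("A", "B", "C"), times = c(5, 3, 3)),
  DateTime = as.POSIXct(paste("2014-01-01", times), tz = "UTC"),
  Value    = c(5, 5, 5, 5, 5, 25, 25, 2, 20, 5, 1)
)

# Apply the search separately to each person; positions are within-person
res <- lapply(split(obs, obs$Person),
              function(d) first_window(d$DateTime, d$Value))
res
```

This tests every pair, so it does n*(n-1)/2 checks per person in the worst case; the cumulative sum only avoids re-adding the values for each pair. Under this reading, Person A resolves to its first and fifth rows, Person B to its second and third rows, and Person C has no qualifying interval.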