My data looks like this:
library(plyr)
dates<-data.frame(datecol=as.POSIXct(c(
"2010-04-03 03:02:38 UTC",
"2010-04-03 03:03:14 UTC",
"2010-04-20 03:05:52 UTC",
"2010-04-20 03:07:42 UTC",
"2010-04-21 03:09:38 UTC",
"2010-04-21 03:10:14 UTC",
"2010-04-21 03:12:52 UTC",
"2010-04-23 03:13:42 UTC",
"2010-04-23 03:15:42 UTC",
"2010-04-23 03:16:38 UTC",
"2010-04-23 03:18:14 UTC",
"2010-04-24 03:21:52 UTC",
"2010-04-24 03:22:42 UTC",
"2010-04-24 03:24:19 UTC",
"2010-04-24 03:25:19 UTC"
)), x = cumsum(runif(15)*10),y=cumsum(runif(15)*20))
I want to split my data into 5-day groups, so that all points 5 days or less apart end up in the same group. I tried what was suggested here:
gr<-ddply(dates,.(cut(datecol,"5 day",include.lowest = TRUE)),"[")
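But for some reason I end up with 3 groups instead of 2, and the points from 04/21 and 04/23 are split into different groups even though they are less than 5 days apart.
This is what I would like to get: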
group datecol x y
1 1 2010-04-03 03:02:38 8.112423 4.790036
2 1 2010-04-03 03:03:14 11.184709 22.903475
3 2 2010-04-20 03:05:52 17.306835 32.286891
4 2 2010-04-20 03:07:42 24.071488 38.941709
5 2 2010-04-21 03:09:38 26.451493 48.378477
6 2 2010-04-21 03:10:14 33.090645 53.148149
7 2 2010-04-21 03:12:52 38.536416 64.346574
8 2 2010-04-23 03:13:42 40.911074 79.419002
9 2 2010-04-23 03:15:42 41.977579 89.760210
10 2 2010-04-23 03:16:38 46.838773 95.266709
11 2 2010-04-23 03:18:14 48.367159 112.619969
12 2 2010-04-24 03:01:52 57.470412 113.594423
13 2 2010-04-24 03:02:42 63.202005 123.653370
14 2 2010-04-24 03:04:19 65.615348 137.184153
15 2 2010-04-24 03:25:19 75.177633 137.559003
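How about using cumsum, checking the lagged value and incrementing the group when necessary? The lag is implemented with the shift() function from the data.table package.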
library(data.table)
dates$group <- cumsum(ifelse(difftime(dates$datecol,
shift(dates$datecol, fill = dates$datecol[1]),
units = "days") >= 5
,1, 0)) + 1
head(dates)
# datecol x y group
#1 2010-04-03 03:02:38 4.776196 5.160336 1
#2 2010-04-03 03:03:14 13.388291 14.731241 1
#3 2010-04-20 03:05:52 17.769262 30.057454 2
#4 2010-04-20 03:07:42 20.217235 31.742392 2
#5 2010-04-21 03:09:38 20.924025 49.248819 2
#6 2010-04-21 03:10:14 21.918687 56.030278 2
This assumes your data is sorted by time in ascending order.
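If the rows might not be sorted, the same gap rule can be sketched in base R by ordering first and mapping the labels back. Unlike cut(), which builds fixed 5-day bins counted from the first day, this compares each timestamp with the one before it. The helper name `group_by_gap` and its `gap_days` argument are made up for illustration and are not part of the original answer; treat this as a minimal sketch rather than a drop-in replacement.

```r
# Hypothetical helper (not from the original answer): start a new group
# whenever the gap to the previous timestamp is at least `gap_days` days.
group_by_gap <- function(times, gap_days = 5) {
  ord <- order(times)                              # positions in time order
  gaps <- diff(as.numeric(times[ord])) / 86400     # consecutive gaps, in days
  grp_sorted <- cumsum(c(TRUE, gaps >= gap_days))  # first row opens group 1
  grp <- integer(length(times))
  grp[ord] <- grp_sorted                           # labels back in input order
  grp
}

# Deliberately unsorted sample timestamps
times <- as.POSIXct(c("2010-04-21 03:09:38", "2010-04-03 03:02:38",
                      "2010-04-24 03:25:19", "2010-04-20 03:05:52"),
                    tz = "UTC")
groups <- group_by_gap(times)
print(groups)
```

On the question's data this would be used as `dates$group <- group_by_gap(dates$datecol)`. Because only consecutive points are compared, a group can span more than 5 days in total when each successive gap is shorter than 5 days.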