I wrote a (rather naive) function to randomly select a date/time between two specified dates:
# set start and end dates to sample between
day.start <- "2012/01/01"
day.end <- "2012/12/31"
# define a random date/time selection function
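rand.day.time <- function(day.start, day.end, size) {
  # every calendar day in the range, then one random day per draw
  dayseq <- seq.Date(as.Date(day.start), as.Date(day.end), by = "day")
  dayselect <- sample(dayseq, size, replace = TRUE)
  # valid clock hours are 0-23 (1:24 never yields midnight and can yield hour 24)
  hourselect <- sample(0:23, size, replace = TRUE)
  minselect <- sample(0:59, size, replace = TRUE)
  # build "YYYY-MM-DD H:M" strings and parse them
  as.POSIXlt(paste(dayselect, " ", hourselect, ":", minselect, sep = ""))
}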
The result looks like this:
> rand.day.time(day.start,day.end,size=3)
[1] "2012-02-07 21:42:00" "2012-09-02 07:27:00" "2012-06-15 01:13:00"
But it seems to slow down considerably as the sample size increases.
# some benchmarking
> system.time(rand.day.time(day.start,day.end,size=100000))
user system elapsed
4.68 0.03 4.70
> system.time(rand.day.time(day.start,day.end,size=200000))
user system elapsed
9.42 0.06 9.49
Can anyone suggest a more efficient way to do something like this?
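One possible direction, offered as a sketch rather than a measured fix: most of the time in the function above probably goes into pasting strings and parsing them back into dates, so that step can be skipped by sampling minute offsets as integers and adding them to the start instant. The name `rand.day.time2` and the `tz` argument are hypothetical; the time zone is assumed to be UTC so that daylight-saving changes do not shift the range, and the result is a `POSIXct` vector rather than `POSIXlt`.

```r
# Hypothetical helper (not from the original post): sample whole-minute
# offsets instead of building and parsing date/time strings.
rand.day.time2 <- function(day.start, day.end, size, tz = "UTC") {
  # midnight at the start of the first day
  start <- as.POSIXct(day.start, tz = tz)
  # number of calendar days in the range, both endpoints included
  n.days <- as.integer(as.Date(day.end) - as.Date(day.start)) + 1L
  # offsets 0 .. (n.days * 1440 - 1) minutes, i.e. up to 23:59 on the last day
  offsets <- sample.int(n.days * 1440L, size, replace = TRUE) - 1L
  # POSIXct arithmetic is in seconds
  start + offsets * 60
}
```

It can be timed the same way as before, for example with `system.time(rand.day.time2(day.start, day.end, size = 200000))`, and wrapped in `as.POSIXlt()` if that class is needed.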
Suppose I have two data frames, for example:
set.seed(123)
df1 <- data.frame(bmi = rnorm(20, 25, 5),
                  date1 = sample(seq.Date(as.Date("2014-01-01"),
                                          as.Date("2014-02-28"), by = "day"), 20))
df2 <- data.frame(epi = 1:5,
                  date2 = as.Date(c("2014-1-8", "2014-1-15", "2014-1-28",
                                    "2014-2-05", "2014-2-24")))
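My question is: how can I match bmi to epi, using for each date2 the closest date1 that falls on or before it? The result should look like this: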
  epi      date2   bmi      date1
1   1 2014-01-08 33.58 2014-01-08
2   2 2014-01-15 22.64 2014-01-15
3   3 2014-01-28 22.22 2014-01-26
4   4 2014-02-05 15.17 2014-02-01
5   5 2014-02-24 27.49 2014-02-15
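A minimal base R sketch of one way this kind of "latest date on or before" lookup could be done, assuming `df1` and `df2` have the columns shown above. It sorts `df1` by `date1` and uses `findInterval()`, which for each `date2` returns the position of the last `date1` that is less than or equal to it. The function name `match.prior` is hypothetical, and rows of `df2` with no earlier `date1` are assumed to get `NA`.

```r
# Hypothetical helper (not from the original post): for each row of df2,
# attach the df1 row whose date1 is the latest one on or before date2.
match.prior <- function(df1, df2) {
  # findInterval() needs the lookup vector in increasing order
  df1 <- df1[order(df1$date1), ]
  # position of the last date1 <= date2; 0 means no such date exists
  idx <- findInterval(as.numeric(df2$date2), as.numeric(df1$date1))
  idx[idx == 0] <- NA
  out <- cbind(df2, df1[idx, c("bmi", "date1")])
  rownames(out) <- NULL
  out
}
```

Calling `match.prior(df1, df2)` returns one row per `epi`. The exact values depend on what `sample()` draws, which can differ between R versions even with the same seed.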