I have two data sets.
The first:
patient<-c("A","A","B","B","C","C","C","C")
arrival<-c("11:00","11:00","13:00","13:00","14:00","14:00","14:00","14:00")
lastRow<-c("","Yes","","Yes","","","","Yes")
data1<-data.frame(patient,arrival,lastRow)
The second:
patient<-c("A","A","A","A","B","B","B","C","C","C")
availableSlot<-c("11:15","11:35","11:45","11:55","12:55","13:55","14:00","14:00","14:10","17:00")
data2<-data.frame(patient, availableSlot)
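I want to add a column to the first data set so that, for the last row of each patient, it shows the available slot closest to the arrival time.
The result would be: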
patient arrival lastRow availableSlot
A       11:00
A       11:00   Yes     11:15
B       13:00
B       13:00   Yes     12:55
C       14:00
C       14:00
C       14:00
C       14:00   Yes     14:00
I would appreciate it if someone could show me how to do this in R.
I would use data.table, first cleaning up by converting the times to ITime and dropping the redundant rows:
library(data.table)
setDT(data1)[, arrival := as.ITime(as.character(arrival))]
setDT(data2)[, availableSlot := as.ITime(as.character(availableSlot))]
DT1 = unique(data1, by="patient", fromLast=TRUE)
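A quick sketch of why the conversion helps, assuming a reasonably recent data.table: ITime stores a time of day as an integer number of seconds since midnight, so "nearest" has a plain numeric meaning. The vector `slots` below is a hypothetical name used only for illustration.

```r
library(data.table)

# Illustrative values only; `slots` is not part of the original data.
slots <- as.ITime(c("11:15", "12:55", "14:00"))

# Each ITime value is an integer count of seconds since midnight.
as.integer(slots)
# 40500 46500 50400

# Distances between times are therefore simple integer arithmetic (seconds).
as.integer(slots[2]) - as.integer(slots[1])
# 6000
```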
Then you can do a "rolling join":
res = data2[DT1, on=.(patient, availableSlot = arrival), roll="nearest",
  .(patient, availableSlot = x.availableSlot)]
#    patient availableSlot
# 1:       A      11:15:00
# 2:       B      12:55:00
# 3:       C      14:00:00
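How it works
The syntax is x[i, on=, roll=, j].
on= gives the merge columns.
Each row of i is used to look up a match in x.
With roll="nearest", the last column in on= is "rolled" to its nearest match.
The on= columns from the original tables can be referenced with the x.* and i.* prefixes.
The j argument should give a list of columns, and .() is an alias for list() here.
See the package's introductory materials at http://r-datatable.com/Getting-started, and type ?data.table for the documentation related to rolling joins.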
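To make the roles of x, i, and the prefixes concrete, here is a toy sketch. The tables and column names (`slots_dt`, `arrivals_dt`, `id`, `tm`) are made up for illustration; note that the `x.` and `i.` prefixes refer to the position of a table in the join, not to its name.

```r
library(data.table)

# Toy tables; every name here is an illustrative assumption.
slots_dt    <- data.table(id = c("A", "A", "B"), tm = c(10L, 20L, 30L))  # plays x
arrivals_dt <- data.table(id = c("A", "B"),      tm = c(14L, 50L))       # plays i

# For each row of i: match id exactly, then roll tm to the nearest value in x.
res_toy <- slots_dt[arrivals_dt, on = .(id, tm), roll = "nearest",
                    .(id, wanted = i.tm, matched = x.tm)]

# A: 14 is closer to 10 than to 20, so matched is 10.
# B: 50 lies past the last value, and "nearest" rolls to that end, so matched is 30.
res_toy
```

Without the `x.` prefix, the join column in the result carries the value from i, which is why the answer asks for `x.availableSlot` explicitly.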
I would stop at res, but if you really want it back in the original table...
# a very nonstandard step:
data1[lastRow == "Yes", availableSlot := res$availableSlot ]
#    patient  arrival lastRow availableSlot
# 1:       A 11:00:00                  <NA>
# 2:       A 11:00:00     Yes      11:15:00
# 3:       B 13:00:00                  <NA>
# 4:       B 13:00:00     Yes      12:55:00
# 5:       C 14:00:00                  <NA>
# 6:       C 14:00:00                  <NA>
# 7:       C 14:00:00                  <NA>
# 8:       C 14:00:00     Yes      14:00:00
Now data1 has a new column, availableSlot, just as when you do data1$col <- val.
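The assignment above relies on the "Yes" rows of data1 appearing in the same order as the rows of res. If that feels fragile, one possible alternative (a sketch, not the approach taken above) is an update join that matches on patient and lastRow. The data is rebuilt here so the block runs on its own, and the hard-coded res is a stand-in for the rolling-join result.

```r
library(data.table)

# Rebuild the first data set with arrival already converted to ITime.
data1 <- data.table(
  patient = c("A", "A", "B", "B", "C", "C", "C", "C"),
  arrival = as.ITime(c("11:00", "11:00", "13:00", "13:00",
                       "14:00", "14:00", "14:00", "14:00")),
  lastRow = c("", "Yes", "", "Yes", "", "", "", "Yes")
)

# Stand-in for the rolling-join result `res`.
res <- data.table(
  patient       = c("A", "B", "C"),
  availableSlot = as.ITime(c("11:15", "12:55", "14:00"))
)

# Tag res so it can only match the "Yes" rows, then update data1 by reference.
res[, lastRow := "Yes"]
data1[res, on = .(patient, lastRow), availableSlot := i.availableSlot]

data1
```

Rows of data1 without a match in res are left as NA, so the outcome should be the same as the order-based assignment while not depending on row order.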