Find the closest matching time for each patient

Hea*_*oes (score 4) · tags: time, logic, r, dataframe

I have two datasets:

First set:

 patient<-c("A","A","B","B","C","C","C","C")
 arrival<-c("11:00","11:00","13:00","13:00","14:00","14:00","14:00","14:00")
 lastRow<-c("","Yes","","Yes","","","","Yes")

 data1<-data.frame(patient,arrival,lastRow)

Second set:

 patient<-c("A","A","A","A","B","B","B","C","C","C")
 availableSlot<-c("11:15","11:35","11:45","11:55","12:55","13:55","14:00","14:00","14:10","17:00")

 data2<-data.frame(patient, availableSlot)

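I want to add a column to the first dataset so that, on the last row for each patient, it shows the available slot closest to the arrival time:

The result would be: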

  patient arrival lastRow availableSlot
       A   11:00        
       A   11:00     Yes     11:15
       B   13:00        
       B   13:00     Yes     12:55
       C   14:00        
       C   14:00        
       C   14:00        
       C   14:00     Yes     14:00

I'd be grateful if someone could show me how to do this in R.

Answer: Fra*_*ank (score 8)

I'd use data.table. First, clean up by converting the times to ITime and dropping the redundant rows:

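library(data.table)
# convert the "HH:MM" strings (factors in older R) to data.table's ITime class
setDT(data1)[, arrival := as.ITime(as.character(arrival))]
setDT(data2)[, availableSlot := as.ITime(as.character(availableSlot))]
# keep only the last row for each patient
DT1 = unique(data1, by="patient", fromLast=TRUE)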
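To see what this cleanup amounts to without data.table, here is a rough base R sketch. It is a stand-in under stated assumptions, not part of the answer: times are taken to be "HH:MM" strings as in the question, `to_minutes` is a hypothetical helper that turns them into integer minutes since midnight (playing the role of `as.ITime`), and `duplicated(..., fromLast = TRUE)` keeps the last row per patient, like `unique(..., by = "patient", fromLast = TRUE)`.

```r
# Base R sketch of the cleanup step (assumption: times are "HH:MM" strings).
patient <- c("A","A","B","B","C","C","C","C")
arrival <- c("11:00","11:00","13:00","13:00","14:00","14:00","14:00","14:00")
lastRow <- c("","Yes","","Yes","","","","Yes")
data1 <- data.frame(patient, arrival, lastRow, stringsAsFactors = FALSE)

# Hypothetical helper: "HH:MM" -> integer minutes since midnight
to_minutes <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) as.integer(p[1]) * 60L + as.integer(p[2]), integer(1))
}
data1$arrival_min <- to_minutes(data1$arrival)

# Keep only the last row for each patient
# (analogous to unique(data1, by = "patient", fromLast = TRUE))
DT1 <- data1[!duplicated(data1$patient, fromLast = TRUE), ]
print(DT1)
```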

Then you can do a "rolling join":

# rolling join: for each row of DT1, find the same patient's slot nearest to arrival
res = data2[DT1, on=.(patient, availableSlot = arrival), roll="nearest", 
  .(patient, availableSlot = x.availableSlot)]

#    patient availableSlot
# 1:       A      11:15:00
# 2:       B      12:55:00
# 3:       C      14:00:00
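For comparison, the same "nearest slot per patient" result can be sketched in base R by brute force: for each last row, take that patient's slots and pick the one with the smallest absolute time difference. This is a swapped-in technique, not a rolling join. `nearest_slot` and `to_minutes` are hypothetical helpers, and ties go to the first listed slot, which may not match data.table's tie-breaking.

```r
# Hypothetical helper: "HH:MM" -> integer minutes since midnight
to_minutes <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) as.integer(p[1]) * 60L + as.integer(p[2]), integer(1))
}

data2 <- data.frame(
  patient = c("A","A","A","A","B","B","B","C","C","C"),
  availableSlot = c("11:15","11:35","11:45","11:55","12:55","13:55","14:00","14:00","14:10","17:00"),
  stringsAsFactors = FALSE
)
# Last row per patient, like DT1 in the answer (arrival times only)
last_rows <- data.frame(patient = c("A","B","C"),
                        arrival = c("11:00","13:00","14:00"),
                        stringsAsFactors = FALSE)

# Hypothetical helper: the slot for patient p closest to arrival time arr
nearest_slot <- function(p, arr) {
  slots <- data2$availableSlot[data2$patient == p]
  if (length(slots) == 0) return(NA_character_)   # no slots for this patient
  slots[which.min(abs(to_minutes(slots) - to_minutes(arr)))]
}

res <- data.frame(
  patient = last_rows$patient,
  availableSlot = mapply(nearest_slot, last_rows$patient, last_rows$arrival,
                         USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
print(res)
```

This scans all of a patient's slots for every lookup. That is fine at this size, but for large tables the data.table join is likely the better choice.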

How it works

The syntax is x[i, on=, roll=, j].

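  • on= gives the columns to join on.
  • It is a join: for each row of i, we look for a matching row in x.
  • With roll="nearest", the last column in on= is "rolled" to the nearest match.
  • The on= columns from the original tables can be referenced with the x.* and i.* prefixes; here x.availableSlot gives the matched slot from data2 rather than the arrival value from DT1.
  • The j argument should give a list of columns; .() is an alias for list() here.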
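To make "rolled to the nearest match" concrete, here is a rough base R sketch of the idea for a single patient's sorted slot times: a binary search with `findInterval` finds the slot at or just before each arrival, and the slot right after it is the only other candidate. It only illustrates the concept under assumed inputs (times as minutes since midnight); it is not how data.table implements `roll = "nearest"`, and `nearest_sorted` is a hypothetical name.

```r
# Hypothetical helper: for each x, the index of the nearest value in `slots`.
# `slots` must be sorted ascending; ties go to the earlier slot.
nearest_sorted <- function(slots, x) {
  i <- findInterval(x, slots)            # last slot <= x (0 if x precedes all slots)
  prev <- pmax(i, 1L)                    # candidate at or before x (clamped to first slot)
  nxt <- pmin(i + 1L, length(slots))     # candidate after x (clamped to last slot)
  ifelse(abs(slots[prev] - x) <= abs(slots[nxt] - x), prev, nxt)
}

# Patient B's slots (12:55, 13:55, 14:00) as minutes since midnight
slots_b <- c(775L, 835L, 840L)
idx <- nearest_sorted(slots_b, c(700L, 780L, 830L, 900L))
print(idx)
```

For B's arrival at 13:00 (780 minutes), the second lookup lands on index 1, i.e. 12:55, matching the rolling-join result.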

See the package's introductory materials at http://r-datatable.com/Getting-started, and type ?data.table for the documentation on rolling joins.


I'd stop at res, but if you really want the result back in the original table...

# a very nonstandard step:
data1[lastRow == "Yes", availableSlot := res$availableSlot ]

#    patient  arrival lastRow availableSlot
# 1:       A 11:00:00                  <NA>
# 2:       A 11:00:00     Yes      11:15:00
# 3:       B 13:00:00                  <NA>
# 4:       B 13:00:00     Yes      12:55:00
# 5:       C 14:00:00                  <NA>
# 6:       C 14:00:00                  <NA>
# 7:       C 14:00:00                  <NA>
# 8:       C 14:00:00     Yes      14:00:00

Now data1 has a new column, availableSlot, much as it would after data1$col <- val.
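The "very nonstandard step" relies on res being in the same order as the lastRow == "Yes" rows of data1; that holds here because each patient has exactly one such row and both appear in A, B, C order. Here is a base R sketch that matches on patient instead of on position, assuming res holds one row per patient (its values are typed in from the res output above).

```r
data1 <- data.frame(
  patient = c("A","A","B","B","C","C","C","C"),
  arrival = c("11:00","11:00","13:00","13:00","14:00","14:00","14:00","14:00"),
  lastRow = c("","Yes","","Yes","","","","Yes"),
  stringsAsFactors = FALSE
)
# One row per patient, copied from the res output above
res <- data.frame(patient = c("A","B","C"),
                  availableSlot = c("11:15","12:55","14:00"),
                  stringsAsFactors = FALSE)

data1$availableSlot <- NA_character_           # new column, NA everywhere
is_last <- data1$lastRow == "Yes"
# Look up each last row's patient in res, so the row order of res does not matter
data1$availableSlot[is_last] <- res$availableSlot[match(data1$patient[is_last], res$patient)]
print(data1)
```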

  • Brilliant explanation, +1! (3 upvotes)
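  • @Headandtoes Yes, for that you'd need `as.ITime(as.POSIXct("02Aug2016 11:15:00", format = "%d%b%Y %H:%M:%S"))`, or just stick with POSIXct instead of extracting the time. See `?strptime` for the relevant documentation. If you don't want to fiddle too much with date-time formats, you could also try the `anytime` package, which seems to reliably guess the relevant `format`. (2 upvotes)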
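Here is a base R sketch of parsing a full date-time like the one in that comment, assuming the strings look exactly like "02Aug2016 11:15:00". `%b` matches abbreviated month names in the current locale, so the sketch switches LC_TIME to "C" while parsing; the UTC time zone is an assumption made only to keep the example deterministic.

```r
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))  # English month abbreviations for %b
x <- as.POSIXct("02Aug2016 11:15:00", format = "%d%b%Y %H:%M:%S", tz = "UTC")
invisible(Sys.setlocale("LC_TIME", old_locale))  # restore the previous locale

print(x)
# Time of day only, e.g. to compare slots regardless of date
clock <- format(x, "%H:%M:%S")
print(clock)
```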