ltk*_*isp 5 time merge r intervals dataframe
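I am trying to merge two data frames on a time column they have in common. However, the recorded times can differ between the two. I would like to merge them by time, but with a buffer interval of 30 minutes.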
Conceptually, the data frames are set up like this:
Data_cam <- data.frame(Start_haul=c(("31-10-2015 07:13:00"),("31-10-2015 22:40:00"),("01-11-2015 06:48:00"),("01-11-2015 16:13:00")),
VesselID=c('XBBX','XBBX','XAAX','XAAX'),
Species=("TOR"), Discard=c(0.28,0.96,2.92,0))
Data_sif <- data.frame(Start_haul=c(("31-10-2015 07:05:00"),("31-10-2015 07:05:00"),("31-10-2015 07:05:00"),("31-10-2015 23:05:00"),("31-10-2015 23:05:00"),("01-11-2015 06:28:00"),("01-11-2015 06:28:00"),("01-11-2015 06:28:00"),("01-11-2015 16:11:00")), VesselID=c('XBBX','XBBX','XBBX','XBBX','XBBX','XAAX','XAAX','XAAX','XAAX'),Species=("TOR"), Size_class=c("1","2","3","4","5","1","2","4","5"), Landing_kg=c(10.5,20.5,5.6,400,2,120,250,10.3,2.1))
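This means that the first three rows in Data_sif match the first row in Data_cam, and I would like to add the "Discard" value from the first row of Data_cam to those three rows in Data_sif. Likewise, rows 4 and 5 in Data_sif match the second row in Data_cam, so the "Discard" value should be added there too, and so on for all rows. The value in the "Discard" column should be repeated for every "Size_class" value that shares a common timestamp.
The desired output looks like this: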
Data_combined <- data.frame(Start_haul=c(("31-10-2015 07:05:00"),("31-10-2015 07:05:00"),("31-10-2015 07:05:00"),("31-10-2015 23:05:00"),("31-10-2015 23:05:00"),("01-11-2015 06:28:00"),("01-11-2015 06:28:00"),("01-11-2015 06:28:00"),("01-11-2015 16:11:00")), VesselID=c('XBBX','XBBX','XBBX','XBBX','XBBX','XAAX','XAAX','XAAX','XAAX'),Species=("TOR"), Size_class=c("1","2","3","4","5","1","2","4","5"), Landing_kg=c(10.5,20.5,5.6,400,2,120,250,10.3,2.1),
Discard=c(0.28,0.28,0.28,0.96,0.96,2.92,2.92,2.92,0))
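In the final implementation I want to add more columns, including position data, but for simplicity I would like to start by merging the Discard column.
I have tried following older posts but was not able to make them work for my data.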
Here is a solution with lubridate and dplyr. It is a bit cumbersome, but it works:
library(lubridate)
library(dplyr)
Data_cam <- data.frame(Start_haul=c(("31-10-2015 07:13:00"),("31-10-2015 22:40:00"),("01-11-2015 06:48:00"),("01-11-2015 16:13:00")),
VesselID=c('XBBX','XBBX','XAAX','XAAX'),
Species=("TOR"), Discard=c(0.28,0.96,2.92,0))
Data_sif <- data.frame(Start_haul=c(("31-10-2015 07:05:00"),("31-10-2015 07:05:00"),("31-10-2015 07:05:00"),("31-10-2015 23:05:00"),("31-10-2015 23:05:00"),("01-11-2015 06:28:00"),("01-11-2015 06:28:00"),("01-11-2015 06:28:00"),("01-11-2015 16:11:00")),
VesselID=c('XBBX','XBBX','XBBX','XBBX','XBBX','XAAX','XAAX','XAAX','XAAX'),Species=("TOR"), Size_class=c("1","2","3","4","5","1","2","4","5"),
Landing_kg=c(10.5,20.5,5.6,400,2,120,250,10.3,2.1))
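Data_sif %>%
  # pair every landing record with every camera haul of the same vessel
  left_join(., Data_cam, by = "VesselID", suffix = c('_sif', '_cam')) %>%
  # 30-minute window around each camera haul start
  mutate(buff1 = dmy_hms(Start_haul_cam) - minutes(30)) %>%
  mutate(buff2 = dmy_hms(Start_haul_cam) + minutes(30)) %>%
  # keep only the pairs whose landing time falls inside that window
  filter(dmy_hms(Start_haul_sif) >= buff1 & dmy_hms(Start_haul_sif) <= buff2) %>%
  select(-contains('_cam')) %>% select(-contains('buff'))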
# Start_haul_sif VesselID Species_sif Size_class Landing_kg Discard
# 1 31-10-2015 07:05:00 XBBX TOR 1 10.5 0.28
# 2 31-10-2015 07:05:00 XBBX TOR 2 20.5 0.28
# 3 31-10-2015 07:05:00 XBBX TOR 3 5.6 0.28
# 4 31-10-2015 23:05:00 XBBX TOR 4 400.0 0.96
# 5 31-10-2015 23:05:00 XBBX TOR 5 2.0 0.96
# 6 01-11-2015 06:28:00 XAAX TOR 1 120.0 2.92
# 7 01-11-2015 06:28:00 XAAX TOR 2 250.0 2.92
# 8 01-11-2015 06:28:00 XAAX TOR 4 10.3 2.92
# 9 01-11-2015 16:11:00 XAAX TOR 5 2.1 0.00
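To see why the filter step is needed, it may help to look at the join on its own. Joining on VesselID alone pairs every Data_sif row with every Data_cam row of the same vessel, and only the time filter reduces those pairs to the ones within the 30-minute buffer. A minimal sketch follows; the name `pairs` is only illustrative, and recent dplyr versions may warn about a many-to-many relationship here, which is expected for this approach:

```r
library(dplyr)

Data_cam <- data.frame(
  Start_haul = c("31-10-2015 07:13:00", "31-10-2015 22:40:00",
                 "01-11-2015 06:48:00", "01-11-2015 16:13:00"),
  VesselID = c("XBBX", "XBBX", "XAAX", "XAAX"),
  Species = "TOR", Discard = c(0.28, 0.96, 2.92, 0),
  stringsAsFactors = FALSE
)
Data_sif <- data.frame(
  Start_haul = c(rep("31-10-2015 07:05:00", 3), rep("31-10-2015 23:05:00", 2),
                 rep("01-11-2015 06:28:00", 3), "01-11-2015 16:11:00"),
  VesselID = c(rep("XBBX", 5), rep("XAAX", 4)),
  Species = "TOR",
  Size_class = c("1", "2", "3", "4", "5", "1", "2", "4", "5"),
  Landing_kg = c(10.5, 20.5, 5.6, 400, 2, 120, 250, 10.3, 2.1),
  stringsAsFactors = FALSE
)

# Every same-vessel combination of a landing record and a camera haul.
pairs <- left_join(Data_sif, Data_cam, by = "VesselID",
                   suffix = c("_sif", "_cam"))

# 5 x 2 pairs for XBBX plus 4 x 2 pairs for XAAX
nrow(pairs)  # 18
```

The intermediate table grows with the product of the rows per vessel, so for large data sets this approach can become memory-hungry.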
Edit:
Or slightly leaner:
Data_sif %>%
left_join(., Data_cam, by = "VesselID",suffix=c('_sif','_cam')) %>%
filter(dmy_hms(Start_haul_sif) >= dmy_hms(Start_haul_cam) - minutes(30) &
dmy_hms(Start_haul_sif) <= dmy_hms(Start_haul_cam) + minutes(30)) %>%
select(-contains('_cam'))
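One thing to be aware of: because the filter runs after the join, any Data_sif row with no camera haul within 30 minutes is dropped from the result. If such rows should be kept, a possible variation is to parse the timestamps once, find the matches, and join them back by a row id. This is only a sketch under assumptions: `row_id`, `sif_time`, `cam_time` and `Data_sif_extra` are names made up for illustration, and the extra landing record is hypothetical data added purely to show the unmatched case.

```r
library(dplyr)
library(lubridate)

Data_cam <- data.frame(
  Start_haul = c("31-10-2015 07:13:00", "31-10-2015 22:40:00",
                 "01-11-2015 06:48:00", "01-11-2015 16:13:00"),
  VesselID = c("XBBX", "XBBX", "XAAX", "XAAX"),
  Species = "TOR", Discard = c(0.28, 0.96, 2.92, 0),
  stringsAsFactors = FALSE
)
Data_sif <- data.frame(
  Start_haul = c(rep("31-10-2015 07:05:00", 3), rep("31-10-2015 23:05:00", 2),
                 rep("01-11-2015 06:28:00", 3), "01-11-2015 16:11:00"),
  VesselID = c(rep("XBBX", 5), rep("XAAX", 4)),
  Species = "TOR",
  Size_class = c("1", "2", "3", "4", "5", "1", "2", "4", "5"),
  Landing_kg = c(10.5, 20.5, 5.6, 400, 2, 120, 250, 10.3, 2.1),
  stringsAsFactors = FALSE
)

# Hypothetical landing record with no camera haul within 30 minutes,
# added only to show what happens to unmatched rows.
Data_sif_extra <- rbind(
  Data_sif,
  data.frame(Start_haul = "02-11-2015 12:00:00", VesselID = "XAAX",
             Species = "TOR", Size_class = "1", Landing_kg = 5,
             stringsAsFactors = FALSE)
)

# Parse timestamps once; keep only the camera columns to carry over.
cam <- Data_cam %>%
  transmute(VesselID, Discard, cam_time = dmy_hms(Start_haul))
sif <- Data_sif_extra %>%
  mutate(row_id = row_number(), sif_time = dmy_hms(Start_haul))

# Same-vessel pairs that are at most 30 minutes apart.
matched <- sif %>%
  inner_join(cam, by = "VesselID") %>%
  filter(abs(as.numeric(difftime(sif_time, cam_time, units = "mins"))) <= 30) %>%
  select(row_id, Discard)

# Join back so that unmatched landing rows survive with Discard = NA.
Data_combined <- sif %>%
  left_join(matched, by = "row_id") %>%
  select(-row_id, -sif_time)
```

Further camera columns, such as position data, could be carried over by adding them to the `transmute()` and the final `select(row_id, ...)`. If two camera hauls of the same vessel can fall within 30 minutes of one landing time, this sketch returns one row per match, so a tie-breaking rule would still be needed.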