R date-time alignment and filling values

Adi*_*hag 7 r vectorization dataframe

I have multiple frames; for this purpose assume 2. Each frame contains 2 columns: an index column and a value column.

sz <- 5
frame_1 <- data.frame(index = sort(sample(1:10, sz, replace = FALSE)), value = rpois(sz, 50))
frame_2 <- data.frame(index = sort(sample(1:10, sz, replace = FALSE)), value = rpois(sz, 50))

frame_1:

 index value
  1    49
  6    62
  7    58
  8    30
 10    50

frame_2:

index value
  4    60
  5    64
  6    48
  7    46
  9    57

The goal is to create a third frame, frame_3, whose index is the union of the indices in frame_1 and frame_2:

frame_3 <- data.frame(index = sort(union(frame_1$index, frame_2$index)))

It will contain two additional columns, value_1 and value_2.

frame_3$value_1 is filled from frame_1$value, and frame_3$value_2 is filled from frame_2$value.

These should be filled as follows, giving frame_3:

index value_1 value_2
1      49       NA
4      49       60     # value_1 is filled through with previous value
5      49       64     # value_1 is filled through with previous value
6      62       48     
7      58       46   
8      30       46     # value_2 is filled through with previous value
9      30       57     # value_1 is filled through with previous value
10     50       57     # value_2 is filled through with previous value

I am looking for an efficient solution, since I am dealing with many thousands of records.

Aru*_*run 8

This problem screams out for data.table. You can build the columns one by one in a loop, using a rolling join, x[y, roll=TRUE], for each frame.

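library(data.table)
# Convert both frames to data.tables keyed on index
dt1 <- data.table(frame_1)
dt2 <- data.table(frame_2)
setkey(dt1, index)
setkey(dt2, index)
# One row per index appearing in either frame
dt3 <- data.table(index = sort(unique(c(dt1$index, dt2$index))))
# Rolling joins: each lookup takes the last value at or before the index
dt1[dt2[dt3, roll=TRUE], roll=TRUE]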

#    index value value.1
# 1:     1    49      NA
# 2:     4    49      60
# 3:     5    49      64
# 4:     6    62      48
# 5:     7    58      46
# 6:     8    30      46
# 7:     9    30      57
# 8:    10    50      57
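
The loop mentioned above, for more than two frames, might look roughly like the sketch below. It assumes the frames are collected in a list and that the installed data.table version supports the `on =` argument. The names `frames`, `all_index`, `idx` and `result` are made up for illustration, and the two frames are typed in literally from the question so the example is reproducible.

```r
library(data.table)

# Literal copies of the two sample frames from the question (assumed input)
frame_1 <- data.frame(index = c(1L, 6L, 7L, 8L, 10L),
                      value = c(49L, 62L, 58L, 30L, 50L))
frame_2 <- data.frame(index = c(4L, 5L, 6L, 7L, 9L),
                      value = c(60L, 64L, 48L, 46L, 57L))
frames <- list(frame_1, frame_2)  # hypothetical container; add more frames here

# Sorted union of every index that occurs in any frame
all_index <- sort(unique(unlist(lapply(frames, `[[`, "index"))))
idx <- data.table(index = all_index)
result <- data.table(index = all_index)

for (k in seq_along(frames)) {
  dt_k <- as.data.table(frames[[k]])
  # Rolling join: for each row of idx, take the value at the last index <= it;
  # indices that precede the frame's first observation come back as NA
  filled <- dt_k[idx, on = "index", roll = TRUE]
  result[, (paste0("value_", k)) := filled$value]
}

print(result)
```

With the sample frames this should give the same rows as the frame_3 table in the question, with the columns named value_1 and value_2 rather than value and value.1.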

  • +1 Are you sure this is a problem that screams for `data.table`? :) (2 upvotes)