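I want to use the data.table package with the argument roll = 'nearest' to find the nearest match on a date column. Before that, I first match on another column (letters):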
library(data.table)

set.seed(1)
# Letters and values are recycled explicitly to the number of dates (11 in A, 5 in B)
A <- data.table(dates.A   = seq.Date(as.Date('2008-01-01'), as.Date('2008-01-31'), by = '3 days'),
                letters.A = rep_len(LETTERS[1:4], 11), value.A = rep_len(runif(4), 11))
B <- data.table(date.B    = seq.Date(as.Date('2008-01-01'), as.Date('2008-01-05'), by = 'days'),
                letters.B = rep_len(LETTERS[1:4], 5), value.B = rep_len(runif(4), 5))
#### Define the columns I merge on
A[, ':=' (dates.merge = dates.A, letters.merge = letters.A)]
B[, ':=' (dates.merge = date.B, letters.merge = letters.B)]
setkeyv(A, c('letters.merge', 'dates.merge'))
setkeyv(B, c('letters.merge', 'dates.merge'))
#### Rolling join on the keys, with roll applied to the last key column (dates.merge)
result <- B[A, roll = 'nearest']
#### As a side note, how do I avoid the change in order of my data.tables??
setorder(result, dates.A, letters.A)
setorder(A, dates.A)
setorder(B, date.B)
The output of result, A and B looks like this:
> result
        date.B letters.B   value.B dates.merge letters.merge    dates.A letters.A   value.A
 1: 2008-01-01         A 0.2016819  2008-01-01             A 2008-01-01         A 0.2655087
 2: 2008-01-02         B 0.8983897  2008-01-04             B 2008-01-04         B 0.3721239
 3: 2008-01-03         C 0.9446753  2008-01-07             C 2008-01-07         C 0.5728534
 4: 2008-01-04         D 0.6607978  2008-01-10             D 2008-01-10         D 0.9082078
 5: 2008-01-05         A 0.2016819  2008-01-13             A 2008-01-13         A 0.2655087
 6: 2008-01-02         B 0.8983897  2008-01-16             B 2008-01-16         B 0.3721239
 7: 2008-01-03         C 0.9446753  2008-01-19             C 2008-01-19         C 0.5728534
 8: 2008-01-04         D 0.6607978  2008-01-22             D 2008-01-22         D 0.9082078
 9: 2008-01-05         A 0.2016819  2008-01-25             A 2008-01-25         A 0.2655087
10: 2008-01-02         B 0.8983897  2008-01-28             B 2008-01-28         B 0.3721239
11: 2008-01-03         C 0.9446753  2008-01-31             C 2008-01-31         C 0.5728534
> A
       dates.A letters.A   value.A dates.merge letters.merge
 1: 2008-01-01         A 0.2655087  2008-01-01             A
 2: 2008-01-04         B 0.3721239  2008-01-04             B
 3: 2008-01-07         C 0.5728534  2008-01-07             C
 4: 2008-01-10         D 0.9082078  2008-01-10             D
 5: 2008-01-13         A 0.2655087  2008-01-13             A
 6: 2008-01-16         B 0.3721239  2008-01-16             B
 7: 2008-01-19         C 0.5728534  2008-01-19             C
 8: 2008-01-22         D 0.9082078  2008-01-22             D
 9: 2008-01-25         A 0.2655087  2008-01-25             A
10: 2008-01-28         B 0.3721239  2008-01-28             B
11: 2008-01-31         C 0.5728534  2008-01-31             C
> B
       date.B letters.B   value.B dates.merge letters.merge
1: 2008-01-01         A 0.2016819  2008-01-01             A
2: 2008-01-02         B 0.8983897  2008-01-02             B
3: 2008-01-03         C 0.9446753  2008-01-03             C
4: 2008-01-04         D 0.6607978  2008-01-04             D
5: 2008-01-05         A 0.2016819  2008-01-05             A
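However, note that the nearest date to the dates.A value "2008-01-07" should be "2008-01-05" (see B), not "2008-01-03". The same goes for every row of result below the one where dates.A is "2008-01-07".
What am I doing wrong here?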