I have two data.tables, X and Y.
Columns in X: area, id, value
Columns in Y: ID, price, sales
Creating the two data.tables:
X = data.table(area=c('US', 'UK', 'EU'),
id=c('c001', 'c002', 'c003'),
value=c(100, 200, 300)
)
Y = data.table(ID=c('c001', 'c002', 'c003'),
price=c(500, 200, 400),
sales=c(20, 30, 15)
)
I set keys on X and Y:
setkey(X, id)
setkey(Y, ID)
Now I try to join X and Y on id in X and ID in Y:
merge(X, Y)
merge(X, Y, by=c('id', 'ID'))
merge(X, Y, by.x='id', by.y='ID')
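All three calls raise an error saying that the column names in the by argument are invalid.
I consulted the data.table manual and found that the merge function does not support …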
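For the X and Y tables defined above, a join on differently named columns could be sketched as follows. This is a minimal sketch, not a definitive answer: it assumes a data.table version of 1.9.6 or later, where the on= argument and by.x/by.y in merge are available. The result names keyed, joined, and merged are only illustrative.

```r
library(data.table)

X <- data.table(area  = c("US", "UK", "EU"),
                id    = c("c001", "c002", "c003"),
                value = c(100, 200, 300))
Y <- data.table(ID    = c("c001", "c002", "c003"),
                price = c(500, 200, 400),
                sales = c(20, 30, 15))

# Option 1: keyed join. Key columns are matched by position,
# so the differing names (id vs. ID) do not matter.
setkey(X, id)
setkey(Y, ID)
keyed <- X[Y]

# Option 2: name the column pair explicitly with on=
# (assumes data.table >= 1.9.6).
joined <- X[Y, on = c(id = "ID")]

# Option 3: merge with by.x/by.y (assumes data.table >= 1.9.6).
merged <- merge(X, Y, by.x = "id", by.y = "ID")
```

In all three cases the join column in the result takes the name from X (id), followed or preceded by the remaining columns of both tables.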
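Following the Wikipedia article on SQL joins, I would like a clear overview of how to do joins with data.table. Along the way, we may have found a bug when joining on NAs. Taking the wiki example: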
R) X = data.table(name=c("Raf","Jon","Ste","Rob","Smi","Joh"),depID=c(31,33,33,34,34,NA),key="depID")
R) Y = data.table(depID=c(31,33,34,35),depName=c("Sal","Eng","Cle","Mar"),key="depID")
R) X
name depID
1: Joh NA
2: Raf 31
3: Jon 33
4: Ste 33
5: Rob 34
6: Smi 34
R) Y
depID depName
1: 31 Sal
2: 33 Eng
3: 34 Cle
4: 35 Mar
LEFT OUTER JOIN
R) merge.data.frame(X,Y,all.x=TRUE)
depID name depName
1 31 Raf Sal
2 33 Jon Eng
3 33 Ste Eng
4 34 Rob Cle
5 34 Smi Cle
6 NA …
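For comparison, the same left outer join can be sketched with data.table's own syntax. This is a hedged sketch using the X and Y from the example above; the result names left_dt and left_merge are illustrative, and how the NA key is handled may differ between data.table versions.

```r
library(data.table)

X <- data.table(name  = c("Raf", "Jon", "Ste", "Rob", "Smi", "Joh"),
                depID = c(31, 33, 33, 34, 34, NA),
                key   = "depID")
Y <- data.table(depID   = c(31, 33, 34, 35),
                depName = c("Sal", "Eng", "Cle", "Mar"),
                key     = "depID")

# Y[X] looks up every row of X in Y, so all rows of X are kept
# (a left outer join with X as the left table).
left_dt <- Y[X]

# The merge method for data.tables, with all.x = TRUE, is the
# counterpart of the merge.data.frame call shown above.
left_merge <- merge(X, Y, by = "depID", all.x = TRUE)
```

Here the row for Joh, whose depID is NA, is kept with a missing depName, since Y contains no NA key to match against.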