I am using data.table to perform a left non-equi join as follows:
OUTPUT <- DT2[DT1, on=.(DOB, FORENAME, SURNAME, POSTCODE, START_DATE <= MONTH, EXPIRY_DATE >= MONTH)]
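OUTPUT contains the correct left join, except that the MONTH column (which is present in DT1) is missing.
Is this a bug in data.table?
Note: START_DATE, EXPIRY_DATE and MONTH are all in the same YYYY-MM-DD IDate format. The join result is correct with respect to these non-equi conditions. Only the MONTH column is missing, and I need it for further work.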
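A minimal sketch of one possible workaround, using hypothetical toy tables rather than the data from this question. In a data.table non-equi join of the form x[i, on = ...], the result columns that carry x's join column names (here START_DATE and EXPIRY_DATE) are filled with the matching values from i (here MONTH), so MONTH does not appear under its own name. Selecting columns explicitly in j with the x. and i. prefixes is one way to keep both. The column set and values below are assumptions made for illustration only.

```r
library(data.table)

# Hypothetical toy data (not the asker's tables)
DT1 <- data.table(
  ID      = 1:3,
  SURNAME = c("JOHNSON", "JACKSON", "ROBINSON"),
  MONTH   = as.IDate(c("2016-06-01", "2016-06-01", "2016-06-01"))
)
DT2 <- data.table(
  CERT_NUMBER = c(999, 998),
  SURNAME     = c("JOHNSON", "JACKSON"),
  START_DATE  = as.IDate(c("2016-01-01", "2016-07-01")),
  EXPIRY_DATE = as.IDate(c("2016-12-31", "2016-12-31"))
)

# Left non-equi join: every row of DT1 is kept (nomatch = NA is the default).
# i.* picks columns from DT1, x.* picks the original columns from DT2.
OUTPUT <- DT2[DT1,
  on = .(SURNAME, START_DATE <= MONTH, EXPIRY_DATE >= MONTH),
  .(ID          = i.ID,
    SURNAME     = i.SURNAME,
    MONTH       = i.MONTH,
    CERT_NUMBER = x.CERT_NUMBER,
    START_DATE  = x.START_DATE,
    EXPIRY_DATE = x.EXPIRY_DATE)]

print(OUTPUT)
```

With this selection, MONTH is carried over from DT1, and START_DATE and EXPIRY_DATE hold DT2's own values (NA where no certificate matches).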
Edit 1: Simplified reproducible example
DT1 <- structure(list(ID = c(1, 2, 3), FORENAME = c("JOHN", "JACK",
"ROB"), SURNAME = c("JOHNSON", "JACKSON", "ROBINSON"), MONTH = structure(c(16953L,
16953L, 16953L), class = c("IDate", "Date"))), .Names = c("ID",
"FORENAME", "SURNAME", "MONTH"), row.names = c(NA, -3L), class = c("data.table",
"data.frame"))
DT2 <- structure(list(CERT_NUMBER = 999, FORENAME = …

I am trying to remove the rows of a data.frame whose value in the posn column does not fall within any of the ranges given in another data.frame, using data.table's non-equi join feature.
Here is what my data looks like:
library(data.table)
df.cov <-
structure(list(posn = c(1, 2, 3, 165, 1000), att = c("a", "b",
"c", "d", "e")), .Names = c("posn", "att"), row.names = c(NA,
-5L), class = "data.frame")
df.exons <-
structure(list(start = c(2889, 2161, 277, 164, 1), end = c(3329,
2826, 662, 662, 168)), .Names = c("start", "end"), row.names = c(NA,
-5L), class = "data.frame")
setDT(df.cov)
setDT(df.exons)
df.cov
# posn att
# 1: 1 a
# 2: 2 b
# …

Suppose I have two data tables:
X <- data.table(id = 1:5, L = letters[1:5])
id L
1: 1 a
2: 2 b
3: 3 c
4: 4 d
5: 5 e
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))
id L N
1: 3 NA 10
2: 4 g NA
3: 5 h 12
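Is it possible to do a left outer join of X and Y by id using built-in data.table functionality? If not, I would like to build a leftOuterJoin function with the following expected output (for example):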
leftOuterJoin(X, Y, on = "id")
id L N
1: 1 a NA
2: 2 b …
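For reference, a minimal sketch of two built-in ways to get a left outer join of X and Y on id, reusing the X and Y defined in the question. Both tables have a column named L, so each approach keeps both copies under different names. How the two L columns should be combined depends on the expected output, which is truncated above, so that step is deliberately left out.

```r
library(data.table)

X <- data.table(id = 1:5, L = letters[1:5])
Y <- data.table(id = 3:5, L = c(NA, "g", "h"), N = c(10, NA, 12))

# Option 1: the X[i] join idiom. Y[X, on = "id"] keeps every row of X
# (the table in the i position) and fills Y's columns with NA where
# there is no match. X's own L column comes back as i.L.
res_join <- Y[X, on = "id"]

# Option 2: merge() with all.x = TRUE, which also keeps every row of X.
# The clashing L columns get the default suffixes .x and .y.
res_merge <- merge(X, Y, by = "id", all.x = TRUE)

print(res_join)
print(res_merge)
```

Both results have one row per row of X. A leftOuterJoin(X, Y, on = "id") wrapper, as named in the question, could be built around either call.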