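I'm new to the data.table package and have a simple question. I have two data.tables that I compare using keys. In data.table 1, the value of column C should change from "NO" to "OK" if the key columns A and B are also found in data.table 2. This step has to be done in any case.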
library(data.table)
df_1 <- data.frame(A=c(1,1,3,5,6,7), B = c("x","y","z","q","w","e"), C = rep("NO",6))
df_2 <- data.frame(A=c(3,5,1), B = c("z","q","x"), D=c(3,5,99))
keys <- c("A","B")
dt_1 <- data.table(df_1, key = keys)
dt_2 <- data.table(df_2, key = keys)
dt_1[dt_2, C := "OK"]
Now I get this data.table:
A B C
1: 1 x OK
2: 1 y NO
3: 3 z OK
4: 5 q OK
5: 6 w NO
6: 7 e NO
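For comparison, a minimal sketch of the same first step without setting keys, using data.table's on= argument to name the join columns directly. This assumes a data.table version that supports on= (added in 1.9.6). Because no key is set, dt_1 keeps its original row order.

```r
library(data.table)

# Same data as in the question, built directly as data.tables (no keys set)
dt_1 <- data.table(A = c(1, 1, 3, 5, 6, 7),
                   B = c("x", "y", "z", "q", "w", "e"),
                   C = rep("NO", 6))
dt_2 <- data.table(A = c(3, 5, 1),
                   B = c("z", "q", "x"),
                   D = c(3, 5, 99))

# Update join: for every row of dt_1 whose (A, B) pair also occurs in dt_2,
# set C to "OK" by reference; unmatched rows keep "NO"
dt_1[dt_2, on = c("A", "B"), C := "OK"]
dt_1
```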
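I want to add a second operation. If, in data.table 2, the value of column A is not equal to the value of column D, then A should take the value of column D after the first operation; in other words, column D takes precedence over A. This should work regardless of the values in D. The desired data.table looks like this: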
A B C
1: 99 x OK
2: 1 y NO
3: 3 z OK
4: 5 q OK
5: 6 w NO
6: 7 e NO
I tried the following, without success:
dt_1[dt_2, A != D, A := D]
Thanks for your help!
Try:
dt_1[C == "OK", A:= dt_2[,D]]
# A B C
# 1: 99 x OK
# 2: 1 y NO
# 3: 3 z OK
# 4: 5 q OK
# 5: 6 w NO
# 6: 7 e NO
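The answer above assigns dt_2[, D] by position, so it gives the right result only because the "OK" rows of the keyed dt_1 appear in the same (A, B) order as the rows of the keyed dt_2. A minimal sketch of a guard for that assumption; the stopifnot check is a hypothetical addition, not part of the original answer:

```r
library(data.table)

dt_1 <- data.table(A = c(1, 1, 3, 5, 6, 7),
                   B = c("x", "y", "z", "q", "w", "e"),
                   C = rep("NO", 6), key = c("A", "B"))
dt_2 <- data.table(A = c(3, 5, 1),
                   B = c("z", "q", "x"),
                   D = c(3, 5, 99), key = c("A", "B"))
dt_1[dt_2, C := "OK"]

# dt_2[, D] is assigned positionally: check that the "OK" rows of dt_1 carry
# the same (A, B) pairs, in the same order, as the rows of dt_2 (stops if not)
stopifnot(identical(dt_1[C == "OK", A], dt_2[, A]),
          identical(dt_1[C == "OK", B], dt_2[, B]))
dt_1[C == "OK", A := dt_2[, D]]
dt_1
```

If the check fails, a join-based update (matching on A and B rather than on row position) is the safer route.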
Here is how you should do the whole process in the first place.
Create both datasets as data.tables from the start (or convert them in place with setDT):
dt_1 <- data.table(A=c(1,1,3,5,6,7), B = c("x","y","z","q","w","e"), C = rep("NO",6))
dt_2 <- data.table(A=c(3,5,1), B = c("z","q","x"), D=c(3,5,99))
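If the data already exists as data.frames, as in the question, a minimal sketch of the in-place conversion with setDT. The stringsAsFactors = FALSE argument is an added assumption so that B and C stay character on R versions before 4.0:

```r
library(data.table)

# A data.frame built the way the question does it
df_1 <- data.frame(A = c(1, 1, 3, 5, 6, 7),
                   B = c("x", "y", "z", "q", "w", "e"),
                   C = rep("NO", 6),
                   stringsAsFactors = FALSE)

# setDT() turns the data.frame into a data.table by reference, without
# copying it, so no <- assignment is needed
setDT(df_1)
is.data.table(df_1)
```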
Then key them in place with setkeyv, instead of creating keyed copies and assigning them with the <- operator:
keys <- c("A","B")
setkeyv(dt_1, keys)
setkeyv(dt_2, keys)
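A small sketch of how setkeyv relates to setkey: setkeyv takes the key columns as a character vector (convenient when they are stored in a variable such as keys), while setkey takes bare column names. Both sort the table by reference. The toy table dt below is hypothetical:

```r
library(data.table)

dt <- data.table(A = c(5, 1, 3, 1), B = c("q", "y", "z", "x"))

keys <- c("A", "B")
dt_a <- copy(dt)
setkeyv(dt_a, keys)   # key columns given as a character vector
dt_b <- copy(dt)
setkey(dt_b, A, B)    # same key, columns given as bare names

key(dt_a)   # "A" "B"
dt_a$B      # rows are now sorted by A, then B
```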
Then simply update both columns in a single join:
dt_1[dt_2, `:=`(C = "OK", A = i.D)]
# A B C
# 1: 99 x OK
# 2: 1 y NO
# 3: 3 z OK
# 4: 5 q OK
# 5: 6 w NO
# 6: 7 e NO
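The same single update join can be sketched without keys, using on= and the c("col", ...) := list(...) form of := instead of the functional `:=`(...) form. This is an alternative spelling, not the answer's own code; it assumes a data.table version that supports on=. Inside j, i.D refers to column D of the joined table (dt_2) for each matched row.

```r
library(data.table)

dt_1 <- data.table(A = c(1, 1, 3, 5, 6, 7),
                   B = c("x", "y", "z", "q", "w", "e"),
                   C = rep("NO", 6))
dt_2 <- data.table(A = c(3, 5, 1),
                   B = c("z", "q", "x"),
                   D = c(3, 5, 99))

# Mark every match as "OK" and overwrite A with dt_2's D in one join
dt_1[dt_2, on = c("A", "B"), c("C", "A") := list("OK", i.D)]
dt_1
```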
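In this case the condition df_1$A != df_2$D is redundant: for matched rows where A already equals D, assigning i.D leaves A unchanged.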
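To check that claim, a sketch comparing the unconditional update with a conditional one that only joins the rows of dt_2 where A != D. (In the question's attempt dt_1[dt_2, A != D, A := D], the arguments land in the j and by slots of x[i, j, by], so A != D never acts as a filter; filtering dt_2 before the join is one way to express the condition.) The helper new_x is hypothetical, used only to build two fresh copies of dt_1:

```r
library(data.table)

new_x <- function() data.table(A = c(1, 1, 3, 5, 6, 7),
                               B = c("x", "y", "z", "q", "w", "e"),
                               C = rep("NO", 6))
dt_2 <- data.table(A = c(3, 5, 1), B = c("z", "q", "x"), D = c(3, 5, 99))

# Unconditional: overwrite A with D for every matched row
x_all <- new_x()
x_all[dt_2, on = c("A", "B"), `:=`(C = "OK", A = i.D)]

# Conditional, closer to the question's wording: mark every match, then
# overwrite A only for the rows of dt_2 where A != D
x_cond <- new_x()
x_cond[dt_2, on = c("A", "B"), C := "OK"]
x_cond[dt_2[A != D], on = c("A", "B"), A := i.D]

# Both give the same table
all.equal(x_all, x_cond)
```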