Suppose I start with a data frame:
ID Measurement1 Measurement2
1 45 104
2 34 87
3 23 99
4 56 67
...
Then I have a second data frame that is used to update records in the first:
ID Measurement1 Measurement2
2 10 11
4 21 22
How can I use R to end up with:
ID Measurement1 Measurement2
1 45 104
2 10 11
3 23 99
4 21 22
...
In practice, the data frames are very large datasets.
akr*_*run 17
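We can use match() to get the row indices in the first dataset of the IDs in the second. Using that index to subset the rows, we replace columns 2 and 3 of the first dataset with the corresponding columns of the second dataset.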
ind <- match(df2$ID, df1$ID)  # row positions in df1 of each ID in df2
df1[ind, 2:3] <- df2[2:3]     # overwrite columns 2 and 3 of those rows
df1
# ID Measurement1 Measurement2
#1 1 45 104
#2 2 10 11
#3 3 23 99
#4 4 21 22
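The match() approach assumes every ID in the second data frame also exists in the first. If that may not hold, match() returns NA for the missing IDs, and as far as I can tell the assignment into the data frame then fails. A minimal sketch of one way to guard against this, skipping the unmatched rows; the update table upd and its extra ID 9 are made up for illustration and are not part of the question:

```r
# Toy data copied from the question
df1 <- data.frame(ID = 1:4,
                  Measurement1 = c(45L, 34L, 23L, 56L),
                  Measurement2 = c(104L, 87L, 99L, 67L))

# Hypothetical update table: ID 9 has no counterpart in df1
upd <- data.frame(ID = c(2L, 4L, 9L),
                  Measurement1 = c(10L, 21L, 0L),
                  Measurement2 = c(11L, 22L, 0L))

ind  <- match(upd$ID, df1$ID)      # NA where an ID is absent from df1
keep <- !is.na(ind)                # rows of upd that can be applied
cols <- setdiff(names(df1), "ID")  # update by column name, not position

res <- df1
res[ind[keep], cols] <- upd[keep, cols]
res
```

Selecting the columns by name rather than as 2:3 also keeps the update correct if the two data frames list their columns in a different order.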
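Or we can use data.table to join the datasets on the "ID" column (after converting the first dataset to a data.table, i.e. setDT(df1)), and assign to the "Cols" the values of the "iCols" from the second dataset.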
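library(data.table)  # v1.9.6+
Cols <- names(df1)[-1]       # columns to update (everything except ID)
iCols <- paste0('i.', Cols)  # the same columns as seen from df2 in the join
# join on ID and overwrite Cols by reference with the values from df2
setDT(df1)[df2, (Cols) := mget(iCols), on = 'ID'][]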
# ID Measurement1 Measurement2
#1: 1 45 104
#2: 2 10 11
#3: 3 23 99
#4: 4 21 22
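Note that setDT() converts df1 in place and := updates it by reference, so the original data frame is overwritten. If the original needs to survive, one option is to work on a copy. A sketch under that assumption, using the same join as above; the names dt1 and dt2 are mine:

```r
library(data.table)

# Toy data copied from the question
df1 <- data.frame(ID = 1:4,
                  Measurement1 = c(45L, 34L, 23L, 56L),
                  Measurement2 = c(104L, 87L, 99L, 67L))
df2 <- data.frame(ID = c(2L, 4L),
                  Measurement1 = c(10L, 21L),
                  Measurement2 = c(11L, 22L))

dt1 <- as.data.table(df1)  # as.data.table() copies, so df1 stays untouched
dt2 <- as.data.table(df2)

Cols  <- setdiff(names(dt1), "ID")
iCols <- paste0("i.", Cols)

# Update join: rows of dt1 whose ID appears in dt2 get dt2's values
dt1[dt2, (Cols) := mget(iCols), on = "ID"]
dt1
```

For very large data the copy costs memory, so the in-place setDT() version may still be preferable when df1 is not needed afterwards.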
df1 <- structure(list(ID = 1:4, Measurement1 = c(45L, 34L, 23L, 56L),
Measurement2 = c(104L, 87L, 99L, 67L)), .Names = c("ID",
"Measurement1", "Measurement2"), class = "data.frame",
row.names = c(NA, -4L))
df2 <- structure(list(ID = c(2L, 4L), Measurement1 = c(10L, 21L),
Measurement2 = c(11L,
22L)), .Names = c("ID", "Measurement1", "Measurement2"),
class = "data.frame", row.names = c(NA, -2L))
library(dplyr)
df1 %>%
  anti_join(df2, by = "ID") %>%  # drop rows of df1 whose ID appears in df2
  bind_rows(df2) %>%             # append the replacement rows
  arrange(ID)                    # restore ID order
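The pipeline above replaces whole rows: it removes the rows of df1 whose ID occurs in df2, appends df2, and sorts by ID. A rough base R sketch of the same idea, in case dplyr is not available; the names kept and out are mine:

```r
# Toy data copied from the question
df1 <- data.frame(ID = 1:4,
                  Measurement1 = c(45L, 34L, 23L, 56L),
                  Measurement2 = c(104L, 87L, 99L, 67L))
df2 <- data.frame(ID = c(2L, 4L),
                  Measurement1 = c(10L, 21L),
                  Measurement2 = c(11L, 22L))

kept <- df1[!(df1$ID %in% df2$ID), ]  # rows of df1 that are not being replaced
out  <- rbind(kept, df2)              # append the replacement rows
out  <- out[order(out$ID), ]          # restore ID order
rownames(out) <- NULL                 # tidy up row names after reordering
out
```

Two things to keep in mind with this approach: df2 must contain every column of df1, and an ID that exists only in df2 is added as a new row rather than ignored.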
dplyr 1.0.0 introduced a family of SQL-inspired functions for modifying rows. In this case, you can now use rows_update():
library(dplyr)
df1 %>%
rows_update(df2, by = "ID")
ID Measurement1 Measurement2
1 1 45 104
2 2 10 11
3 3 23 99
4 4 21 22
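rows_update() expects every key in the second table to already exist in the first. If the update table may also carry new IDs, rows_upsert() from the same family updates the matching rows and inserts the rest. A small sketch; the table upd and its extra ID 9 are invented for illustration:

```r
library(dplyr)

# Toy data copied from the question
df1 <- data.frame(ID = 1:4,
                  Measurement1 = c(45L, 34L, 23L, 56L),
                  Measurement2 = c(104L, 87L, 99L, 67L))

# Hypothetical update table: IDs 2 and 4 exist in df1, ID 9 does not
upd <- data.frame(ID = c(2L, 4L, 9L),
                  Measurement1 = c(10L, 21L, 0L),
                  Measurement2 = c(11L, 22L, 0L))

# Matching IDs are updated; the unmatched ID 9 is inserted as a new row
res <- rows_upsert(df1, upd, by = "ID")
res
```

Whether inserting unknown IDs is the right behaviour depends on the data; if they indicate an error upstream, sticking with rows_update() and letting it fail may be the safer choice.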