Replace specific values based on another data frame

Ale*_*xis 10 lookup r dataframe

First, let's start with DataFrame 1 (DF1):

DF1 <- data.frame(c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016", 
                    "06/23/2016", "06/19/2016", "06/20/2016", "06/21/2016",
                    "06/22/2016", "06/23/2016"),
                  c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                  c(149, 150, 151, 152, 155, 84, 83, 80, 81, 97),
                  c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
                  c("MTL", "MTL", "MTL", "MTL", "MTL", "NY", "NY", 
                    "NY", "NY", "NY"))
colnames(DF1) <- c("date", "id", "sales", "cost", "city")

I also have DataFrame 2 (DF2):

DF2 <- data.frame(c("06/19/2016", "06/27/2016", "06/22/2016", "06/23/2016"),
                  c(1, 1, 2, 2),
                  c(9999, 8888, 777, 555),
                  c("LON", "LON", "QC", "QC"))
colnames(DF2) <- c("date", "id", "sales", "city")

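For every row in DF1, I need to check whether DF2 has a row with the same date and id. If it does, I have to replace the values in DF1 with the values from DF2.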

DF2 will always have fewer columns than DF1. If a column is not in DF2, I have to keep the original DF1 value for that column.

The final output should look like this:

results <- data.frame(c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016",
                        "06/23/2016", "06/19/2016", "06/20/2016", "06/21/2016",
                        "06/22/2016", "06/23/2016"),
                      c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                      c(9999, 150, 151, 152, 155, 84, 83, 80, 777, 555),
                      c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
                      c("LON", "MTL", "MTL", "MTL", "MTL", "NY", "NY", 
                        "NY", "QC", "QC"))
colnames(results) <- c("date", "id", "sales", "cost", "city")

Do you have any suggestions?

Jaa*_*aap 21

You can use the join functionality of the data.table package for this:

library(data.table)
setDT(DF1)
setDT(DF2)

DF1[DF2, on = .(date, id), `:=` (city = i.city, sales = i.sales)]

This gives:

> DF1
          date id sales cost city
 1: 06/19/2016  1  9999  101  LON
 2: 06/20/2016  1   150  102  MTL
 3: 06/21/2016  1   151  104  MTL
 4: 06/22/2016  1   152  107  MTL
 5: 06/23/2016  1   155   99  MTL
 6: 06/19/2016  2    84   55   NY
 7: 06/20/2016  2    83   55   NY
 8: 06/21/2016  2    80   56   NY
 9: 06/22/2016  2   777   57   QC
10: 06/23/2016  2   555   58   QC
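
To make the mechanics of the join above easier to see, here is a minimal sketch on a two-row stand-in for the data. The names `x` and `upd` are made up for this illustration and are not part of the question. It shows two things: the `i.` prefix refers to columns of the table passed in the first argument (here `upd`), and a row of `upd` that has no match in `x` (the 06/27/2016 row) is simply ignored rather than appended.

```r
library(data.table)

# Hypothetical two-row stand-ins for DF1 and DF2
x <- data.table(date  = c("06/19/2016", "06/20/2016"),
                id    = c(1, 1),
                sales = c(149, 150),
                city  = c("MTL", "MTL"))
upd <- data.table(date  = c("06/19/2016", "06/27/2016"),
                  id    = c(1, 1),
                  sales = c(9999, 8888),
                  city  = c("LON", "LON"))

# Update join: for each row of `upd` that matches a row of `x` on (date, id),
# overwrite the columns of `x` by reference with the values from `upd`.
# Columns of `upd` are addressed with the i. prefix.
x[upd, on = .(date, id), `:=`(sales = i.sales, city = i.city)]

# Only the 06/19/2016 row changes; the unmatched 06/27/2016 row is not added.
print(x)
```

Because `:=` modifies `x` by reference, no assignment such as `x <- ...` is needed.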

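If both datasets have many columns, it is easier to use mget than to type out all the column names. For the data used in the question, that would look like: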

DF1[DF2, on = .(date, id), names(DF2)[3:4] := mget(paste0("i.", names(DF2)[3:4]))]
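
The positional index `3:4` relies on the key columns being the first two columns of DF2. As a sketch of one way to avoid that assumption, the columns to update can be derived by name. The variables `keys` and `cols` are introduced here for illustration only, and the data is a shortened version of the question's tables.

```r
library(data.table)

# Shortened versions of the question's data
DF1 <- data.table(date  = c("06/19/2016", "06/20/2016", "06/22/2016"),
                  id    = c(1, 1, 2),
                  sales = c(149, 150, 81),
                  cost  = c(101, 102, 57),
                  city  = c("MTL", "MTL", "NY"))
DF2 <- data.table(date  = c("06/19/2016", "06/22/2016"),
                  id    = c(1, 2),
                  sales = c(9999, 777),
                  city  = c("LON", "QC"))

keys <- c("date", "id")
# Columns present in both tables, excluding the join keys
cols <- setdiff(intersect(names(DF1), names(DF2)), keys)

# Same update join as above, with the column names computed instead of typed;
# `cost` is not in DF2, so it keeps its original DF1 values.
DF1[DF2, on = keys, (cols) := mget(paste0("i.", cols))]

print(DF1)
```

Wrapping `cols` in parentheses on the left of `:=` tells data.table to use the contents of the variable as column names rather than create a column called `cols`.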