R: Find rows of a data frame where certain columns match those of another data frame

so1*_*eit 9 r subset dataframe

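I have an R question that I'm not even sure how to phrase in one sentence, and I haven't been able to find an answer.

I have two data frames that I want to "intersect", finding all rows where the values in certain columns match. I've tried joining two intersect() and which() statements with &&, but neither approach gave me what I want.

Here is what I mean. Suppose I have two data frames: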

> testData
               Email     Manual Campaign Bounced Opened Clicked ClickThru Unsubscribed
1 stack@overflow.com EIFLS0LS        1       0      0       0         0            0
2 stack@exchange.com EIFLS0LS        1       0      0       0         0            0
3     data@frame.com EIFLS0LS        1       0      0       0         0            0
4    block@quote.com EIFLS0LS        1       0      0       0         0            0
5          ht@ml.com EIFLS0LS        1       0      0       0         0            0
6     tele@phone.com EIFLS0LS        1       0      0       0         0            0

> testBounced
               Email Campaign
1 stack@overflow.com        1
2 stack@overflow.com        2
3     data@frame.com        2
4    block@quote.com        1
5          ht@ml.com        1
6        lap@top.com        1

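As you can see, some values in the Email column match between the two data frames, and some values in the Campaign column match as well. I want all rows of testData that match on BOTH columns.

That is: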

               Email     Manual Campaign Bounced Opened Clicked ClickThru Unsubscribed
1 stack@overflow.com EIFLS0LS        1       0      0       0         0            0
2    block@quote.com EIFLS0LS        1       0      0       0         0            0
3          ht@ml.com EIFLS0LS        1       0      0       0         0            0

Edit:

The reason I want to find these rows is so that I can update them in the original data frame. So the final output I want is:

> testData
               Email     Manual Campaign Bounced Opened Clicked ClickThru Unsubscribed
1 stack@overflow.com EIFLS0LS        1       1      0       0         0            0
2 stack@exchange.com EIFLS0LS        1       0      0       0         0            0
3     data@frame.com EIFLS0LS        1       0      0       0         0            0
4    block@quote.com EIFLS0LS        1       1      0       0         0            0
5          ht@ml.com EIFLS0LS        1       1      0       0         0            0
6     tele@phone.com EIFLS0LS        1       0      0       0         0            0

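Apologies if this is a duplicate, and thanks in advance for your help!

Edit 2:

I ended up just using a for loop. Nothing fancy, and it doesn't feel efficient, but the data set is small so it finishes quickly. If anyone has a fast, R-style way to do this, I'd love to see it!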

Señ*_*r O 8

You want the merge function.

merge is usually used to combine two tables on a single common column, but the by argument accepts multiple columns:

merge(testData, testBounced, by=c("Email", "Campaign"))

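By default, any row whose Email/Campaign pair has no match in the other table is dropped. This is controlled by the all.x and all.y arguments, which both default to FALSE.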
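To make the effect of all.x concrete, here is a minimal sketch that rebuilds the question's two data frames and compares the default inner join with a left join. The Hit column is a hypothetical marker added only for illustration; it is not part of the original data.

```r
# Rebuild the example data from the question.
testData <- data.frame(
  Email = c("stack@overflow.com", "stack@exchange.com", "data@frame.com",
            "block@quote.com", "ht@ml.com", "tele@phone.com"),
  Manual = "EIFLS0LS", Campaign = 1L, Bounced = 0L, Opened = 0L,
  Clicked = 0L, ClickThru = 0L, Unsubscribed = 0L,
  stringsAsFactors = FALSE
)
testBounced <- data.frame(
  Email = c("stack@overflow.com", "stack@overflow.com", "data@frame.com",
            "block@quote.com", "ht@ml.com", "lap@top.com"),
  Campaign = c(1L, 2L, 2L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

keys <- c("Email", "Campaign")

# Default (all.x = FALSE, all.y = FALSE): only pairs present in both tables.
matched <- merge(testData, testBounced, by = keys)
nrow(matched)  # 3 rows

# all.x = TRUE keeps every row of testData. "Hit" is a hypothetical marker
# column; it is NA wherever testBounced had no matching Email/Campaign pair.
flagged <- merge(testData, transform(testBounced, Hit = TRUE),
                 by = keys, all.x = TRUE)
```

Note that merge sorts the result on the by columns by default, so the row order differs from testData.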

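The default value of by is intersect(names(x), names(y)), so technically you don't need to specify the columns in this case, but doing so makes the intent clearer.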
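merge returns a new, re-sorted table rather than updating testData, which is what the question's edit asks for. One possible way to bridge that gap, sketched here under the assumption that testData's row order should be preserved, is to carry a row index through the merge. The row_id, lookup, and hits names are hypothetical helpers, not part of the original answer.

```r
# Rebuild the example data from the question.
testData <- data.frame(
  Email = c("stack@overflow.com", "stack@exchange.com", "data@frame.com",
            "block@quote.com", "ht@ml.com", "tele@phone.com"),
  Manual = "EIFLS0LS", Campaign = 1L, Bounced = 0L, Opened = 0L,
  Clicked = 0L, ClickThru = 0L, Unsubscribed = 0L,
  stringsAsFactors = FALSE
)
testBounced <- data.frame(
  Email = c("stack@overflow.com", "stack@overflow.com", "data@frame.com",
            "block@quote.com", "ht@ml.com", "lap@top.com"),
  Campaign = c(1L, 2L, 2L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

keys <- c("Email", "Campaign")

# Tag each testData row with its position so it survives merge()'s sorting.
lookup <- cbind(testData[keys], row_id = seq_len(nrow(testData)))

# Inner join: keeps only the positions whose Email/Campaign pair bounced.
# unique() guards against duplicate pairs in testBounced.
hits <- merge(lookup, unique(testBounced[keys]), by = keys)

# Update the original data frame at those positions, with no explicit loop.
testData$Bounced[hits$row_id] <- 1L
```

After this runs, Bounced is 1 for stack@overflow.com, block@quote.com, and ht@ml.com, and 0 elsewhere, matching the desired output in the question.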


Ric*_*rta 7

If you use data.tables keyed on the columns you want to match, you can accomplish your goal in a single line:

    tData[tBounce, Bounced := 1L]



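Here is the full process:

library(data.table)

# Key both tables on the columns that must match.
keys <- c("Email", "Campaign")
tData <- data.table(testData, key=keys)
tBounce <- data.table(testBounced, key=keys)

# Join on the keys and set Bounced by reference for the matching rows.
tData[tBounce, Bounced := 1L]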

Result:

tData

                Email   Manual Campaign Bounced Opened Clicked ClickThru Unsubscribed
1:    block@quote.com EIFLS0LS        1       1      0       0         0            0
2:     data@frame.com EIFLS0LS        1       0      0       0         0            0
3:          ht@ml.com EIFLS0LS        1       1      0       0         0            0
4: stack@exchange.com EIFLS0LS        1       0      0       0         0            0
5: stack@overflow.com EIFLS0LS        1       1      0       0         0            0
6:     tele@phone.com EIFLS0LS        1       0      0       0         0            0
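Setting a key re-sorts the table, which is why the rows above appear in alphabetical order by Email. As a variation (not part of the original answer), data.table 1.9.6 and later also accept an on argument, which joins without keying and so leaves the original row order intact. A minimal self-contained sketch, assuming the data.table package is installed:

```r
library(data.table)

# Rebuild the example data from the question.
testData <- data.frame(
  Email = c("stack@overflow.com", "stack@exchange.com", "data@frame.com",
            "block@quote.com", "ht@ml.com", "tele@phone.com"),
  Manual = "EIFLS0LS", Campaign = 1L, Bounced = 0L, Opened = 0L,
  Clicked = 0L, ClickThru = 0L, Unsubscribed = 0L,
  stringsAsFactors = FALSE
)
testBounced <- data.frame(
  Email = c("stack@overflow.com", "stack@overflow.com", "data@frame.com",
            "block@quote.com", "ht@ml.com", "lap@top.com"),
  Campaign = c(1L, 2L, 2L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Convert without setting keys, so the row order is left alone.
tData <- as.data.table(testData)
tBounce <- as.data.table(testBounced)

# Update join: rows of tData whose Email/Campaign pair appears in tBounce
# get Bounced set to 1 by reference; all other rows are untouched.
tData[tBounce, on = c("Email", "Campaign"), Bounced := 1L]
```

Here tData keeps the same row order as testData, with Bounced set to 1 in rows 1, 4, and 5.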