How do I join/merge two tables using character values?

6 r

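I want to combine two tables based on first name, last name, and year, and create a new binary variable indicating whether each row of table 1 is also present in the second table.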

The first table is a panel data set of some attributes of NBA players during a season:

   firstname<-c("Michael","Michael","Michael","Magic","Magic","Magic","Larry","Larry")
   lastname<-c("Jordan","Jordan","Jordan","Johnson","Johnson","Johnson","Bird","Bird")
   year<-c("1991","1992","1993","1991","1992","1993","1992","1992")

   season<-data.frame(firstname,lastname,year)


    firstname   lastname        year
  1 Michael      Jordan         1991
  2 Michael      Jordan         1992
  3 Michael      Jordan         1993
  4 Magic        Johnson        1991
  5 Magic        Johnson        1992
  6 Magic        Johnson        1993
  7 Larry        Bird           1992
  8 Larry        Bird           1992

The second data.frame is a panel data set of some attributes of the NBA players who were selected for the All-Star game:

   firstname<-c("Michael","Michael","Michael","Magic","Magic","Magic")
   lastname<-c("Jordan","Jordan","Jordan","Johnson","Johnson","Johnson")
   year<-c("1991","1992","1993","1991","1992","1993")

    ALLSTARS<-data.frame(firstname,lastname,year)



     firstname  lastname    year
  1 Michael     Jordan    1991
  2 Michael     Jordan    1992
  3 Michael     Jordan    1993
  4 Magic       Johnson   1991
  5 Magic       Johnson   1992
  6 Magic       Johnson   1993

The result I want looks like this:

  firstname lastname    year    allstars

   1    Michael Jordan  1991    1
   2    Michael Jordan  1992    1
   3    Michael Jordan  1993    1
   4    Magic   Johnson 1991    1
   5    Magic   Johnson 1992    1
   6    Magic   Johnson 1993    1
   7    Larry   Bird    1992    0
   8    Larry   Bird    1992    0

I tried using a left join, but I am not sure whether this makes sense:

    test<-join(season, ALLSTARS, by =c("lastname","firstname","year") , type = "left", match = "all")

Sam*_*rke 1

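It looks like you are using join() from the plyr package. You are almost there: just precede your command with ALLSTARS$allstars <- 1. Then run the join as written, and finally convert the NA values to 0. So: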

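library(plyr)  # provides join()
ALLSTARS$allstars <- 1  # flag every row of the all-star table
test <- join(season, ALLSTARS, by =c("lastname","firstname","year") , type = "left", match = "all")
test$allstars[is.na(test$allstars)] <- 0  # rows with no match: NA -> 0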

Result:

  firstname lastname year allstars
1   Michael   Jordan 1991        1
2   Michael   Jordan 1992        1
3   Michael   Jordan 1993        1
4     Magic  Johnson 1991        1
5     Magic  Johnson 1992        1
6     Magic  Johnson 1993        1
7     Larry     Bird 1992        0
8     Larry     Bird 1992        0

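That said, I would personally use left_join or right_join from the dplyr package, as in David's answer, rather than plyr's join(). Also note that in this case you do not actually need the by argument of join(): by default the function tries to join on all fields with common names, which is exactly what you want here.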
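For comparison, here is a minimal sketch of the same approach using dplyr's left_join(). It is not necessarily the code from David's answer, only one way the idea could look. It rebuilds the two data frames from the question and sets stringsAsFactors = FALSE, which is my assumption so that the key columns are plain character vectors in any R version.

```r
library(dplyr)

# Rebuild the question's data; stringsAsFactors = FALSE is an assumption
# made here so the join keys are character vectors, not factors.
season <- data.frame(
  firstname = c("Michael", "Michael", "Michael", "Magic", "Magic", "Magic", "Larry", "Larry"),
  lastname  = c("Jordan", "Jordan", "Jordan", "Johnson", "Johnson", "Johnson", "Bird", "Bird"),
  year      = c("1991", "1992", "1993", "1991", "1992", "1993", "1992", "1992"),
  stringsAsFactors = FALSE
)
ALLSTARS <- data.frame(
  firstname = c("Michael", "Michael", "Michael", "Magic", "Magic", "Magic"),
  lastname  = c("Jordan", "Jordan", "Jordan", "Johnson", "Johnson", "Johnson"),
  year      = c("1991", "1992", "1993", "1991", "1992", "1993"),
  stringsAsFactors = FALSE
)

# Flag every row of the all-star table before joining
ALLSTARS$allstars <- 1

# Left join: keep every row of season; unmatched rows get NA for allstars
test <- left_join(season, ALLSTARS, by = c("firstname", "lastname", "year"))

# Convert the NA values (players not in ALLSTARS) to 0
test$allstars[is.na(test$allstars)] <- 0

print(test)
```

If by is left out, left_join() also joins on all columns the two tables have in common and prints a message naming them. Note that this pattern assumes each firstname/lastname/year combination appears at most once in ALLSTARS; duplicated keys there would duplicate rows in the result.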