How to match data from two tables that share the same primary key in R

jef*_*rey 5 r dataframe

I have two tables of data about people:

df1 <- data.frame(id=c(113,202,377,288,359),
                  name=c("Alex","Silvia","Peter","Jack","Jonny"))

This gives me:

   id   name
1 113   Alex
2 202 Silvia
3 377  Peter
4 288   Jack
5 359  Jonny

I have a second table that contains the names of their family members:

df2 <- data.frame(id=c(113,113,113,202,202,359,359,359,359),
                 family.members=c("Ross","Jefferson","Max","Jo","Michael","Jimmy","Rex","Bill","Larry"))

This gives me:

> df2
   id family.members
1 113           Ross
2 113      Jefferson
3 113            Max
4 202             Jo
5 202        Michael
6 359          Jimmy
7 359            Rex
8 359           Bill
9 359          Larry

Now I want to extend table 1 with an additional column holding the total number of family members for each person:

   id   name no.family.members
1 113   Alex                 3
2 202 Silvia                 2
3 377  Peter                 0
4 288   Jack                 0
5 359  Jonny                 4

What is the best way to create this third table in R?

Thanks a lot in advance!

Gre*_*gor 8

Using dplyr:

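library(dplyr)
# Count the rows of df2 per id, then left-join the counts onto df1.
# ids that never appear in df2 (377, 288) get NA rather than 0.
df1 <- df1 %>% left_join(
    df2 %>% group_by(id) %>%
        summarize(no.family.members = n()),
    by = "id"
)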

With dplyr >= 0.3.0.2, this can be rewritten as follows (note that `count()` names the count column `n` by default):

df3 <- df1 %>% left_join(df2 %>% count(id))
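The question asks for 0 for people with no family members, whereas a left join leaves NA for ids missing from `df2`. A minimal sketch of one way to fill those in, assuming dplyr is installed and using `coalesce()`; the result name `df3` follows the snippet above, and the data is repeated so the example runs on its own:

```r
library(dplyr)

df1 <- data.frame(id = c(113, 202, 377, 288, 359),
                  name = c("Alex", "Silvia", "Peter", "Jack", "Jonny"))
df2 <- data.frame(id = c(113, 113, 113, 202, 202, 359, 359, 359, 359),
                  family.members = c("Ross", "Jefferson", "Max", "Jo", "Michael",
                                     "Jimmy", "Rex", "Bill", "Larry"))

# Count family members per id
counts <- df2 %>%
  group_by(id) %>%
  summarize(no.family.members = n())

# Join the counts onto df1, then turn the NAs of unmatched ids into 0
df3 <- df1 %>%
  left_join(counts, by = "id") %>%
  mutate(no.family.members = coalesce(no.family.members, 0L))

df3
```

Whether 0 or NA is the better value for unmatched ids depends on whether "not in df2" really means "no family members" in your data.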

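  • With the latest dplyr version you can simplify this to `df1 %>% left_join(df2 %>% count(id))` (+1) (4 upvotes)
  • @TylerRinker, I have added it to Gregor's answer, I hope he doesn't mind :) (otherwise, just roll it back, Gregor) (2 upvotes)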

42-*_*42- 5

df1 <- df1[order(df1$id), ]  # Just to be safe
# the counts vector will be ordered by df2$id
counts <- with(df2, tapply(family.members, id, length))
df1$no.family.members[df1$id %in% names(counts)] <- counts
df1
   id   name no.family.members
1 113   Alex                 3
2 202 Silvia                 2
4 288   Jack                NA
5 359  Jonny                 4
3 377  Peter                NA

(I think NA is more informative than 0.)
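The approach above relies on `df1` being sorted by `id` so that it lines up with the sorted `tapply` result. As a hedged alternative sketch in base R (not the answerer's code), `match()` can look up each id directly, so `df1` keeps its original row order; the data is repeated so the example runs on its own:

```r
df1 <- data.frame(id = c(113, 202, 377, 288, 359),
                  name = c("Alex", "Silvia", "Peter", "Jack", "Jonny"))
df2 <- data.frame(id = c(113, 113, 113, 202, 202, 359, 359, 359, 359),
                  family.members = c("Ross", "Jefferson", "Max", "Jo", "Michael",
                                     "Jimmy", "Rex", "Bill", "Larry"))

# Number of rows in df2 for each id; the table's names are the ids
counts <- table(df2$id)

# Position of each df1$id among the counted ids (NA if the id is absent from df2)
pos <- match(df1$id, names(counts))

# Indexing with NA yields NA, so unmatched ids stay NA
df1$no.family.members <- as.integer(counts[pos])

df1
```

If 0 is preferred over NA, the missing entries can be filled afterwards with `df1$no.family.members[is.na(df1$no.family.members)] <- 0L`.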