I have two data frames: households and individuals.
This is households:
structure(list(ID = 1:5), class = "data.frame", row.names = c(NA,
-5L))
This is individuals:
structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 4L, 4L, 4L, 4L, 5L, 5L), Yesno = c(1L, 0L, 1L, 0L, 0L, 0L,
1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA,
-17L))
I am trying to add a new column to households that counts how many times the variable Yesno equals 1, grouped by ID.
I tried:
households$Count <- as.numeric(ave(individuals$Yesno[individuals$Yesno == 1], households$ID, FUN = count))
households should look like this:
ID Count
1 2
2 3
3 0
4 2
5 1
Using merge and aggregate:
aggregate(Yesno ~ ID, merge(households, individuals), FUN = sum)
# ID Yesno
#1 1 2
#2 2 3
#3 3 0
#4 4 2
#5 5 1
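One caveat with the approach above: merge() performs an inner join by default, so a household with no rows in individuals would be dropped, and the count ends up in a column named Yesno of a new data frame rather than in households itself. A minimal sketch of one way to write the counts back into households, assuming such households should get a count of 0 (household 6 below is a hypothetical addition, not part of the original data):

```r
# Example data from the question, plus a hypothetical household 6
# that has no rows in individuals (an assumption for illustration).
households <- data.frame(ID = 1:6)
individuals <- data.frame(
  ID    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L),
  Yesno = c(1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L)
)

# Count the rows with Yesno == 1 per ID, using individuals only.
counts <- aggregate(Yesno ~ ID, individuals, FUN = function(x) sum(x == 1))

# Look up each household's count; households absent from counts give NA.
households$Count <- counts$Yesno[match(households$ID, counts$ID)]

# Treat "no individuals at all" as a count of zero.
households$Count[is.na(households$Count)] <- 0L

households
```

The match() lookup keeps households in its original row order and leaves its other columns untouched, which may be preferable when households carries more than just ID.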
With dplyr, using left_join and group_by + summarise:
library(dplyr)
left_join(households, individuals) %>%
group_by(ID) %>%
summarise(Count = sum(Yesno))
#Joining, by = "ID"
## A tibble: 5 x 2
# ID Count
# <int> <int>
#1 1 2
#2 2 3
#3 3 0
#4 4 2
#5 5 1
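Because left_join keeps every row of households, a household without any individuals would come through with Yesno = NA, and sum(Yesno) would then return NA for it. A hedged variant, assuming those households should be reported as 0 (household 6 is hypothetical and only there to show the effect):

```r
library(dplyr)

# Example data from the question, plus a hypothetical household 6
# with no matching rows in individuals.
households <- data.frame(ID = 1:6)
individuals <- data.frame(
  ID    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L),
  Yesno = c(1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L)
)

result <- households %>%
  left_join(individuals, by = "ID") %>%   # naming the key avoids the "Joining" message
  group_by(ID) %>%
  # Count rows where Yesno is exactly 1; NA rows from unmatched households are ignored.
  summarise(Count = sum(Yesno == 1, na.rm = TRUE))

result
```

Comparing with Yesno == 1 instead of summing Yesno directly also keeps the count correct if the column ever holds values other than 0 and 1.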
With data.table:
library(data.table)
setDT(households)
setDT(individuals)
households[individuals, on = "ID"][, .(Count = sum(Yesno)), by = ID]
# ID Count
#1: 1 2
#2: 2 3
#3: 3 0
#4: 4 2
#5: 5 1
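Note that households[individuals, on = "ID"] returns one row per row of individuals, so a household with no individuals would not appear in the result. As a sketch of an alternative, an update join can add the Count column to households by reference, which is closer to what the question asks for. Household 6 is a hypothetical addition, and filling its count with 0 is an assumption:

```r
library(data.table)

# Example data from the question, plus a hypothetical household 6
# with no matching rows in individuals.
households <- data.table(ID = 1:6)
individuals <- data.table(
  ID    = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L),
  Yesno = c(1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L)
)

# Per-ID count of rows where Yesno equals 1.
counts <- individuals[, .(Count = sum(Yesno == 1)), by = ID]

# Update join: add Count to households in place, matching on ID.
households[counts, on = "ID", Count := i.Count]

# Households with no match are left as NA; set them to 0.
households[is.na(Count), Count := 0L]

households[]
```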
Data:
households <- structure(list(ID = 1:5), class = "data.frame", row.names = c(NA,
-5L))
individuals <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 4L, 4L, 4L, 4L, 5L, 5L), Yesno = c(1L, 0L, 1L, 0L, 0L, 0L,
1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA,
-17L))