Matching (and summing) many fields to one in R

Dr.*_*rox 2 r

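I have a data file (.csv) in which each observation is one of 333 districts. Each district has an ID such as 1101, 1102, and so on. I also have a second data file (.csv) in which each observation is one of 112,975 towns, with population data. The town data has a district_ID field. There are roughly 300 towns per district, so there is one district with district_ID == 1101 and about 300 towns with district_ID == 1101.

I want to create a district-level population variable in my district dataset. That means matching multiple town observations to each single district observation and summing the town-level population.

Thanks!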

Rol*_*and 7

A data.table solution:

#some example data
set.seed(42)
districts <- data.frame(district_ID=1:10,whatever=rnorm(10))
towns <- data.frame(town=1:100,district_ID=rep(1:10,each=10),
                    population=rpois(100,sample(c(1e3,1e4,1e5))))

library(data.table)
districts <- data.table(districts,key="district_ID")
towns <- data.table(towns,key="district_ID")

#calculate district population
temp <- towns[,list(district_pop=sum(population)),by=district_ID]
#merge result with districts data.table
districts <- merge(districts,temp)

#    district_ID    whatever district_pop
# 1:           1  1.37095845       434886
# 2:           2 -0.56469817       334084
# 3:           3  0.36312841       342241
# 4:           4  0.63286260       433224
# 5:           5  0.40426832       334039
# 6:           6 -0.10612452       342810
# 7:           7  1.51152200       433362
# 8:           8 -0.09465904       333810
# 9:           9  2.01842371       342035
# 10:          10 -0.06271410       432302
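For comparison, the same aggregate-then-merge idea can be sketched in base R without data.table. This is only an illustration: the toy data below is hypothetical, and the column names district_ID and population are taken from the question. One difference worth noting is that merge() defaults to an inner join, so the sketch passes all.x = TRUE to keep districts that have no towns.

```r
# Hypothetical toy data; column names follow the question.
districts <- data.frame(district_ID = c(1101, 1102, 1103))
towns <- data.frame(
  district_ID = c(1101, 1101, 1102, 1102, 1102),
  population  = c(100, 250, 40, 60, 5)
)

# Sum town populations within each district.
district_pop <- aggregate(population ~ district_ID, data = towns, FUN = sum)
names(district_pop)[names(district_pop) == "population"] <- "district_pop"

# all.x = TRUE keeps districts without any matching town;
# their district_pop is NA instead of the row being dropped.
districts <- merge(districts, district_pop, by = "district_ID", all.x = TRUE)
print(districts)
```

With real data, the two data frames would come from read.csv() on the two files, and the merge step stays the same.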