I have a data frame that looks like this:
df <- data.frame(project = c("A", "B"),
                 no_dwellings = c(150, 180),
                 first_occupancy = c(2020, 2019))
  project no_dwellings first_occupancy
1       A          150            2020
2       B          180            2019
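project is a column that identifies residential building areas, no_dwellings indicates how many dwellings will eventually be built in those areas, and first_occupancy is an estimate of when the first residents will start moving into the newly built apartments.
I need to incorporate this information into a population forecast. Our best estimate is that 60 dwellings are moved into each year, starting from first_occupancy. I therefore need to generate consecutive columns that combine the information from first_occupancy and no_dwellings to indicate how many dwellings are likely to be moved into in each year. Since the number of dwellings built is not necessarily divisible by 60, the remainder needs to go into the last column for the respective project.
This is what I expect my data frame to look like for further processing: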
  project no_dwellings first_occupancy year_2019 year_2020 year_2021 year_2022
1       A          150            2020         0        60        60        30
2       B          180            2019        60        60        60         0
With the data.table package, you can approach it as follows:
library(data.table)
# Expand each project to one row per year, then cast the years into columns
setDT(df)[, .(yr = first_occupancy:(first_occupancy + no_dwellings %/% 60),
              dw = c(rep(60, no_dwellings %/% 60), no_dwellings %% 60)),
          by = .(project, no_dwellings, first_occupancy)
          ][, dcast(.SD, project + no_dwellings + first_occupancy ~ paste0('year_', yr),
                    value.var = 'dw', fill = 0)]
which gives:
   project no_dwellings first_occupancy year_2019 year_2020 year_2021 year_2022
1:       A          150            2020         0        60        60        30
2:       B          180            2019        60        60        60         0
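The per-project arithmetic in that snippet can be looked at in isolation. The following is a minimal base R sketch; the helper name split_dwellings and the argument per_year are hypothetical and only serve the illustration. Integer division (%/%) gives the number of full years, and the modulo (%%) gives what is left for the last year.

```r
# Hypothetical helper (not part of the original answer): split a dwelling
# count into yearly move-ins of `per_year`, with the remainder in the last slot.
split_dwellings <- function(n, per_year = 60) {
  full_years <- n %/% per_year   # number of years with a full quota
  remainder  <- n %% per_year    # dwellings left over for the final year
  c(rep(per_year, full_years), remainder)
}

split_dwellings(150)  # 60 60 30
split_dwellings(180)  # 60 60 60 0
```

When the count is an exact multiple of 60, the last element is 0, which is why project B gets a year_2022 column containing 0 in the output.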
The same logic applied with the tidyverse:
library(dplyr)
library(tidyr)
df %>%
  group_by(project) %>%
  # Build one row per project and year
  do(data.frame(no_dwellings = .$no_dwellings, first_occupancy = .$first_occupancy,
                yr = paste0('year_', .$first_occupancy:(.$first_occupancy + .$no_dwellings %/% 60)),
                dw = c(rep(60, .$no_dwellings %/% 60), .$no_dwellings %% 60))) %>%
  # Spread the years into columns, filling missing years with 0
  spread(yr, dw, fill = 0)
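If neither package is available, the same idea can probably be expressed in base R as well. This is only a sketch under the assumptions of the question (60 dwellings per year, one row per project); the names per_year, wide and result are made up for the example.

```r
df <- data.frame(project = c("A", "B"),
                 no_dwellings = c(150, 180),
                 first_occupancy = c(2020, 2019))

per_year <- 60  # assumed annual move-in rate, taken from the question

# All calendar years that any project can touch
last_year <- df$first_occupancy + df$no_dwellings %/% per_year
years <- min(df$first_occupancy):max(last_year)

# One row per project, one column per year, filled with 0 by default
wide <- matrix(0, nrow = nrow(df), ncol = length(years),
               dimnames = list(NULL, paste0("year_", years)))

for (i in seq_len(nrow(df))) {
  n  <- df$no_dwellings[i]
  dw <- c(rep(per_year, n %/% per_year), n %% per_year)  # yearly move-ins
  yr <- df$first_occupancy[i] + seq_along(dw) - 1        # matching years
  wide[i, paste0("year_", yr)] <- dw
}

result <- cbind(df, wide)
result
```

The loop is fine for a handful of projects; for large inputs the data.table or tidyverse versions are likely the better choice.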