Generate columns in sequence based on several existing columns

tif*_*ifu 4 r dplyr

I have a data frame like the following:

 df <- data.frame(project = c("A", "B"),
                  no_dwellings = c(150, 180),
                  first_occupancy = c(2020, 2019))

  project no_dwellings first_occupancy
1       A          150            2020
2       B          180            2019

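project is a column identifying residential building areas, no_dwellings is the number of dwellings that will eventually be built in each area, and first_occupancy is an estimate of the year in which the first residents will start moving into the newly built apartments.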

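I need to incorporate this information into a population projection. Our best estimate is that 60 dwellings are moved into each year, starting in the first_occupancy year. I therefore need to generate columns in sequence that combine the information from first_occupancy and no_dwellings to indicate how many dwellings are likely to be moved into in each year. Since the number of dwellings built is not necessarily divisible by 60, the remainder needs to go into the last column for the respective project.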
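The per-project arithmetic described above can be sketched in base R. This is only an illustration: split_dwellings and its per_year argument are hypothetical names that do not appear in the question.

```r
# Hypothetical helper: split a dwelling count into yearly move-ins.
# Full batches of `per_year` come first; any remainder goes into the last year.
split_dwellings <- function(n, per_year = 60) {
  full_years <- n %/% per_year   # number of years with a full batch
  remainder  <- n %% per_year    # dwellings left over for the final year
  c(rep(per_year, full_years), if (remainder > 0) remainder)
}

split_dwellings(150)  # 60 60 30
split_dwellings(180)  # 60 60 60
```

Project A (150 dwellings) therefore needs three yearly columns starting in 2020, and project B (180 dwellings) needs three starting in 2019.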

This is how I expect my data frame to look for further processing:

  project no_dwellings first_occupancy year_2019 year_2020 year_2021 year_2022
1       A          150            2020         0        60        60        30
2       B          180            2019        60        60        60         0

Jaa*_*aap 5

With the data.table package you could approach this as follows:

library(data.table)

setDT(df)[, .(yr = first_occupancy:(first_occupancy + no_dwellings %/% 60),
              dw = c(rep(60, no_dwellings %/% 60), no_dwellings %% 60))
          , by = .(project, no_dwellings, first_occupancy)
          ][, dcast(.SD, project + no_dwellings + first_occupancy ~ paste0('year_',yr), value.var = 'dw', fill = 0)]

This gives:

   project no_dwellings first_occupancy year_2019 year_2020 year_2021 year_2022
1:       A          150            2020         0        60        60        30
2:       B          180            2019        60        60        60         0
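To see why the yr and dw vectors in the data.table code always line up, here is a small sketch for a single project. The variable names n and start are placeholders for one row's no_dwellings and first_occupancy.

```r
# One project's values (project B from the question)
n     <- 180
start <- 2019

# Same expressions as in the data.table call, evaluated for one row
yr <- start:(start + n %/% 60)         # n %/% 60 + 1 consecutive years
dw <- c(rep(60, n %/% 60), n %% 60)    # n %/% 60 full batches plus the remainder

length(yr) == length(dw)  # TRUE
dw                        # 60 60 60 0
```

When no_dwellings is an exact multiple of 60, the remainder is 0, so the last year holds 0. That is where the 0 in year_2022 for project B comes from.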

The same logic applied with the tidyverse:

library(dplyr)
library(tidyr)

df %>% 
  group_by(project) %>% 
  do(data.frame(no_dwellings = .$no_dwellings, first_occupancy = .$first_occupancy,
                yr = paste0('year_',.$first_occupancy:(.$first_occupancy + .$no_dwellings %/% 60)),
                dw = c(rep(60, .$no_dwellings %/% 60), .$no_dwellings %% 60))) %>% 
  spread(yr, dw, fill = 0)
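If you would rather avoid extra packages, the same table can be built in base R. This is a sketch of an alternative technique and not part of the two approaches above: it computes each year's column directly instead of reshaping from long to wide. The names per_year, years, fill and result are made up for this example.

```r
df <- data.frame(project = c("A", "B"),
                 no_dwellings = c(150, 180),
                 first_occupancy = c(2020, 2019))

per_year <- 60  # assumed yearly move-in rate, taken from the question

# All years from the earliest first occupancy to the last year with move-ins
years <- min(df$first_occupancy):
  max(df$first_occupancy + ceiling(df$no_dwellings / per_year) - 1)

# For each year, compute the dwellings moved into per project
fill <- vapply(years, function(y) {
  k <- y - df$first_occupancy                     # years since first occupancy
  moved <- pmin(pmax(df$no_dwellings - k * per_year, 0), per_year)
  ifelse(k < 0, 0, moved)                         # nothing before first occupancy
}, numeric(nrow(df)))

# Force a matrix (also when df has one row) and name the year columns
fill <- matrix(fill, nrow = nrow(df),
               dimnames = list(NULL, paste0("year_", years)))

result <- cbind(df, fill)
result
```

For the example data this yields the columns year_2019 to year_2022 with the same values as the expected output in the question.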