创建具有来自单独表行的列特征的数据框

Tur*_*tle 3 r dplyr purrr tidyverse

我有一个描述性的辅助表,其行指定变量的特征,其中varCat描述变量类别、rept所述类别的后续实现重复次数及其form数据格式:

\n
require(dplyr)\nrequire(tidyr)\nrequire(purrr)\n\ndescr <- tibble(\n  varCat = c("a", "b"),\n  rept = c(1, 3),\n  form = c("text", "num")\n)\ndescr\n#> # A tibble: 2 \xc3\x97 3\n#>   varCat  rept form \n#>   <chr>  <dbl> <chr>\n#> 1 a          1 text \n#> 2 b          3 num\n
Run Code Online (Sandbox Code Playgroud)\n

我想要获得的是以下(空)数据框:

\n
d\n#> # A tibble: 0 \xc3\x97 4\n#> # \xe2\x80\xa6 with 4 variables: a <chr>, b_1 <dbl>, b_2 <dbl>, b_3 <dbl>\n
Run Code Online (Sandbox Code Playgroud)\n

创建于 2022-09-27,使用reprex v2.0.2

\n

涉及两个步骤:

\n
    \n
  1. 辅助表varrept一起在“目标”数据框中建立列名称,如果rept等于 1,则不应应用后缀;但如果rept大于 1,则应创建带有后缀的列序列;
  2. \n
  3. 应读出每列的格式descr$form
  4. \n
\n

我已经成功地实现了这些步骤,尽管我感觉很笨拙:

\n
# Step 1:\ntmp <- descr %>%\n  uncount(rept, .id = "rept") %>%\n  group_by(varCat) %>%\n  mutate(\n    n = n(),\n    var = case_when(\n      n > 1 ~ paste0(varCat, "_", rept),\n      TRUE ~ varCat\n    )\n  ) %>%\n  ungroup %>%\n  select(var, form)\nc <- tmp$var\nd <- matrix(ncol = length(c), nrow = 0) %>%\n  as_tibble(.name_repair = "unique") %>%\n  set_names(c)\n\n# Step 2:\nfor (i in colnames(d)) {\n  for (j in seq_along(tmp$var)) {\n    if (tmp$var[j] == i & tmp$form[j] == "text") d[i] <- as.character(d[i]) else\n    if (tmp$var[j] == i & tmp$form[j] == "num") d[i] <- as.numeric(d[i])\n  }\n}\nd\n#> # A tibble: 0 \xc3\x97 4\n#> # \xe2\x80\xa6 with 4 variables: a <chr>, b_1 <dbl>, b_2 <dbl>, b_3 <dbl>\n
Run Code Online (Sandbox Code Playgroud)\n

创建于 2022-09-27,使用reprex v2.0.2

\n

我确信必须有一种更简洁的方法来实现这一目标。任何帮助将非常感激。

\n

zx8*_*754 8

将mapply与返回列表的自定义函数结合使用,然后使用 call data.frame将列表转换为data.frame

foo <- function(varCat, rept, form){
  f <- setNames(c("character", "numeric"), c("text", "num"))[ form ]
  x <- rep(list(vector(mode = f)), rept)
  x <- setNames(x, rep(varCat, rept))
  if(rept > 1) names(x) <- paste(names(x), seq(names(x)), sep = "_")
  x
}

out <- data.frame(mapply(foo, descr$varCat, descr$rept, descr$form,
                         USE.NAMES = FALSE))

#check the output
out
# [1] a   b_1 b_2 b_3
# <0 rows> (or 0-length row.names)
str(out)
# 'data.frame': 0 obs. of  4 variables:
# $ a  : chr 
# $ b_1: num 
# $ b_2: num 
# $ b_3: num 
Run Code Online (Sandbox Code Playgroud)