Using R dplyr mutate to create multiple columns instead of using a loop?

Question

Sta*_*ent 1 r calculated-columns rowsum dplyr mutate

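I am trying to use R's dplyr package to create multiple new columns, one for each year in my dataset, each being the sum of the columns holding that year's quarter-end figures (March, June, September, December). The only way I have been able to figure out how to do this "efficiently" is with a for loop. But something tells me there is an alternative, more efficient, or better way to go about it (perhaps I should be using a map function here, but I'm just not sure?). Here is a reproducible toy example: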

library(tidyverse)
library(glue)

# Create a toy example and print the resulting tibble
set.seed(100) # make results reproducible by setting seed
vars <- c("AgeGroup", paste0(month.abb[seq(3, 12, 3)], "_", rep(15:17, each = 4)))

(df <- cbind(LETTERS[1:5], matrix(rpois(n = (length(vars) - 1) * 5, 30), nrow = 5)) %>%
    data.frame() %>%
    setNames(vars) %>%
    tibble() %>%
    mutate(across(-1, as.integer))
  )

This sets up the example/reproducible dataset as:

# A tibble: 5 × 13
  AgeGroup Mar_15 Jun_15 Sep_15 Dec_15 Mar_16 Jun_16 Sep_16 Dec_16 Mar_17 Jun_17 Sep_17 Dec_17
  <chr>     <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>
1 A            27     26     33     36     34     25     27     37     37     32     37     30
2 B            21     32     24     31     25     39     32     20     30     32     25     26
3 C            34     28     30     23     25     29     35     26     19     30     28     29
4 D            30     32     29     34     31     29     35     37     28     34     31     50
5 E            31     33     27     31     23     26     29     28     28     26     19     37

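So what I want to do is create, for each year ('15, '16, and '17), new variables named sum_15, sum_16, and sum_17, each being the sum of all the month values in the variables ending with the corresponding two-digit year (e.g., ends_with("15"), ends_with("16"), ends_with("17")).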

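I have been able to get the desired result with the following code, but I would rather not use a loop if I can judiciously apply an across statement or possibly a map function (or some other approach you may suggest):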

# This works, but I'd rather not use a for loop if I can avoid it:
for (i in 15:17) {
  df <- df %>% mutate("sum_{i}" := rowSums(across(ends_with(glue("_{i}")))))
}

# write out the df that displays what I am trying to achieve
df %>% select(AgeGroup, starts_with("sum"))

# A tibble: 5 × 4
  AgeGroup sum_15 sum_16 sum_17
  <chr>     <dbl>  <dbl>  <dbl>
1 A           122    123    136
2 B           108    116    113
3 C           115    115    106
4 D           125    132    143
5 E           122    106    110

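I have looked at other examples on SO, but all the ones I found are overly simplistic and seem to create only one variable at a time by writing each one out manually in the mutate statement, something like this: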

df %>% mutate(sum15 = rowSums(across(ends_with("_15"))),
              sum16 = rowSums(across(ends_with("_16"))),
              sum17 = rowSums(across(ends_with("_17"))),
              )

That is clearly not what I want, since it is basically a more manual way of doing what I already do with the for loop.

Can anyone offer suggestions on how to improve this code and avoid the for loop?

Thanks so much!

Answer 1

Ony*_*mbu 5

Another approach is:

df %>%
   pivot_longer(-AgeGroup, names_pattern = "(\\d+)") %>%
   pivot_wider(values_fn = sum, names_prefix = 'Sum_')

# A tibble: 5 × 4
  AgeGroup Sum_15 Sum_16 Sum_17
  <chr>     <int>  <int>  <int>
1 A           122    123    136
2 B           108    116    113
3 C           115    115    106
4 D           125    132    143
5 E           122    106    110

You can then join this back to the original df.
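The join step is not spelled out above, so here is a minimal sketch of one way it could look. It assumes AgeGroup uniquely identifies each row (true for the toy data in the question); the small df and the names sums and result are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Small stand-in for the question's data (values are illustrative only)
df <- tibble(
  AgeGroup = c("A", "B"),
  Mar_15 = c(1L, 2L), Jun_15 = c(3L, 4L),
  Mar_16 = c(5L, 6L), Jun_16 = c(7L, 8L)
)

# Per-year sums: keep only the digits of each column name, then sum on widening
sums <- df %>%
  pivot_longer(-AgeGroup, names_pattern = "(\\d+)") %>%
  pivot_wider(values_fn = sum, names_prefix = "Sum_")

# Attach the sums to the original columns; assumes AgeGroup is a unique key
result <- left_join(df, sums, by = "AgeGroup")
result
```

If AgeGroup were not unique, the pivot would pool rows that share a key, so a row identifier would need to be added first.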

If you are not familiar with names_pattern, you can use names_sep instead:

df %>%
   pivot_longer(-AgeGroup, names_to = c(NA, 'name'), names_sep = "_") %>%
   pivot_wider(values_fn = sum, names_prefix = 'Sum_')

# A tibble: 5 × 4
  AgeGroup Sum_15 Sum_16 Sum_17
  <chr>     <int>  <int>  <int>
1 A           122    123    136
2 B           108    116    113
3 C           115    115    106
4 D           125    132    143
5 E           122    106    110

In base R you can do:

sapply(split.default(df[-1], sub(".*_", "Sum_", names(df)[-1])), rowSums)
     Sum_15 Sum_16 Sum_17
[1,]    122    123    136
[2,]    108    116    113
[3,]    115    115    106
[4,]    125    132    143
[5,]    122    106    110

You can bind this to the original data frame, i.e.

cbind(df, sapply(split.default(df[-1], sub(".*_", "Sum_", names(df)[-1])), rowSums))
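Since the question asks about purrr's map functions, the same split-then-rowSums idea can probably also be written that way. The following is only a sketch and not part of the answer above: the small df, the years vector, and the names sums and result are made up for illustration.

```r
library(dplyr)
library(purrr)

# Small stand-in for the question's data (values are illustrative only)
df <- tibble(
  AgeGroup = c("A", "B"),
  Mar_15 = c(1L, 2L), Jun_15 = c(3L, 4L),
  Mar_16 = c(5L, 6L), Jun_16 = c(7L, 8L)
)

# Hypothetical: the two-digit year suffixes to sum over, named by output column
years <- c(15, 16)
names(years) <- paste0("sum_", years)

# One row-sum vector per year; map() keeps the names, so they become column names
sums <- map(years, function(y) {
  unname(rowSums(select(df, ends_with(paste0("_", y)))))
})

# Bind the new columns onto the original tibble
result <- bind_cols(df, as_tibble(sums))
result
```

Unlike cbind(), which may hand back a plain data frame, bind_cols() keeps the result a tibble. As with the for loop in the question, rowSums() returns doubles here rather than integers.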
