Calculating row sums of columns based on a predefined range

y.a*_*y.a 6 r data-manipulation dataframe dplyr

I have a dataset similar to this:

  dataset <- structure(
list(
Participant.Id = 1:5,

x1 = c(10L, 20L, 30L, 40L, 50L),
x2 = c(15L, 25L, 35L, 45L, 55L),
x3 = c(20L, 25L, NA, 45L, NA),
x4 = c(25L, 30L, NA, 50L, NA),
x5 = c(NA, 35L, NA, 55L, NA),
x6 = c(NA, 35L, NA, NA, NA),

y1 = c(10L, 20L, 30L, 40L, 50L),
y2 = c(15L, 25L, 35L, 45L, 55L),
y3 = c(20L, 25L, NA, 45L, NA),
y4 = c(25L, 30L, NA, 50L, NA),
y5 = c(NA, 35L, NA, 55L, NA),
y6 = c(NA, 35L, NA, NA, NA),

z1 = c(10L, 20L, 30L, 40L, 50L),
z2 = c(15L, 25L, 35L, 45L, 55L),
z3 = c(20L, 25L, NA, 45L, NA),
z4 = c(25L, 30L, NA, 50L, NA),
z5 = c(NA, 35L, NA, 55L, NA),
z6 = c(NA, 35L, NA, NA, NA),

mt1_oranges_vol = c(100L, 200L, 300L, 400L, 500L),
mt2_oranges_vol = c(110L, 210L, 310L, 410L, 510L),
mt3_oranges_vol = c(120L, 220L, NA, 420L, 520L),
mt4_oranges_vol = c(130L, 230L, NA, 430L, NA),
mt5_oranges_vol = c(NA, 240L, NA, NA, NA),
mt6_oranges_vol = c(NA, NA, NA, NA, NA),
 
mt1_pears_vol = c(101L, 201L, 301L, 401L, 501L),
mt2_pears_vol = c(111L, 211L, 311L, 411L, 511L),
mt3_pears_vol = c(121L, 221L, NA, 421L, 521L),
mt4_pears_vol = c(131L, 231L, NA, 431L, NA),
mt5_pears_vol = c(NA, 241L, NA, NA, NA),
mt6_pears_vol = c(NA, NA, NA, NA, NA),

mt1_apples_vol = c(102L, 202L, 302L, 402L, 502L),
mt2_apples_vol = c(112L, 212L, 312L, 412L, 512L),
mt3_apples_vol = c(122L, 222L, NA, 422L, 522L),
mt4_apples_vol = c(132L, 232L, NA, 432L, NA),
mt5_apples_vol = c(NA, 242L, NA, NA, NA),
mt6_apples_vol = c(NA, NA, NA, NA, NA)),


class = "data.frame", 
row.names = c(NA, -5L)
)

I need to create total columns, i.e. the sum mt1_apples_vol + mt1_pears_vol + mt1_oranges_vol; then mt2_apples_vol + mt2_pears_vol + mt2_oranges_vol; and so on.

At the moment I calculate them like this:

library(dplyr)

dataset <- dataset %>%
  mutate(
    mt1_total_vol = rowSums(select(., starts_with("mt1_")), na.rm = FALSE),
    mt2_total_vol = rowSums(select(., starts_with("mt2_")), na.rm = FALSE),
    mt3_total_vol = rowSums(select(., starts_with("mt3_")), na.rm = FALSE)
  )

However, more measurements may be added in the future, so I would like this to iterate over mt_range:

mt_range <- 1:6

I can't work out how to write code that creates the new columns and selects all the relevant variables based on mt_range.

Sam*_*amR 3

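You'll find this much easier if you put your data in long format. We can use the names_pattern argument of tidyr::pivot_longer() to split the number and the word that follow "mt" in names such as "mt1_apples_vol" into two columns, "fruit_num" and "fruit".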


Then it's just a case of grouping by participant and fruit number and calculating the sum.

library(dplyr) # for summarise() and starts_with()

dataset |>
    tidyr::pivot_longer(
        cols = starts_with("mt"),
        names_pattern = "^mt(\\d)_(\\w+)_vol$",
        names_to = c("fruit_num", "fruit")
    ) |>
    summarise(
        total_vol = sum(value, na.rm = FALSE),
        .by = c(Participant.Id, fruit_num)
    )

# # A tibble: 30 × 3
#    Participant.Id fruit_num total_vol
#             <int> <chr>         <int>
#  1              1 1               303
#  2              1 2               333
#  3              1 3               363
#  4              1 4               393
#  5              1 5                NA
#  6              1 6                NA
#  7              2 1               603
#  8              2 2               633
#  9              2 3               663
# 10              2 4               693
# # ℹ 20 more rows
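To see what the two capture groups in that names_pattern pick out, here is a small base R sketch. The vector `nms` is only an illustration built from column names in the question, and `sub()` stands in for what pivot_longer() does with the pattern internally.

```r
# Column names in the same style as the question's data
nms <- c("mt1_apples_vol", "mt2_pears_vol", "mt6_oranges_vol")

# Same regular expression as passed to names_pattern
pattern <- "^mt(\\d)_(\\w+)_vol$"

# First capture group: the measurement number
fruit_num <- sub(pattern, "\\1", nms)
# Second capture group: the fruit name
fruit <- sub(pattern, "\\2", nms)

data.frame(name = nms, fruit_num, fruit)
```

Note that `\\d` matches exactly one digit, so a name such as mt10_apples_vol would not match this pattern. If measurement numbers could reach two digits, `\\d+` would be the safer choice.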

If you want it in wide form, you can pipe the above back into tidyr::pivot_wider().
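As a rough sketch of that step on its own, the snippet below widens a small hand-made long table. `long_totals` is a hypothetical stand-in for the summarised output (its values are taken from the first rows shown above), and the `names_prefix = "mt"` naming is just one possible choice.

```r
library(tidyr)

# Hypothetical long-format totals, shaped like the summarise() output
long_totals <- data.frame(
  Participant.Id = c(1L, 1L, 2L, 2L),
  fruit_num      = c("1", "2", "1", "2"),
  total_vol      = c(303L, 333L, 603L, 633L)
)

# One row per participant, one column per measurement number
wide_totals <- pivot_wider(
  long_totals,
  names_from   = fruit_num,
  values_from  = total_vol,
  names_prefix = "mt"
)

wide_totals
```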


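The data in the exact original format you specified (which I think is unlikely to be the best format) can be obtained by pivoting back to wide and using dplyr::inner_join():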

library(dplyr) # for inner_join(), summarise() and starts_with()

dataset |>
    inner_join(
        dataset |>
            tidyr::pivot_longer(
                cols = starts_with("mt"),
                names_pattern = "^mt(\\d)_(\\w+)_vol$",
                names_to = c("fruit_num", "fruit")
            ) |>
            summarise(
                total_vol = sum(value, na.rm = FALSE),
                .by = c(Participant.Id, fruit_num)
            ) |>
            tidyr::pivot_wider(
                names_from = fruit_num,
                values_from = total_vol,
                names_glue = "mt{.name}_total_vol"
            ),
        by = "Participant.Id"
    )
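If the data has to stay wide throughout, the totals could also be built by looping over mt_range directly. This is an alternative sketch in base R, not the pivoting approach described above. `toy` is a hypothetical cut-down version of the question's data, and the loop assumes every volume column follows the mt<number>_<fruit>_vol naming scheme.

```r
# Hypothetical stand-in for the question's data (two measurements, two fruits)
toy <- data.frame(
  Participant.Id = 1:3,
  mt1_apples_vol = c(102L, 202L, 302L),
  mt1_pears_vol  = c(101L, 201L, 301L),
  mt2_apples_vol = c(112L, 212L, NA),
  mt2_pears_vol  = c(111L, 211L, NA)
)

mt_range <- 1:2

# Build one total column per measurement number in mt_range
for (i in mt_range) {
  total_col <- paste0("mt", i, "_total_vol")
  # All volume columns for measurement i, e.g. mt1_apples_vol, mt1_pears_vol
  vol_cols <- grep(paste0("^mt", i, "_.*_vol$"), names(toy), value = TRUE)
  # Exclude the total column itself so re-running does not double-count
  vol_cols <- setdiff(vol_cols, total_col)
  # na.rm = FALSE keeps NA whenever any fruit is missing, as in the question
  toy[[total_col]] <- rowSums(toy[vol_cols], na.rm = FALSE)
}

toy
```

Extending mt_range is then the only change needed when new measurements are added, provided the new columns keep the same naming scheme.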