I have a dataset similar to this:
dataset <- structure(
list(
Participant.Id = 1:5,
x1 = c(10L, 20L, 30L, 40L, 50L),
x2 = c(15L, 25L, 35L, 45L, 55L),
x3 = c(20L, 25L, NA, 45L, NA),
x4 = c(25L, 30L, NA, 50L, NA),
x5 = c(NA, 35L, NA, 55L, NA),
x6 = c(NA, 35L, NA, NA, NA),
y1 = c(10L, 20L, 30L, 40L, 50L),
y2 = c(15L, 25L, 35L, 45L, 55L),
y3 = c(20L, 25L, NA, 45L, NA),
y4 = c(25L, 30L, NA, 50L, NA),
y5 = c(NA, 35L, NA, 55L, NA),
y6 = c(NA, 35L, NA, NA, NA),
z1 = c(10L, 20L, 30L, 40L, 50L),
z2 = c(15L, 25L, 35L, 45L, 55L),
z3 = c(20L, 25L, NA, 45L, NA),
z4 = c(25L, 30L, NA, 50L, NA),
z5 = c(NA, 35L, NA, 55L, NA),
z6 = c(NA, 35L, NA, NA, NA),
mt1_oranges_vol = c(100L, 200L, 300L, 400L, 500L),
mt2_oranges_vol = c(110L, 210L, 310L, 410L, 510L),
mt3_oranges_vol = c(120L, 220L, NA, 420L, 520L),
mt4_oranges_vol = c(130L, 230L, NA, 430L, NA),
mt5_oranges_vol = c(NA, 240L, NA, NA, NA),
mt6_oranges_vol = c(NA, NA, NA, NA, NA),
mt1_pears_vol = c(101L, 201L, 301L, 401L, 501L),
mt2_pears_vol = c(111L, 211L, 311L, 411L, 511L),
mt3_pears_vol = c(121L, 221L, NA, 421L, 521L),
mt4_pears_vol = c(131L, 231L, NA, 431L, NA),
mt5_pears_vol = c(NA, 241L, NA, NA, NA),
mt6_pears_vol = c(NA, NA, NA, NA, NA),
mt1_apples_vol = c(102L, 202L, 302L, 402L, 502L),
mt2_apples_vol = c(112L, 212L, 312L, 412L, 512L),
mt3_apples_vol = c(122L, 222L, NA, 422L, 522L),
mt4_apples_vol = c(132L, 232L, NA, 432L, NA),
mt5_apples_vol = c(NA, 242L, NA, NA, NA),
mt6_apples_vol = c(NA, NA, NA, NA, NA)),
class = "data.frame",
row.names = c(NA, -5L)
)
I need to create total columns, i.e. the sum mt1_apples_vol + mt1_pears_vol + mt1_oranges_vol, then mt2_apples_vol + mt2_pears_vol + mt2_oranges_vol, and so on.
At the moment I calculate them like this:
library(dplyr)

dataset <- dataset %>%
  mutate(
    mt1_total_vol = rowSums(select(., starts_with("mt1_")), na.rm = FALSE),
    mt2_total_vol = rowSums(select(., starts_with("mt2_")), na.rm = FALSE),
    mt3_total_vol = rowSums(select(., starts_with("mt3_")), na.rm = FALSE)
  )
However, more measurements may be added in the future, so I would like the code to iterate over mt_range:
mt_range <- 1:6
I cannot work out how to write code that creates the new columns and selects all the relevant variables based on mt_range.
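You will find this much easier if you put your data in long format. We can use the names_pattern argument of tidyr::pivot_longer() to split the number and the word that follow "mt" in names such as "mt1_apples_vol" into two columns, "fruit_num" and "fruit".
Then it is just a case of grouping by fruit number and calculating the sum.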
library(dplyr) # for starts_with(), summarise() and its .by argument (dplyr >= 1.1.0)

dataset |>
  tidyr::pivot_longer(
    cols = starts_with("mt"),
    names_pattern = "^mt(\\d)_(\\w+)_vol$",
    names_to = c("fruit_num", "fruit")
  ) |>
  summarise(
    total_vol = sum(value, na.rm = FALSE),
    .by = c(Participant.Id, fruit_num)
  )

# # A tibble: 30 × 3
#    Participant.Id fruit_num total_vol
#             <int> <chr>         <int>
#  1              1 1               303
#  2              1 2               333
#  3              1 3               363
#  4              1 4               393
#  5              1 5                NA
#  6              1 6                NA
#  7              2 1               603
#  8              2 2               633
#  9              2 3               663
# 10              2 4               693
# # ℹ 20 more rows
If you want it in wide form, you can pipe the above into tidyr::pivot_wider().
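Since the question mentions that more measurements may be added later, it may help to see what the names_pattern regular expression captures. The sketch below uses only base R and a hypothetical column name, mt10_oranges_vol, which is not in the original data. It suggests that `(\\d)` matches a single digit only, so `(\\d+)` would be needed once there are ten or more measurements.

```r
# Column names to test; "mt10_oranges_vol" is a made-up future column.
nms <- c("mt1_apples_vol", "mt6_pears_vol", "mt10_oranges_vol")

pattern_one  <- "^mt(\\d)_(\\w+)_vol$"   # pattern used in the answer
pattern_many <- "^mt(\\d+)_(\\w+)_vol$"  # variant allowing several digits

matches_one  <- grepl(pattern_one, nms)   # TRUE TRUE FALSE
matches_many <- grepl(pattern_many, nms)  # TRUE TRUE TRUE

# Extract the two capture groups (measurement number and fruit name).
parts <- do.call(rbind, regmatches(nms, regexec(pattern_many, nms)))
fruit_num <- parts[, 2]  # "1" "6" "10"
fruit     <- parts[, 3]  # "apples" "pears" "oranges"
```

With only mt1 to mt6 in the data, the original one-digit pattern is sufficient.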
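The data in the exact original format you specified (which I think is unlikely to be the best format) can be obtained by pivoting back to wide format and using dplyr::inner_join():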
library(dplyr) # for inner_join(), summarise() and starts_with()

dataset |>
  inner_join(
    dataset |>
      tidyr::pivot_longer(
        cols = starts_with("mt"),
        names_pattern = "^mt(\\d)_(\\w+)_vol$",
        names_to = c("fruit_num", "fruit")
      ) |>
      summarise(
        total_vol = sum(value, na.rm = FALSE),
        .by = c(Participant.Id, fruit_num)
      ) |>
      # Build names such as mt1_total_vol from the fruit_num values
      tidyr::pivot_wider(
        names_from = fruit_num,
        values_from = total_vol,
        names_glue = "mt{fruit_num}_total_vol"
      ),
    by = "Participant.Id"
  )
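For comparison, here is a minimal sketch that stays in wide format and iterates over mt_range directly, which is closer to what the question asks for. It is not part of the answer above: add_totals is a hypothetical helper, and the toy data frame is a two-row stand-in that copies the mt1 and mt5 values of the first two participants. It assumes every fruit column is named mt<number>_<fruit>_vol.

```r
# Toy stand-in for the question's data (first two participants, mt1 and mt5 only).
toy <- data.frame(
  Participant.Id  = 1:2,
  mt1_oranges_vol = c(100L, 200L),
  mt1_pears_vol   = c(101L, 201L),
  mt1_apples_vol  = c(102L, 202L),
  mt5_oranges_vol = c(NA, 240L),
  mt5_pears_vol   = c(NA, 241L),
  mt5_apples_vol  = c(NA, 242L)
)

# Hypothetical helper: add one mt<i>_total_vol column per value in mt_range.
add_totals <- function(df, mt_range) {
  for (i in mt_range) {
    total_name <- paste0("mt", i, "_total_vol")
    # All mt<i>_..._vol columns, excluding a total column from an earlier run.
    cols <- grep(paste0("^mt", i, "_\\w+_vol$"), names(df), value = TRUE)
    cols <- setdiff(cols, total_name)
    # na.rm = FALSE keeps the total NA when any fruit is missing.
    df[[total_name]] <- rowSums(df[cols], na.rm = FALSE)
  }
  df
}

result <- add_totals(toy, mt_range = c(1, 5))
result$mt1_total_vol  # 303 603
result$mt5_total_vol  # NA 723
```

The loop is easy to read, but the long-format approach scales better once you also need per-fruit summaries.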