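I have a dataset with n variables (3 in the example, but the number varies), and I want to find each variable's relative frequency, that is, its share of the row total across all of them. The variables are always named the same way: a prefix followed by a sequential number.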
library(dplyr)

have <-
  data.frame(
    x1 = sample(1:10, 20, replace = TRUE),
    x2 = sample(1:10, 20, replace = TRUE),
    x3 = sample(1:10, 20, replace = TRUE)
  )

want <-
  have |>
  mutate(
    x1_prop = x1 / (x1 + x2 + x3),
    x2_prop = x2 / (x1 + x2 + x3),
    x3_prop = x3 / (x1 + x2 + x3))
I think a dplyr solution could use mutate(across(...)), but I can't figure out the syntax...
want <-
  have |>
  mutate(across(everything()), . / rowSums(.)) # does not work
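A few things:
- The function passed to across needs to be a lambda written with ~ (or you can use \(x) or function).
- Use pick(everything()) inside the across call to indicate that you want the sum over all columns. Inside across, . or .x refers only to the current column, so on its own it cannot give you the row total.

have |>
  mutate(across(everything(), ~ .x / rowSums(pick(everything())),
                .names = "{col}_prop"))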
#    x1 x2 x3   x1_prop   x2_prop    x3_prop
#  1  9  5  2 0.5625000 0.3125000 0.12500000
#  2  6  4  6 0.3750000 0.2500000 0.37500000
#  3  9  6  1 0.5625000 0.3750000 0.06250000
#  4  8  7  8 0.3478261 0.3043478 0.34782609
#  5  1  8  7 0.0625000 0.5000000 0.43750000
#  6  9  3  9 0.4285714 0.1428571 0.42857143
#  7  7  4  8 0.3684211 0.2105263 0.42105263
#  8  3  5  5 0.2307692 0.3846154 0.38461538
#  9 10  8 10 0.3571429 0.2857143 0.35714286
# 10  5  2  6 0.3846154 0.1538462 0.46153846
# 11  3 10  1 0.2142857 0.7142857 0.07142857
# 12 10  8  2 0.5000000 0.4000000 0.10000000
# 13  3  7  6 0.1875000 0.4375000 0.37500000
# 14  4  4 10 0.2222222 0.2222222 0.55555556
# 15  2  7  4 0.1538462 0.5384615 0.30769231
# 16  2  3  2 0.2857143 0.4285714 0.28571429
# 17  3  9  1 0.2307692 0.6923077 0.07692308
# 18  8  6  2 0.5000000 0.3750000 0.12500000
# 19  2  2  8 0.1666667 0.1666667 0.66666667
# 20  7  4  3 0.5000000 0.2857143 0.21428571
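Since the question says the columns always share a prefix followed by a number, the same idea can be restricted to just those columns, which may matter if the data frame also holds other variables. The following is a minimal sketch under that assumption, not a definitive implementation: the id column, the set.seed call, and the regular expression "^x[0-9]+$" are illustrative additions that do not appear in the original post, and pick() assumes dplyr 1.1.0 or later.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only so this sketch is reproducible

have <- data.frame(
  id = 1:5,  # hypothetical extra column that must stay out of the row total
  x1 = sample(1:10, 5, replace = TRUE),
  x2 = sample(1:10, 5, replace = TRUE),
  x3 = sample(1:10, 5, replace = TRUE)
)

want <- have |>
  mutate(across(
    matches("^x[0-9]+$"),                        # only the prefixed columns
    ~ .x / rowSums(pick(matches("^x[0-9]+$"))),  # divide by the total of those columns
    .names = "{.col}_prop"
  ))

# Sanity check: the proportions in each row should sum to 1
# (up to floating-point error).
rowSums(select(want, ends_with("_prop")))
```

Using the same selection in both across() and pick() keeps the numerator and the denominator in step as the number of prefixed columns changes.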