cre*_*tor 8 r tidyverse tidyselect
I would like a general formula for arranging data frames that have a varying number of columns.
For example, in this case the data frame contains "categ_1, categ_2, points_1, points_2":

library(tidyverse)
set.seed(1)
nrows <- 20
df <- tibble(
  other_text = sample(letters, nrows, replace = TRUE),
  categ_1 = sample(c("A", "B"), nrows, replace = TRUE),
  categ_2 = sample(c("A", "B"), nrows, replace = TRUE),
  points_1 = sample(20:25, nrows, replace = TRUE),
  points_2 = sample(20:25, nrows, replace = TRUE),
) %>%
  rowwise() %>%
  mutate(total = sum(c_across(starts_with("points_")))) %>%
  ungroup()

And the formula for arranging it:

df %>%
  arrange(
    desc(total),
    categ_1, categ_2,
    desc(points_1), desc(points_2)
  )

But df can have more columns: "categ_1, categ_2, categ_3, points_1, points_2, points_3".
In that case, the formula should be:

df %>%
  mutate(
    categ_3 = sample(c("A", "B"), nrows, replace = TRUE),
    points_3 = sample(20:25, nrows, replace = TRUE),
  ) %>%
  rowwise() %>%
  mutate(total = sum(c_across(starts_with("points_")))) %>%
  ungroup() %>%
  arrange(
    desc(total),
    categ_1, categ_2, categ_3,
    desc(points_1), desc(points_2), desc(points_3)
  )

I tried to write a general formula (using across):

library(daff)

daff::diff_data(
  df %>%
    arrange(
      desc(total),
      categ_1, categ_2,
      desc(points_1), desc(points_2)
    ),
  df %>%
    arrange(
      desc(total),
      across(starts_with("categ_")),
      across(starts_with("points_"), desc)
    )
)
#> Daff Comparison: ‘df %>% arrange(desc(total), categ_1, categ_2, desc(points_1), ’ ‘ desc(points_2))’ vs. ‘df %>% arrange(desc(total), across(starts_with("categ_")), across(starts_with("points_"), ’ ‘ desc))’
#>         A:A        B:B     ... E:E      F:F
#> @@      other_text categ_1 ... points_2 total
#> ...     ...        ...     ... ...      ...
#> 10:9    z          A       ... 23       45
#> 9:10  : v          A       ... 22       45
#> 11:11   s          B       ... 23       45
#> ...     ...        ...     ... ...      ...

This looks like a bug in arrange: arrange only seems to take into account the arguments up to the first across.
I also tried writing the conditions inside a case_when, but could not find the right syntax:

# not working
df %>%
  arrange(
    across(everything(), ~ case_when(
      . == "total" ~ .,
      str_detect(., "categ_") ~ .,
      str_detect(., "points_") ~ desc(.),
      TRUE ~ 1
    ))
  )
#> Error in `arrange()`:
#> ! Problem with the implicit `transmute()` step.

What is a general way to write that formula inside arrange?
(Other alternatives are welcome, but I prefer a tidyverse solution.)

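You can try wrapping everything in a data frame (here a tibble()) inside arrange(). It looks like arrange() does some code manipulation to treat top-level desc() calls in a special way, which is at odds with across(). Using the data frame unpacking feature gets around that.
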
library(tidyverse)
set.seed(3)
nrows <- 20
df <- tibble(
  other_text = sample(letters, nrows, replace = TRUE),
  categ_1 = sample(c("A", "B"), nrows, replace = TRUE),
  categ_2 = sample(c("A", "B"), nrows, replace = TRUE),
  points_1 = sample(20:25, nrows, replace = TRUE),
  points_2 = sample(20:25, nrows, replace = TRUE),
) %>%
  rowwise() %>%
  mutate(total = sum(c_across(starts_with("points_")))) %>%
  ungroup()

identical(
  df %>%
    arrange(
      desc(total),
      categ_1, categ_2,
      desc(points_1), desc(points_2)
    ),
  df %>%
    arrange(
      tibble(
        desc(total),
        across(starts_with("categ_")),
        across(starts_with("points_"), desc)
      )
    )
)
#> [1] TRUE
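
Since the question asks for a formula that keeps working when more categ_* and points_* columns are added, the same tibble() wrapper can be put into a small function. This is only a sketch: the function name arrange_generic and the data frame df3 are made up for illustration, and it assumes the data has a total column plus columns prefixed with categ_ and points_, as in the question. The last line compares it against the hand-written three-column arrange().

```r
library(dplyr)

# Hypothetical helper (not part of dplyr or of the original answer):
# sort by total (descending), then every categ_* column (ascending),
# then every points_* column (descending), however many there are.
arrange_generic <- function(data) {
  data %>%
    arrange(
      tibble(
        desc(total),
        across(starts_with("categ_")),
        across(starts_with("points_"), desc)
      )
    )
}

# Example data with three categ_* and three points_* columns.
set.seed(1)
nrows <- 20
df3 <- tibble(
  other_text = sample(letters, nrows, replace = TRUE),
  categ_1 = sample(c("A", "B"), nrows, replace = TRUE),
  categ_2 = sample(c("A", "B"), nrows, replace = TRUE),
  categ_3 = sample(c("A", "B"), nrows, replace = TRUE),
  points_1 = sample(20:25, nrows, replace = TRUE),
  points_2 = sample(20:25, nrows, replace = TRUE),
  points_3 = sample(20:25, nrows, replace = TRUE)
) %>%
  rowwise() %>%
  mutate(total = sum(c_across(starts_with("points_")))) %>%
  ungroup()

# Hand-written ordering for the three-column case, as in the question.
explicit <- df3 %>%
  arrange(
    desc(total),
    categ_1, categ_2, categ_3,
    desc(points_1), desc(points_2), desc(points_3)
  )

generic <- arrange_generic(df3)

# Compare the two orderings.
identical(explicit, generic)
```

The selection helpers pick columns in the order they appear in the data, so the sort priority within each group follows the column order of the data frame.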
Install the development version of dplyr:

# remotes::install_github("tidyverse/dplyr")
library(tidyverse)
set.seed(144)
nrows <- 20
df <- tibble(
  other_text = sample(letters, nrows, replace = FALSE),
  categ_1 = sample(c("A", "B"), nrows, replace = TRUE),
  categ_2 = sample(c("A", "B"), nrows, replace = TRUE),
  points_1 = sample(1:25, nrows, replace = FALSE),
  points_2 = sample(100:125, nrows, replace = FALSE),
) %>%
  rowwise() %>%
  mutate(total = sum(c_across(starts_with("points_")))) %>%
  ungroup()

out1 <- df %>%
  arrange(
    desc(total),
    categ_1, categ_2,
    desc(points_1), desc(points_2)
  )

out2 <- df %>%
  arrange(
    desc(total),
    across(starts_with("categ_")),
    across(starts_with("points_"), desc)
  )

daff::diff_data(out1, out2)
#> Daff Comparison: 'out1' vs. 'out2'
#> other_text categ_1 ...