Complex formula inside arrange

cre*_*tor 8 r tidyverse tidyselect

I would like a general formula to arrange data frames with a varying number of columns.


For example, in this case the data frame contains "categ_1, categ_2, points_1, points_2":

library(tidyverse)
set.seed(1)
nrows <- 20
df <- tibble(
  other_text = sample(letters,
                      nrows, replace = TRUE),
  categ_1 = sample(c("A", "B"), nrows, replace = TRUE),
  categ_2 = sample(c("A", "B"), nrows, replace = TRUE),
  points_1 = sample(20:25, nrows, replace = TRUE),
  points_2 = sample(20:25, nrows, replace = TRUE),
) %>%
  rowwise() %>%
  mutate(total = sum(c_across(starts_with("points_")))) %>%
  ungroup()

And the formula to arrange it:

df %>%
  arrange(
    desc(total),
    categ_1, categ_2,
    desc(points_1), desc(points_2)
  )

df can have more columns: "categ_1, categ_2, categ_3, points_1, points_2, points_3".
In that case, the formula should be:

df %>%
  mutate(
    categ_3 = sample(c("A", "B"), nrows, replace = TRUE),
    points_3 = sample(20:25, nrows, replace = TRUE),
  ) %>%
  rowwise() %>%
  mutate(total = sum(c_across(starts_with("points_")))) %>%
  ungroup() %>%
  arrange(
    desc(total),
    categ_1, categ_2, categ_3,
    desc(points_1), desc(points_2), desc(points_3)
  )

I tried to write a general formula (using across):

library(daff)

daff::diff_data(
  df %>%
    arrange(
      desc(total),
      categ_1, categ_2,
      desc(points_1), desc(points_2)
    ),
  df %>%
    arrange(
      desc(total),
      across(starts_with("categ_")),
      across(starts_with("points_"), desc)
    )
)
#> Daff Comparison: ‘df %>% arrange(desc(total), categ_1, categ_2, desc(points_1), ’ ‘    desc(points_2))’ vs. ‘df %>% arrange(desc(total), across(starts_with("categ_")), across(starts_with("points_"), ’ ‘    desc))’
#>           A:A        B:B     ... E:E      F:F
#>       @@  other_text categ_1 ... points_2 total
#>       ... ...        ...     ... ...      ...
#> 10:9      z          A       ... 23       45
#> 9:10  :   v          A       ... 22       45
#> 11:11     s          B       ... 23       45
#>       ... ...        ...     ... ...      ...

This looks like a bug in arrange: arrange seems to consider only the arguments up to the first across.


I also tried writing the conditions inside a case_when, but could not find the right syntax:

# not working
df %>%
  arrange(
    across(everything(), ~ case_when(
      . == "total" ~ .,
      str_detect(., "categ_") ~ .,
      str_detect(., "points_") ~ desc(.),
      TRUE ~ 1
    )
    )
  )
#> Error in `arrange()`:
#> ! Problem with the implicit `transmute()` step.

What is a general way to write this formula inside arrange?
(Other alternatives are welcome, but I would prefer a tidyverse solution.)


Mik*_*ila 5

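You could try wrapping everything inside arrange() in a data frame. It looks like arrange() does some code manipulation to handle top-level desc() calls in a special way, and this conflicts with across(). Using the data frame unpacking feature gets around that.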

library(tidyverse)

set.seed(3)
nrows <- 20

df <- tibble(
  other_text = sample(letters, nrows, replace = TRUE),
  categ_1 = sample(c("A", "B"), nrows, replace = TRUE),
  categ_2 = sample(c("A", "B"), nrows, replace = TRUE),
  points_1 = sample(20:25, nrows, replace = TRUE),
  points_2 = sample(20:25, nrows, replace = TRUE),
) %>%
  rowwise() %>%
  mutate(total = sum(c_across(starts_with("points_")))) %>%
  ungroup()

identical(
  df %>%
    arrange(
      desc(total),
      categ_1, categ_2,
      desc(points_1), desc(points_2)
    ),
  df %>%
    arrange(
      tibble(
        desc(total),
        across(starts_with("categ_")),
        across(starts_with("points_"), desc)
      )
    )
)
#> [1] TRUE
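
To reuse this wrapper on data frames with more categ_* and points_* columns, it can be put in a small function. The sketch below is only an illustration: the helper name arrange_generic and the three-column test data df3 are assumptions, not part of the answer, and it relies on the same tibble() wrapping shown above. It compares the result with the hand-written arrange() call for the three-column case from the question.

```r
library(dplyr)
library(tibble)

set.seed(3)
nrows <- 20

# Illustrative data with three categ_* and three points_* columns.
df3 <- tibble(
  other_text = sample(letters, nrows, replace = TRUE),
  categ_1 = sample(c("A", "B"), nrows, replace = TRUE),
  categ_2 = sample(c("A", "B"), nrows, replace = TRUE),
  categ_3 = sample(c("A", "B"), nrows, replace = TRUE),
  points_1 = sample(20:25, nrows, replace = TRUE),
  points_2 = sample(20:25, nrows, replace = TRUE),
  points_3 = sample(20:25, nrows, replace = TRUE)
) %>%
  rowwise() %>%
  mutate(total = sum(c_across(starts_with("points_")))) %>%
  ungroup()

# Hypothetical helper: sort by total (descending), then by every categ_*
# column (ascending), then by every points_* column (descending).
# Wrapping the sort keys in tibble() follows the approach in this answer.
arrange_generic <- function(data) {
  data %>%
    arrange(
      tibble(
        desc(total),
        across(starts_with("categ_")),
        across(starts_with("points_"), desc)
      )
    )
}

# Hand-written version for the three-column case, used as the reference.
explicit <- df3 %>%
  arrange(
    desc(total),
    categ_1, categ_2, categ_3,
    desc(points_1), desc(points_2), desc(points_3)
  )

generic <- arrange_generic(df3)

# Check that both orderings agree.
identical(explicit, generic)
```

The helper never names individual columns, so the same call should work unchanged when a categ_4 or points_4 column is added.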


And*_*ter 2

Up-to-date, super simple fix

Install the development version of dplyr:

# remotes::install_github("tidyverse/dplyr")
library(tidyverse)

set.seed(144)
nrows <- 20
df <- tibble(
  other_text = sample(letters,
                      nrows, replace = FALSE),
  categ_1 = sample(c("A", "B"), nrows, replace = TRUE),
  categ_2 = sample(c("A", "B"), nrows, replace = TRUE),
  points_1 = sample(1:25, nrows, replace = FALSE),
  points_2 = sample(100:125, nrows, replace = FALSE),
) %>%
  rowwise() %>%
  mutate(total = sum(c_across(starts_with("points_")))) %>%
  ungroup()

out1 <- df %>%
  arrange(
    desc(total),
    categ_1, categ_2,
    desc(points_1), desc(points_2)
  )

out2 <- df %>%
  arrange(
    desc(total),
    across(starts_with("categ_")),
    across(starts_with("points_"), desc)
  )

daff::diff_data(out1, out2)
#> Daff Comparison: 'out1' vs. 'out2' 
#>      other_text categ_1 ...
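
If installing the development version is not an option, one alternative sketch is to build the sort expressions from the column names and splice them into arrange() with rlang's !!! operator. After splicing, arrange() receives the same top-level desc() calls as a hand-written call, so it should not depend on how across() is handled. The variable names (categ_exprs, points_exprs, out_spliced, out_manual) are hypothetical, and this approach is a substitute for the fix in the development version, not a description of it.

```r
library(dplyr)
library(tibble)
library(rlang)

set.seed(144)
nrows <- 20
df <- tibble(
  other_text = sample(letters, nrows, replace = FALSE),
  categ_1 = sample(c("A", "B"), nrows, replace = TRUE),
  categ_2 = sample(c("A", "B"), nrows, replace = TRUE),
  points_1 = sample(1:25, nrows, replace = FALSE),
  points_2 = sample(100:125, nrows, replace = FALSE)
) %>%
  rowwise() %>%
  mutate(total = sum(c_across(starts_with("points_")))) %>%
  ungroup()

# Discover the column names from the data, so any number of columns works.
categ_names  <- grep("^categ_", names(df), value = TRUE)
points_names <- grep("^points_", names(df), value = TRUE)

# Build the expressions one would otherwise type by hand:
# categ_1, categ_2, ... and desc(points_1), desc(points_2), ...
categ_exprs  <- syms(categ_names)
points_exprs <- lapply(syms(points_names), function(col) expr(desc(!!col)))

# Splice the expression lists into arrange() as separate arguments.
out_spliced <- df %>%
  arrange(desc(total), !!!categ_exprs, !!!points_exprs)

# Hand-written reference for the two-column case.
out_manual <- df %>%
  arrange(
    desc(total),
    categ_1, categ_2,
    desc(points_1), desc(points_2)
  )

identical(out_manual, out_spliced)
```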