Combining tidyr::spread and dplyr::summarise

CPa*_*Pak 5 r dplyr tidyr

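I often want to perform a tidyr::spread and a dplyr::summarise in a "single step" to aggregate data by group. What I want is shown in expected. I can get expected by running the summarise and the spread separately and combining the results with a dplyr::full_join, but I'm looking for an alternative that avoids the full_join. A true single-step approach is not required.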

df <- data.frame(
        id = rep(letters[1], 2),
        val1 = c(10, 20),
        val2 = c(100, 200),
        key = c("A", "B"),
        value = c(1, 2))

library(tidyverse)
result1 <- df %>%
              group_by(id) %>%
              summarise(
                val1 = min(val1),
                val2 = max(val2)
              )
# A tibble: 1 x 3
  # id      val1  val2
  # <fctr> <dbl> <dbl>
# 1 a       10.0   200

result2 <- df %>%
              select(id, key, value) %>%
              group_by(id) %>%
              spread(key, value)
# A tibble: 1 x 3
# Groups: id [1]
  # id         A     B
# * <fctr> <dbl> <dbl>
# 1 a       1.00  2.00

expected <- full_join(result1, result2, by="id")
# A tibble: 1 x 5
  # id      val1  val2     A     B
  # <fctr> <dbl> <dbl> <dbl> <dbl>
# 1 a       10.0   200  1.00  2.00

Cal*_*You 5

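I suspect your real data has more edge cases that would need some modification, but why not simply spread first and then summarise? You can specify a separate summary function for each variable, so for A and B, where you don't actually need to compute anything (I assume), you can just drop all the NAs: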

df %>%
  spread("key", "value") %>%
  group_by(id) %>%
  summarise(
    val1 = min(val1),
    val2 = max(val2),
    A = mean(A, na.rm = TRUE),
    B = mean(B, na.rm = TRUE)
    )
# A tibble: 1 x 5
#   id     val1  val2     A     B
#   <fct> <dbl> <dbl> <dbl> <dbl>
# 1 a      10.0   200  1.00  2.00
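For readers on newer package versions, the same spread-then-summarise idea can be sketched with pivot_wider(), which superseded spread() in tidyr 1.0.0, and with across() from dplyr 1.0.0 so that the key columns need not be listed by hand. This is only a sketch under assumptions: the helper name key_cols is made up for illustration, and using mean(..., na.rm = TRUE) to collapse each key column assumes there is at most one non-NA value per id and key, as in the example data.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question
df <- data.frame(
  id = rep("a", 2),
  val1 = c(10, 20),
  val2 = c(100, 200),
  key = c("A", "B"),
  value = c(1, 2)
)

# Hypothetical helper: names of the columns that the reshape will create
key_cols <- as.character(unique(df$key))

result <- df %>%
  # Widen first; each original row keeps its own val1/val2,
  # so every row has a value in only one key column and NA in the others
  pivot_wider(names_from = key, values_from = value) %>%
  group_by(id) %>%
  summarise(
    val1 = min(val1),
    val2 = max(val2),
    # Collapse the NA-padded key columns down to one value per group
    across(all_of(key_cols), ~ mean(.x, na.rm = TRUE))
  )

print(result)
```

If a group has no value at all for some key, mean(..., na.rm = TRUE) returns NaN rather than NA, so a different collapsing function may be a better fit for real data.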