dplyr - multiple summary functions

Dee*_*ena 4 r dplyr

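I am trying to compute several summary statistics for a data frame.

I tried dplyr's summarise_each. However, the result comes back as a single flat row, with the function names appended as suffixes.

Is there a direct way, using dplyr or base R, to get the result as a data frame whose columns are the columns of the original data frame and whose rows are the summary functions?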

library(dplyr)

df = data.frame(A = sample(1:100, 20), 
                B = sample(110:200, 20), 
                C = sample(c(0,1), 20, replace = T))

df %>% summarise_each(funs(min, max)) 
# A_min B_min C_min A_max B_max C_max
# 1    13   117     0    98   188     1

# Desired format
summary(df)
# A               B               C       
# Min.   :13.00   Min.   :117.0   Min.   :0.00  
# 1st Qu.:34.75   1st Qu.:134.2   1st Qu.:0.00  
# Median :45.00   Median :148.0   Median :1.00  
# Mean   :52.35   Mean   :149.9   Mean   :0.65  
# 3rd Qu.:62.50   3rd Qu.:168.8   3rd Qu.:1.00  
# Max.   :98.00   Max.   :188.0   Max.   :1.00  

Axe*_*man 9

How about:

library(tidyr)
gather(df) %>% group_by(key) %>% summarise_all(funs(min, max))
# A tibble: 3 × 3
    key   min   max
  <chr> <dbl> <dbl>
1     A     2    92
2     B   111   194
3     C     0     1
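For readers on a recent dplyr/tidyr, the same idea can be sketched without gather() and funs(). This is only a sketch, assuming dplyr >= 1.0 and tidyr >= 1.0; the seed and the name long_summary are my own additions and not part of the original answer.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: a fixed seed, only so the example is reproducible
df <- data.frame(A = sample(1:100, 20),
                 B = sample(110:200, 20),
                 C = sample(c(0, 1), 20, replace = TRUE))

# Stack all columns into key/value pairs, then summarise each key.
long_summary <- df %>%
  pivot_longer(everything(), names_to = "key", values_to = "value") %>%
  group_by(key) %>%
  summarise(min = min(value), max = max(value))

print(long_summary)
```

As in the answer above, this gives one row per original column and one column per summary function, i.e. the transpose of the layout asked for in the question.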


Jaa*_*aap 8

Why not simply use sapply with summary?

sapply(df, summary)

which gives:

            A     B    C
Min.     1.00 112.0 0.00
1st Qu. 23.75 134.5 0.00
Median  57.00 148.5 1.00
Mean    50.15 149.9 0.55
3rd Qu. 77.50 167.2 1.00
Max.    94.00 191.0 1.00
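The same sapply pattern should also work with an arbitrary set of functions rather than the fixed output of summary. A minimal base R sketch, where the choice of min, max and sd, the seed, and the name stats are illustrative assumptions:

```r
set.seed(42)  # assumption: a fixed seed, only so the example is reproducible
df <- data.frame(A = sample(1:100, 20),
                 B = sample(110:200, 20),
                 C = sample(c(0, 1), 20, replace = TRUE))

# Each column is reduced to a named vector; sapply binds those vectors
# into a matrix with one row per function and one column per variable.
stats <- sapply(df, function(x) c(min = min(x), max = max(x), sd = sd(x)))

print(stats)
```

The names given inside c() become the row names, so adding another statistic is just a matter of adding another named element.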

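To get a data frame, just wrap the sapply call in data.frame(): data.frame(sapply(df, summary)). If you want to keep the summary statistics in a column, you can either extract them from the row names (df$rn <- rownames(df), where df is the resulting data frame) or use the keep.rownames parameter of data.table: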

library(data.table)
dt <- data.table(sapply(df, summary), keep.rownames = TRUE)

This gives:

> dt
        rn     A     B   C
1:    Min. 11.00 113.0 0.0
2: 1st Qu. 21.50 126.8 0.0
3:  Median 55.00 138.0 0.5
4:    Mean 53.65 145.2 0.5
5: 3rd Qu. 83.25 160.5 1.0
6:    Max. 98.00 193.0 1.0
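If data.table is not available, the row-names route mentioned above can be sketched in base R alone. The names m and out and the seed are assumptions for illustration:

```r
set.seed(42)  # assumption: a fixed seed, only so the example is reproducible
df <- data.frame(A = sample(1:100, 20),
                 B = sample(110:200, 20),
                 C = sample(c(0, 1), 20, replace = TRUE))

m <- sapply(df, summary)  # matrix: statistics in rows, variables in columns

# Move the row names into a regular column and reset the row names.
out <- data.frame(rn = rownames(m), m, row.names = NULL)

print(out)
```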


cde*_*erv 2

This isn't the only way, but you can use dplyr and tidyr to reshape the data.frame as you wish (and stringr, or something else, to trim the characters).

library(dplyr)

df = data.frame(A = sample(1:100, 20), 
                B = sample(110:200, 20), 
                C = sample(c(0,1), 20, replace = T))

as_data_frame(summary(df)) %>%
  # trim the whitespace around the column names stored in Var2
  mutate(Var2 = stringr::str_trim(Var2)) %>% 
  # Var1 is not needed
  select(-Var1) %>%
  # split each cell into the type of summary and its value
  tidyr::separate(n, c("Type", "value"), sep = ":") %>%
  # convert value to numeric
  mutate(value = as.numeric(value)) %>%
  # reshape to one column per variable
  tidyr::spread(Var2, value, drop = T)
#> # A tibble: 6 x 4
#>      Type     A     B     C
#> *   <chr> <dbl> <dbl> <dbl>
#> 1 1st Qu. 36.25 122.2  1.00
#> 2 3rd Qu. 77.25 164.5  1.00
#> 3 Max.    95.00 193.0  1.00
#> 4 Mean    57.30 144.6  0.85
#> 5 Median  63.00 143.5  1.00
#> 6 Min.     8.00 111.0  0.00
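A possible alternative that stays with numbers instead of parsing the printed summary strings is sketched below. It assumes dplyr >= 1.0 (for across) and tidyr >= 1.0 (for pivot_longer and pivot_wider). The "__" separator, the seed, and the names column, stat and wide are my own choices, not part of the answer above.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: a fixed seed, only so the example is reproducible
df <- data.frame(A = sample(1:100, 20),
                 B = sample(110:200, 20),
                 C = sample(c(0, 1), 20, replace = TRUE))

wide <- df %>%
  # one flat row with names such as A__min, A__max, B__min, ...
  summarise(across(everything(), list(min = min, max = max),
                   .names = "{.col}__{.fn}")) %>%
  # split those names back into the variable and the statistic
  pivot_longer(everything(), names_to = c("column", "stat"),
               names_sep = "__") %>%
  # one row per statistic, one column per variable
  pivot_wider(names_from = column, values_from = value)

print(wide)
```

The separator only needs to be a string that does not occur in the column names; with names that already contain "__", a different separator would be needed.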