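How can I use dplyr to create simple summary statistics for multiple variables? The summarise_each function seems to be the way to go, but when multiple functions are applied to multiple columns, the result is a wide, hard-to-read data frame.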
pal*_*czy 30
Use dplyr in combination with tidyr to reshape the final result.
library(dplyr)
library(tidyr)
# note: tbl_df(), summarise_each() and funs() are deprecated in current dplyr releases
df <- tbl_df(mtcars)
df.sum <- df %>%
  select(mpg, cyl, vs, am, gear, carb) %>% # select variables to summarise
  summarise_each(funs(min = min,
                      q25 = quantile(., 0.25),
                      median = median,
                      q75 = quantile(., 0.75),
                      max = max,
                      mean = mean,
                      sd = sd))
# the result is a wide data frame
> dim(df.sum)
[1] 1 42
# reshape it using tidyr functions
df.stats.tidy <- df.sum %>%
  gather(stat, val) %>%
  separate(stat, into = c("var", "stat"), sep = "_") %>%
  spread(stat, val) %>%
  select(var, min, q25, median, q75, max, mean, sd) # reorder columns
> print(df.stats.tidy)
   var  min    q25 median  q75  max     mean        sd
1   am  0.0  0.000    0.0  1.0  1.0  0.40625 0.4989909
2 carb  1.0  2.000    2.0  4.0  8.0  2.81250 1.6152000
3  cyl  4.0  4.000    6.0  8.0  8.0  6.18750 1.7859216
4 gear  3.0  3.000    4.0  4.0  5.0  3.68750 0.7378041
5  mpg 10.4 15.425   19.2 22.8 33.9 20.09062 6.0269481
6   vs  0.0  0.000    0.0  1.0  1.0  0.43750 0.5040161
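Since summarise_each() and funs() are deprecated in recent dplyr releases, the same table can probably be built with across() and pivot_longer(). The following is only a sketch, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0; the object names stats_wide and stats_tidy are made up for illustration and do not appear in the answer above.

```r
library(dplyr)   # assumes dplyr >= 1.0.0 for across()
library(tidyr)   # assumes tidyr >= 1.0.0 for pivot_longer()

# One row, one column per variable/statistic pair (named "<var>_<stat>")
stats_wide <- mtcars %>%
  select(mpg, cyl, vs, am, gear, carb) %>%
  summarise(across(
    everything(),
    list(
      min    = min,
      q25    = ~ quantile(.x, 0.25, names = FALSE),
      median = median,
      q75    = ~ quantile(.x, 0.75, names = FALSE),
      max    = max,
      mean   = mean,
      sd     = sd
    )
  ))

# Split each column name at "_": the first part becomes the `var` column,
# the second part (".value") becomes the name of a statistic column
stats_tidy <- stats_wide %>%
  pivot_longer(
    everything(),
    names_to = c("var", ".value"),
    names_sep = "_"
  )

print(stats_tidy)
```

Like the gather()/separate() version, this splits on "_", so it only works as written when the selected column names contain no underscores.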
Kon*_*rad 15
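A potentially simpler solution can be built with broom::tidy and purrr::map_df. broom::tidy summarises the key objects from statistical output into a tibble. purrr::map_df applies a function to each element, in this case a column, and returns a tibble.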
library(tidyverse)
mtcars %>%
  select(mpg, cyl, vs, am, gear, carb) %>%
  map_df(.f = ~ broom::tidy(summary(.x)), .id = "variable")
# A tibble: 6 x 7
# variable minimum q1 median mean q3 maximum
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 mpg 10.4 15.4 19.2 20.1 22.8 33.9
# 2 cyl 4 4 6 6.19 8 8
# 3 vs 0 0 0 0.438 1 1
# 4 am 0 0 0 0.406 1 1
# 5 gear 3 3 4 3.69 4 5
# 6 carb 1 2 2 2.81 4 8
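To see what each step contributes, it may help to tidy a single column first and then combine all columns. This is a sketch rather than part of the original answer: it assumes the installed broom version still ships a tidy() method for summary() output, and it uses purrr::list_rbind() (purrr >= 1.0.0) in place of map_df(); one_col and all_cols are illustrative names.

```r
library(dplyr)
library(purrr)   # list_rbind() assumes purrr >= 1.0.0
library(broom)   # assumes a tidy() method for summary() output is available

# A single column: summary() gives a named vector, tidy() turns it into a one-row tibble
one_col <- tidy(summary(mtcars$mpg))
print(one_col)

# All selected columns: map() returns a named list of one-row tibbles,
# and list_rbind() stacks them, storing the list names in `variable`
all_cols <- mtcars %>%
  select(mpg, cyl, vs, am, gear, carb) %>%
  map(~ tidy(summary(.x))) %>%
  list_rbind(names_to = "variable")
print(all_cols)
```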
Mag*_*nus 10
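I like paljenczy's idea of using only dplyr/tidyr and getting the table into a data.frame/tibble before formatting. But I ran into a robustness problem: because it relies on parsing variable names, it chokes on columns with underscores in their names. After trying to fix it within the dplyr framework, it still seemed bound to be somewhat fragile, since it depends on string parsing.
So in the end I decided to use psych::describe(), a function designed for exactly this kind of thing. It does not accept completely arbitrary functions, but it covers nearly everything anyone would want. The full example below reproduces the solution above (combining describe with some tidyverse steps to get the exact elements I was looking for):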
library(psych)
library(tidyverse)
# Create an extended version with a bunch of stats
d.summary.extended <- mtcars %>%
  select(mpg, cyl, vs, am, gear, carb) %>%
  psych::describe(quant=c(.25,.75)) %>%
  as_tibble(rownames="rowname") %>%
  print()
<OUTPUT>
# A tibble: 6 x 16
rowname vars n mean sd median trimmed mad min max range skew kurtosis se Q0.25 Q0.75
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 mpg 1 32 20.09062 6.0269481 19.2 19.6961538 5.41149 10.4 33.9 23.5 0.6106550 -0.372766 1.06542396 15.425 22.8
2 cyl 2 32 6.18750 1.7859216 6.0 6.2307692 2.96520 4.0 8.0 4.0 -0.1746119 -1.762120 0.31570933 4.000 8.0
3 vs 3 32 0.43750 0.5040161 0.0 0.4230769 0.00000 0.0 1.0 1.0 0.2402577 -2.001938 0.08909831 0.000 1.0
4 am 4 32 0.40625 0.4989909 0.0 0.3846154 0.00000 0.0 1.0 1.0 0.3640159 -1.924741 0.08820997 0.000 1.0
5 gear 5 32 3.68750 0.7378041 4.0 3.6153846 1.48260 3.0 5.0 2.0 0.5288545 -1.069751 0.13042656 3.000 4.0
6 carb 6 32 2.81250 1.6152000 2.0 2.6538462 1.48260 1.0 8.0 7.0 1.0508738 1.257043 0.28552971 2.000 4.0
</OUTPUT>
# Select stats for comparison with other solutions
d.summary <- d.summary.extended %>%
  select(var=rowname, min, q25=Q0.25, median, q75=Q0.75, max, mean, sd) %>%
  print()
<OUTPUT>
# A tibble: 6 x 8
var min q25 median q75 max mean sd
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 mpg 10.4 15.425 19.2 22.8 33.9 20.09062 6.0269481
2 cyl 4.0 4.000 6.0 8.0 8.0 6.18750 1.7859216
3 vs 0.0 0.000 0.0 1.0 1.0 0.43750 0.5040161
4 am 0.0 0.000 0.0 1.0 1.0 0.40625 0.4989909
5 gear 3.0 3.000 4.0 4.0 5.0 3.68750 0.7378041
6 carb 1.0 2.000 2.0 4.0 8.0 2.81250 1.6152000
</OUTPUT>
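For anyone who would rather stay within dplyr/tidyr, the underscore problem can be reduced (not eliminated) by splitting each name at its last underscore. The sketch below assumes dplyr >= 1.0.0 and tidyr >= 1.0.0; the renamed columns miles_per_gallon and n_cyl are invented purely to trigger the problem, and the approach still requires that the statistic names themselves contain no underscore.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: two mtcars columns renamed so their names contain underscores
df_us <- mtcars %>%
  select(miles_per_gallon = mpg, n_cyl = cyl)

stats_tidy <- df_us %>%
  summarise(across(everything(), list(min = min, mean = mean, max = max))) %>%
  pivot_longer(
    everything(),
    names_to = c("var", ".value"),
    # The first group is greedy, so the split happens at the LAST underscore:
    # "miles_per_gallon_min" -> var = "miles_per_gallon", statistic = "min"
    names_pattern = "^(.*)_([^_]+)$"
  )

print(stats_tidy)
```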
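If you want to create a summary table for publication (not for further computation), you might want to look at the excellent stargazer package.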
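library(stargazer)
df <- data.frame(mtcars)
cols <- c('mpg', 'cyl', 'vs', 'am', 'gear', 'carb')
# "median" is listed twice, matching the output below; use "mean" for one of them to report the mean
stargazer(
  df[, cols], type = "text",
  summary.stat = c("min", "p25", "median", "p75", "max", "median", "sd")
)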
================================================================
Statistic Min Pctl(25) Median Pctl(75) Max Median St. Dev.
----------------------------------------------------------------
mpg 10.400 15.430 19.200 22.800 33.900 19.200 6.027
cyl 4 4 6 8 8 6 1.786
vs 0 0 0 1 1 0 0.504
am 0 0 0 1 1 0 0.499
gear 3 3 4 4 5 4 0.738
carb 1 2 2 4 8 2 1.615
----------------------------------------------------------------
You can also change the type to "latex" or "html" and save the table to a file by giving the file name in the "out" argument.
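As a rough illustration of the "out" argument, the call below writes an HTML version of the table to a file. The file name summary_stats.html and the use of tempdir() are assumptions made for this sketch, and "mean" is used here in place of the second "median".

```r
library(stargazer)

cols <- c("mpg", "cyl", "vs", "am", "gear", "carb")

# Hypothetical output location; replace with a path of your choice
out_file <- file.path(tempdir(), "summary_stats.html")

# Prints the HTML to the console and also writes it to out_file
stargazer(
  mtcars[, cols],
  type = "html",
  summary.stat = c("min", "p25", "median", "p75", "max", "mean", "sd"),
  out = out_file
)
```

Switching type to "latex" and giving the file a .tex extension should work the same way.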