Illustrate the standard deviation in a histogram

1 r histogram ggplot2

Consider the following simple example:

# E. Musk in Grunheide 
set.seed(22032022) 

# generate random numbers 
randomNumbers <- rnorm(n = 1000, mean = 10, sd = 10)

# empirical sd 
sd(randomNumbers)
#> [1] 10.34369

# histogram 
hist(randomNumbers, probability = TRUE, main = "", breaks = 50)

# just for illustration purposes
###
# empirical density 
lines(density(randomNumbers), col = 'black', lwd = 2)
# theoretical density
curve(dnorm(x, mean = 10, sd = 10), col = "blue", lwd = 2, add = TRUE)
###

Created on 2022-03-22 by the reprex package (v2.0.1)

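Question: Is there a good way to illustrate the empirical standard deviation (sd) in the histogram using color? For example, by drawing the inner bars in a different color, or by marking the sd range, i.e. [mean +/- sd], as an interval on the x-axis?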

Note that if ggplot2 offers a simple solution, a suggestion along those lines would also be much appreciated.

All*_*ron 5

This is similar to Benson's ggplot answer, except that we precompute the histogram and use geom_col, so that we don't get any unwanted stacking at the sd boundaries:

# E. Musk in Grunheide 
set.seed(22032022) 

# generate random numbers 
randomNumbers <- rnorm(n=1000, mean=10, sd=10)

h <- hist(randomNumbers, breaks = 50, plot = FALSE)

lower <- mean(randomNumbers) - sd(randomNumbers)
upper <- mean(randomNumbers) + sd(randomNumbers)

df <- data.frame(x = h$mids, y = h$density, 
                 fill = h$mids > lower & h$mids < upper)

library(ggplot2)

ggplot(df) +
  geom_col(aes(x, y, fill = fill), width = diff(h$breaks)[1], color = 'black') +
  geom_density(data = data.frame(x = randomNumbers), 
               aes(x = x, color = 'Actual density'),
               key_glyph = 'path') +
  geom_function(fun = function(x) {
    dnorm(x, mean = mean(randomNumbers), sd = sd(randomNumbers)) },
    aes(color = 'theoretical density')) +
  scale_fill_manual(values = c(`TRUE` = '#FF374A', 'FALSE' = 'gray'), 
                    name = 'within 1 SD') +
  scale_color_manual(values = c('black', 'blue'), name = 'Density lines') +
  labs(x = 'Value of random number', y = 'Density') +
  theme_minimal()

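For anyone who wants to stay in base graphics, the same idea (precompute the bins, then color by bin midpoint) can be sketched roughly as follows. This is not part of the answer above, just a minimal sketch: the names m, s, inside and barCols are made up for illustration, and classifying a bar by its midpoint is an assumption, so a bin that straddles mean +/- sd is colored by where its midpoint falls.

```r
# Same data as in the question
set.seed(22032022)
randomNumbers <- rnorm(n = 1000, mean = 10, sd = 10)

m <- mean(randomNumbers)  # empirical mean
s <- sd(randomNumbers)    # empirical sd

# Compute the bins without plotting so each bar can get its own color
h <- hist(randomNumbers, breaks = 50, plot = FALSE)

# Assumption: a bar counts as "inside" when its midpoint lies in (m - s, m + s)
inside <- h$mids > m - s & h$mids < m + s
barCols <- ifelse(inside, "#FF374A", "gray")

# plot() on a "histogram" object accepts one color per bar
plot(h, freq = FALSE, col = barCols, main = "",
     xlab = "Value of random number")

# Mark the interval [mean - sd, mean + sd] along the baseline
segments(x0 = m - s, y0 = 0, x1 = m + s, y1 = 0, lwd = 4, col = "darkred")
abline(v = c(m - s, m + s), lty = 2)

legend("topright", legend = c("within 1 SD", "outside 1 SD"),
       fill = c("#FF374A", "gray"), bty = "n")
```

This covers both options from the question: the inner bars get their own color, and the thick segment plus dashed lines show the sd range on the x-axis.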