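I want to create a ggplot2 histogram in which the plot limits equal the minimum and maximum values in the dataset, without excluding those values from the histogram itself.
I get the behavior I'm looking for with base graphics. Specifically, the second histogram below shows all the same values as the first one (i.e., no bins are excluded from the second histogram), even though I included an xlim argument in the second plot: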
min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)
xlim <- c(min_wt, max_wt)
hist(mtcars$wt, breaks = 30, main = "No limits added")
hist(mtcars$wt, breaks = 30, xlim = xlim, main = "Limits added")
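A quick way to see why base graphics behaves this way: hist() computes its breaks from the data, and xlim only sets the plotting window. The sketch below uses hist(..., plot = FALSE) to inspect the computed breaks and counts without drawing anything; it assumes the same mtcars data as above.

```r
# Same data and limits as above
min_wt <- min(mtcars$wt)
max_wt <- max(mtcars$wt)

# plot = FALSE returns the "histogram" object without drawing it
h <- hist(mtcars$wt, breaks = 30, plot = FALSE)

# The outer breaks enclose the whole data range; xlim = c(min_wt, max_wt)
# only changes the plotting window, not which observations are counted
range(h$breaks)
c(min_wt, max_wt)
sum(h$counts)  # 32, i.e. nrow(mtcars)
```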
ggplot2, however, doesn't give me this behavior:
library(ggplot2)
# Using green colour to make dropped bins easy to see:
p <- ggplot(mtcars, aes(x = wt)) + geom_histogram(colour = "green", bins = 30)
p + ggtitle("No limits added")
p + xlim(xlim) + ggtitle("Limits added")
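Notice how in the second plot I lose one of the points below 2 and two of the points above 5? I'd like to know how to fix this. A few miscellaneous notes:
First, specifying boundary lets me include the minimum value in the histogram (i.e., the value below 2), but I still haven't solved the problem of the 2 values greater than 5: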
ggplot(mtcars, aes(x = wt)) +
geom_histogram(bins = 30, colour = "green", boundary = min_wt) +
xlim(xlim) +
ggtitle("Limits added with boundary too")
Second, whether the problem occurs depends on the value chosen for bins. For example, when I increase bins to 50, no values are dropped:
ggplot(mtcars, aes(x = wt)) +
geom_histogram(bins = 50, colour = "green", boundary = min_wt) +
xlim(xlim) +
ggtitle("Limits added with boundary too, but with bins = 50")
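Finally, I believe this issue is related to the one raised on SO in "geom_histogram: wrong bins?" and discussed here: https://github.com/tidyverse/ggplot2/issues/1651. In other words, I think the problem has to do with "rounding error". I describe this error in more depth in the second post of this issue (the one with the plots): https://github.com/daattali/ggExtra/issues/81.
Here is my session info: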
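To make the rounding-error idea concrete, here is a deliberately simplified model, assuming equal-width bins that exactly tile [min, max]. It is not ggplot2's actual binning code, whose details differ between versions, and edge_overshoot() is a hypothetical helper. In exact arithmetic the last edge equals the maximum; in floating point it can come out a hair above or below it, and a strict comparison such as edge > max_wt then hinges on that last bit.

```r
# Hypothetical helper (simplified model, not ggplot2 internals):
# `bins` equal-width bins anchored at min(x) that should end exactly at max(x)
edge_overshoot <- function(x, bins) {
  width <- (max(x) - min(x)) / bins    # bins exactly tile [min, max]
  last_edge <- min(x) + bins * width   # equals max(x) in exact arithmetic
  last_edge - max(x)                   # floating-point residue, possibly 0
}

overshoot <- sapply(c(30, 50), edge_overshoot, x = mtcars$wt)
overshoot       # tiny values on the order of machine epsilon
overshoot > 0   # TRUE would mean a strict "> limit" check flags that edge
```

Under this model, a positive residue for one value of bins and a zero or negative one for another would be consistent with the problem appearing for bins = 30 but not for bins = 50.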
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] ggplot2_2.2.1
loaded via a namespace (and not attached):
[1] labeling_0.3 colorspace_1.3-2 scales_0.5.0.9000
[4] compiler_3.4.2 lazyeval_0.2.1 plyr_1.8.4
[7] tools_3.4.2 pillar_1.2.1 gtable_0.2.0
[10] tibble_1.4.2 yaml_2.1.16 Rcpp_0.12.15
[13] grid_3.4.2 rlang_0.2.0.9000 munsell_0.4.3
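Another option, mentioned by @eipi10 in the comments, is to change the oob argument of scale_x_continuous.
oob is a "function that handles limits outside of the scale limits (out of bounds). The default replaces out of bounds values with NA."
The default is scales::censor(); you can change it to oob = scales::squish, which squishes out-of-bounds values into the range instead.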
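The difference between the two oob functions is easy to see on a plain numeric vector. The values below are illustrative, not the actual bar edges ggplot2 computes; the limits are the range of mtcars$wt.

```r
library(scales)

# Illustrative values just outside, on, and inside the limits
x   <- c(1.416, 1.513, 3.2, 5.424, 5.462)
rng <- c(1.513, 5.424)   # range(mtcars$wt)

censor(x, range = rng)   # NA 1.513 3.200 5.424 NA (out-of-range -> NA)
squish(x, range = rng)   # 1.513 1.513 3.200 5.424 5.424 (clamped to limits)
```

Both functions keep values that sit exactly on a limit; they differ only in what happens to values strictly outside it.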
Compare the following two plots:
p + scale_x_continuous(limits = xlim) + ggtitle("default: scales::censor")
Warning message: Removed 1 rows containing missing values (geom_bar).
p + scale_x_continuous(limits = xlim, oob = scales::squish) + ggtitle("using scales::squish")
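One way to confirm what that warning refers to is ggplot2's layer_data(), which returns the built data for a layer (one row per bar) before anything is drawn. The sketch below, assuming the same mtcars setup as in the question, compares the bar edges under the default censoring and under squish. The exact edge values depend on the ggplot2 version's binning, so none are quoted here.

```r
library(ggplot2)

xlim <- range(mtcars$wt)   # same as c(min_wt, max_wt)
p <- ggplot(mtcars, aes(x = wt)) + geom_histogram(bins = 30)

# Built layer data: censored edges show up as NA in xmin/xmax
censored <- layer_data(p + scale_x_continuous(limits = xlim))
squished <- layer_data(p + scale_x_continuous(limits = xlim, oob = scales::squish))

# Bars with a missing edge should be the ones geom_bar drops at draw time
censored[is.na(censored$xmin) | is.na(censored$xmax), c("x", "count", "xmin", "xmax")]

# With squish every edge is clamped into the limits, so no bar is lost
range(c(squished$xmin, squished$xmax))
sum(squished$count)   # every observation is still binned
```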
Your third ggplot, where you specified a boundary but the 2 values greater than 5 were still dropped, looks like this with squish:
ggplot(mtcars, aes(x = wt)) +
geom_histogram(bins = 30, colour = "green", boundary = min_wt) +
scale_x_continuous(limits = xlim, oob = scales::squish) +
ggtitle("Limits added with boundary too") +
labs(subtitle = "scales::squish")
Hope this helps.