ggplot dotplot: what is the proper use of geom_dotplot?

Pat*_*ckT 7 r ggplot2

My aim is to reproduce this figure [ref] with ggplot2 (author: Hadley Wickham).


Here is my best effort, based on geom_point and some ugly data preparation (see the code below):


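How can I do this with geom_dotplot()?

In my attempts I ran into several problems: (1) mapping the default density produced by geom_dotplot to counts, (2) cutting the axis off, and (3) avoiding unexpected gaps. I gave up and used geom_point() instead.

I expected (and still hope) that it would be as simple as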

ggplot(data, aes(x,y)) + geom_dotplot(stat = "identity")

But it is not. So here is what I tried, along with the output:

# Data
df <- structure(list(x = c(79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105), y = c(1, 0, 0, 2, 1, 2, 7, 3, 7, 9, 11, 12, 15, 8, 10, 13, 11, 8, 9, 2, 3, 2, 1, 3, 0, 1, 1)), class = "data.frame", row.names = c(NA, -27L))

# dotplot based on geom_dotplot
geom_dots <- function(x, count, round = 10, breaks = NULL, ...) {
    require(ggplot2)
    n = sum(count) # total number of dots to be drawn
    b = round*round(n/round) # prettify breaks
    x = rep(x, count) # make x coordinates for dots
    if (is.null(breaks))  breaks = seq(0, 1, b/4/n)
    ggplot(data.frame(x = x), aes(x = x)) +
        geom_dotplot(method = "histodot", ...) +
        scale_y_continuous(breaks = breaks, 
                        #limits = c(0, max(count)+1), # doesn't work
                        labels = breaks * n) 
} 

geom_dots(x = df$x, count = df$y) 

# dotplot based on geom_point
ggplot_dot <- function(x, count, ...) {
    require(ggplot2)
    message("The count variable must be an integer")
    count = as.integer(count) # make sure these are counts
    n = sum(count) # total number of dots to be drawn
    x = rep(x, count) # make x coordinates for dots
    count = count[count > 0]  # drop zero cases 
    y = integer(0)  # initialize y coordinates for dots
    for (i in seq_along(count)) 
        y <- c(y, 1:(count[i]))  # compute y coordinates
    ggplot(data.frame(x = x, y = y), aes(x = x, y = y)) +
        geom_point(...)  # draw one dot per positive count
}

ggplot_dot(x = df$x, count = df$y, 
    size = 11, shape = 21, fill = "orange", color = "black") + theme_gray(base_size = 18)
# ggsave("dotplot.png") 
ggsave("dotplot.png", width = 12, height = 5.9)

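A few brief remarks: with the geom_point() solution, saving the plot means tuning the dimensions exactly so that the dots touch (dot size and plot height/width). With the geom_dotplot() solution, I rounded the labels to make them prettier. Unfortunately, I could not cut the axis off at about 100: using limits or coord_cartesian() rescales the whole plot instead of cutting it. Also note that for geom_dotplot() I built a data vector from the counts, because I could not use the count variable directly (I expected stat = "identity" to do that, but I could not get it to work).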
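The expansion of counts into one row per dot, described above, can also be written without a loop. This is a minimal sketch in base R, assuming the same x values and counts as in the question; the helper name `stack_dots` is hypothetical and not part of ggplot2.

```r
# Hypothetical helper: one row per dot, with its stack height.
stack_dots <- function(x, count) {
  count <- as.integer(count)
  data.frame(
    x = rep(x, count),   # each x repeated `count` times; zero counts drop out
    y = sequence(count)  # 1, 2, ..., count within each x
  )
}

x <- 79:105
count <- c(1, 0, 0, 2, 1, 2, 7, 3, 7, 9, 11, 12, 15, 8, 10, 13, 11,
           8, 9, 2, 3, 2, 1, 3, 0, 1, 1)

dots <- stack_dots(x, count)
nrow(dots)   # total number of dots
max(dots$y)  # height of the tallest stack
```

The resulting data frame can be passed to ggplot(dots, aes(x, y)) + geom_point(...) in the same way as inside ggplot_dot().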


And*_*rew 6

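Coincidentally, I have also spent the past day fighting with geom_dotplot() and trying to get it to show counts. I have not found a way to make the y axis show actual numbers, but I have found a way to truncate the y axis. As you mentioned, coord_cartesian() and limits do not work, but coord_fixed() does, because it enforces a fixed ratio between x and y units: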

library(tidyverse)
df <- structure(list(x = c(79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105), y = c(1, 0, 0, 2, 1, 2, 7, 3, 7, 9, 11, 12, 15, 8, 10, 13, 11, 8, 9, 2, 3, 2, 1, 3, 0, 1, 1)), class = "data.frame", row.names = c(NA, -27L))
df <- tidyr::uncount(df, y) 

ggplot(df, aes(x)) +
  geom_dotplot(method = 'histodot', binwidth = 1) +
  scale_y_continuous(NULL, breaks = NULL) + 
  # Make this as high as the tallest column
  coord_fixed(ratio = 15)

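Using 15 as the ratio works here because the x axis is in the same units (single integers). If the x axis is a percentage, log dollars, a date, or anything else, you will have to tinker with the ratio until the y axis is truncated enough.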
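One rough way to pick the ratio instead of tinkering is sketched below. It assumes that a column of k dots is about k * binwidth x-units tall, so the ratio should be the tallest column's count times the binwidth. That reasoning is an assumption rather than documented behaviour, although it does reproduce the value 15 used above. The helper name `dotplot_ratio` is hypothetical.

```r
# Hypothetical helper: estimate a value for coord_fixed(ratio = ...).
# Assumption: the tallest column of k dots spans about k * binwidth x-units.
dotplot_ratio <- function(x, binwidth = 1) {
  bins <- floor((x - min(x)) / binwidth)  # simple fixed-width binning
  max(table(bins)) * binwidth
}

count <- c(1, 0, 0, 2, 1, 2, 7, 3, 7, 9, 11, 12, 15, 8, 10, 13, 11,
           8, 9, 2, 3, 2, 1, 3, 0, 1, 1)
x <- rep(79:105, count)  # one value per observation, as after uncount()

ratio <- dotplot_ratio(x, binwidth = 1)
ratio  # 15 for the example data
```

The result could then be passed as coord_fixed(ratio = ratio). For other bin widths or axis units, treat it as a starting point and adjust by eye.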


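Edit: an approach for combined plots

As I mentioned in a comment below, combining plots with patchwork does not work well with coord_fixed(). However, if you manually set the heights (or widths) of the combined plots to the same values as the ratios in coord_fixed(), and make sure every plot has the same x axis, you can get pseudo-faceted plots: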

# Make a subset of df
df2 <- df %>% slice(1:25)

plot1 <- ggplot(df, aes(x)) +
  geom_dotplot(method = 'histodot', binwidth = 1) +
  scale_y_continuous(NULL, breaks = NULL) + 
  # Make this as high as the tallest column
  # Make xlim the same on both plots
  coord_fixed(ratio = 15, xlim = c(75, 110))

plot2 <- ggplot(df2, aes(x)) +
  geom_dotplot(method = 'histodot', binwidth = 1) +
  scale_y_continuous(NULL, breaks = NULL) + 
  coord_fixed(ratio = 7, xlim = c(75, 110))

# Combine both plots in a single column, with each sized incorrectly
library(patchwork)
plot1 + plot2 +
  plot_layout(ncol = 1)

# Combine both plots in a single column, with each sized appropriately
library(patchwork)
plot1 + plot2 +
  plot_layout(ncol = 1, heights = c(15, 7) / (15 + 7))
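The heights passed to plot_layout() above are simply the coord_fixed() ratios rescaled to sum to 1. A small base R sketch of that arithmetic, with the hypothetical helper name `relative_heights`, in case more than two panels are needed:

```r
# Hypothetical helper: turn coord_fixed() ratios into relative panel heights.
relative_heights <- function(ratios) {
  ratios / sum(ratios)
}

ratios <- c(plot1 = 15, plot2 = 7)  # the ratios used for plot1 and plot2
heights <- relative_heights(ratios)
heights
```

The vector can be supplied as plot_layout(ncol = 1, heights = heights).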


Tje*_*ebo 5

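You can mimic geom_dotplot with another geom. I chose ggforce::geom_ellipse to get full control over the size of the dots. It shows the counts on the y axis. I added a few lines to make it more programmatic, and tried to reproduce the graphic the OP wanted. This thread is related to this question, whose aim was to create an animated histogram with dots.

Here is the final result (code below):

How to get there: first, some necessary data preparation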

library(tidyverse)
library(ggforce)

df <- structure(list(x = c(79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105), y = c(1, 0, 0, 2, 1, 2, 7, 3, 7, 9, 11, 12, 15, 8, 10, 13, 11, 8, 9, 2, 3, 2, 1, 3, 0, 1, 1)), class = "data.frame", row.names = c(NA, -27L))

bin_width <- 1
pt_width <- bin_width / 3 # so that they don't touch horizontally
pt_height <- bin_width / 2 # 2 so that they will touch vertically

count_data <- 
  data.frame(x = rep(df$x, df$y)) %>%
  mutate(x = plyr::round_any(x, bin_width)) %>%
  group_by(x) %>%
  mutate(y = seq_along(x))

ggplot(count_data) +
  geom_ellipse(aes(
    x0 = x,
    y0 = y,
    a = pt_width / bin_width,
    b = pt_height / bin_width,
    angle = 0
  )) +
  coord_equal((1 / pt_height) * pt_width)# to make the dot
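A short check of why the ratio given to coord_equal() yields round dots. The ratio is the drawn length of one y unit relative to one x unit, so an ellipse with semi-axes a (in x units) and b (in y units) appears as a circle when a equals b * ratio. The variable names `drawn_width` and `drawn_height` are introduced here for illustration only.

```r
bin_width <- 1
pt_width  <- bin_width / 3
pt_height <- bin_width / 2

a <- pt_width / bin_width            # horizontal semi-axis, in x units
b <- pt_height / bin_width           # vertical semi-axis, in y units
ratio <- (1 / pt_height) * pt_width  # value passed to coord_equal()

# Drawn size of each semi-axis, both measured in x-unit lengths
drawn_width  <- a
drawn_height <- b * ratio
is_circle <- isTRUE(all.equal(drawn_width, drawn_height))
is_circle
```

Because b is 1/2, each dot is one y unit tall, which is why dots stacked at y = 1, 2, 3, ... touch vertically.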

The bin width can be set flexibly!

bin_width <- 2 
# etc (same code as above)
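To see what changing bin_width does to the data before plotting, here is a base R sketch of the same binning and stacking as the count_data pipeline. The helper name `bin_and_stack` is hypothetical. It rounds with round(x / bin_width) * bin_width, which is how I understand plyr::round_any() to behave with its default rounding function.

```r
# Hypothetical helper mirroring the count_data pipeline in base R.
bin_and_stack <- function(x, bin_width = 1) {
  binned <- bin_width * round(x / bin_width)  # snap each value to its bin
  y <- ave(binned, binned, FUN = seq_along)   # running index within each bin
  data.frame(x = binned, y = y)
}

count <- c(1, 0, 0, 2, 1, 2, 7, 3, 7, 9, 11, 12, 15, 8, 10, 13, 11,
           8, 9, 2, 3, 2, 1, 3, 0, 1, 1)
raw <- rep(79:105, count)

d1 <- bin_and_stack(raw, bin_width = 1)
d2 <- bin_and_stack(raw, bin_width = 2)

max(d1$y)  # tallest stack with bins of width 1
max(d2$y)  # tallest stack with bins of width 2
```

Wider bins merge neighbouring columns, so the tallest stack grows while the total number of dots stays the same.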

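Now, reproducing the Lind-Marchal-Wathen graphic in more detail was actually quite fun. A lot of it is not possible without some trickery, most notably the "cross" axis ticks and, of course, the background gradient (baptiste helped).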

library(tidyverse)
library(grid)
library(ggforce)

p <- 
  ggplot(count_data) +
    annotate(x= seq(80,104,4), y = -Inf, geom = 'text', label = '|') +
  geom_ellipse(aes(
    x0 = x,
    y0 = y,
    a = pt_width / bin_width,
    b = pt_height / bin_width,
    angle = 0
  ),
  fill = "#E67D62",
  size = 0
  ) +
    scale_x_continuous(breaks = seq(80,104,4)) +
    scale_y_continuous(expand = c(0,0.1)) +
  theme_void() +
  theme(axis.line.x = element_line(color = "black"),
        axis.text.x = element_text(color = "black", 
                                   margin = margin(8,0,0,0, unit = 'pt'))) +
  coord_equal((1 / pt_height) * pt_width, clip = 'off')

oranges <- c("#FEEAA9", "#FFFBE1")
g <- rasterGrob(oranges, width = unit(1, "npc"), height = unit(0.7, "npc"), interpolate = TRUE)

grid.newpage()
grid.draw(g)
print(p, newpage = FALSE)
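The background gradient above relies on rasterGrob() interpolating between two colours. If more control over the gradient is wanted, one option is to build the colour steps explicitly. This is a sketch using only the grid and grDevices packages; the helper name `gradient_grob` and its `steps` argument are hypothetical.

```r
library(grid)

# Hypothetical helper: a top-to-bottom gradient as a raster grob.
gradient_grob <- function(top, bottom, steps = 50, height = 0.7) {
  cols <- grDevices::colorRampPalette(c(top, bottom))(steps)
  rasterGrob(
    matrix(cols, ncol = 1),  # one column; row 1 is drawn at the top
    width = unit(1, "npc"),
    height = unit(height, "npc"),
    interpolate = TRUE
  )
}

g <- gradient_grob("#FEEAA9", "#FFFBE1")
```

The grob can be drawn with grid.draw(g) before print(p, newpage = FALSE), in the same way as in the code above.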

Created on 2020-05-01 by the reprex package (v0.3.0)