Histogram of a large matrix is 20x slower in ggplot2 than with base hist()

Question

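I have a numeric matrix with about 10M values, and I only need to show the distribution of the values as a histogram. In base R, hist() does this very quickly. With ggplot2, however, it is much slower (I also have to melt the matrix first, but that is not the rate-limiting step). Is there a way to make this fast with ggplot2?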
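As an aside on the melt step: a histogram only uses the values, not the row and column indices that reshape2::melt() adds, so flattening the matrix with as.vector() is enough to build the data frame. The sketch below assumes a smaller example matrix than the question's so it runs quickly; it does not change the cost of geom_histogram() itself.

```r
library(ggplot2)

# Example matrix, much smaller than the question's 6e4 x 150
mtx <- matrix(rnorm(1e4 * 15), nrow = 1e4)

# Flatten column-wise; no Var1/Var2 index columns are created
df_values <- data.frame(value = as.vector(mtx))

# Same histogram call as in the question, built (not printed) here
p <- ggplot(df_values, aes(x = value)) + geom_histogram(bins = 30)
```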

library(microbenchmark)
library(ggplot2)


mtx1 <- matrix(rnorm(6e4*150), nrow = 6e4)
df1 <- reshape2::melt(mtx1)

g_hist <- function(df){
  print(ggplot(df, aes(x=value)) + geom_histogram(bins=30))
}

print(microbenchmark(
  hist(mtx1), 
  g_hist(df1), 
times=3L 
), signif=3)


# Unit: milliseconds
#        expr  min   lq mean median   uq  max neval
#  hist(mtx1)  384  471  530    559  603  647     3
# g_hist(df1) 7710 8000 8190   8300 8440 8570     3


Answer 1

bde*_*est 5

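Here is a solution that uses the base R hist() function to compute the histogram bins and bin counts. (Computing the bins does seem to be the source of the bottleneck in geom_histogram().)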
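To see what hist() hands back when it is used only for binning, the sketch below calls it with plot = FALSE on a simulated vector (the data are an assumption for illustration). The returned "histogram" object holds the bin edges in breaks and one count per bin in counts, so there is always one more edge than there are bins. Note that a numeric breaks value is only a suggestion: hist() picks "pretty" breakpoints near that number.

```r
set.seed(1)
x <- rnorm(1e5)

# plot = FALSE returns the "histogram" object without drawing anything
res <- hist(x, breaks = 30, plot = FALSE)

# n bins have n + 1 edges; counts and mids have one entry per bin
length(res$breaks)
length(res$counts)

# Every finite value lands in exactly one bin
sum(res$counts)
```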

I then pass the computed bin counts and bin boundaries to geom_rect() to draw a histogram that looks essentially the same as one from geom_histogram().
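If drawing rectangles feels indirect, the same pre-computed counts can also be drawn with geom_col() at the bin midpoints (res$mids). This is a swapped-in alternative to the answer's geom_rect() approach and has not been benchmarked here; it assumes equal-width bins, which hist() produces when breaks is given as a single number.

```r
library(ggplot2)

set.seed(1)
x <- rnorm(1e5)
res <- hist(x, breaks = 30, plot = FALSE)

# One row per bin: its midpoint and its count
bins <- data.frame(mid = res$mids, count = res$counts)

# Setting width to the bin width makes adjacent bars touch, as in a histogram
p <- ggplot(bins, aes(x = mid, y = count)) +
  geom_col(width = diff(res$breaks)[1], colour = "grey30", fill = "grey80")
```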

This still takes longer than base hist(), but the overhead is about 1.5x rather than 20x.
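The 1.5x and 20x figures can be checked against the mean times (in milliseconds) from the benchmark output in this answer:

```r
# Mean times (ms) taken from the microbenchmark table in this answer
t_hist   <- 305
t_ggplot <- 6180
t_quick  <- 440

t_ggplot / t_hist  # about 20.3
t_quick / t_hist   # about 1.44
```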

quick_hist = function(values_vec, breaks=50) {
    res = hist(values_vec, plot=FALSE, breaks=breaks)

    dat = data.frame(xmin=head(res$breaks, -1L),
                     xmax=tail(res$breaks, -1L),
                     ymin=0.0,
                     ymax=res$counts)

    ggplot(dat, aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax)) +
    geom_rect(linewidth=0.5, colour="grey30", fill="grey80")  # linewidth replaced size for outlines in ggplot2 3.4.0
}


ggsave("quick_hist.png", 
       plot=quick_hist(mtx1) + theme_bw(), 
       width=8, height=4, dpi=150)


print(microbenchmark(hist(mtx1), 
                     g_hist(df1), 
                     print(quick_hist(mtx1, breaks=30)),
                     times=5L), signif=3)

# Unit: milliseconds
#                                  expr  min   lq mean median   uq  max neval
#                            hist(mtx1)  264  270  305    298  332  359     5
#                           g_hist(df1) 5740 5760 6180   5770 5920 7700     5
#  print(quick_hist(mtx1, breaks = 30))  407  418  440    433  440  503     5
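One difference in this comparison: hist() treats breaks = 30 as a suggestion, while geom_histogram(bins = 30) uses exactly 30 bins. If matching the bin count exactly matters, a sketch like the following bins the values with base findInterval() and tabulate() and feeds the result to the same kind of geom_rect() layer. bin_counts() and quick_hist_exact() are hypothetical helpers, not part of the answer, and their speed has not been measured; the bin edges also will not line up exactly with geom_histogram()'s default placement.

```r
library(ggplot2)

# Hypothetical helper: exactly `bins` equal-width bins spanning range(x)
bin_counts <- function(x, bins = 30L) {
  x <- x[is.finite(x)]
  breaks <- seq(min(x), max(x), length.out = bins + 1L)
  # Bin index for each value; rightmost.closed = TRUE puts max(x) into
  # the last bin instead of an extra bin past the end
  idx <- findInterval(x, breaks, rightmost.closed = TRUE)
  data.frame(xmin = head(breaks, -1L),
             xmax = tail(breaks, -1L),
             ymin = 0,
             ymax = tabulate(idx, nbins = bins))
}

# Hypothetical wrapper that draws the pre-binned counts as rectangles
quick_hist_exact <- function(values_vec, bins = 30L) {
  ggplot(bin_counts(values_vec, bins),
         aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax)) +
    geom_rect(colour = "grey30", fill = "grey80")
}
```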


Archived: 6 years, 7 months ago
Views: 473
Last activity: 6 years, 7 months ago