通过该网站的建议,我在 ggplot 中构建了一个 hexbin 图,它显示每个 bin 中的数据点计数,并突出显示感兴趣的特定 bin。
我现在想进一步扩展此图,以显示每个六边形中第二个分组类别的比例。这已经可以通过hextri包实现,但我无法将上一个问题的 ggplot 解决方案与 hextri 包的输出结合起来。
最终目标是得到一个看起来像 hextri 包的输出的图,并突出显示感兴趣的单元格。
下面是一些示例数据代码,可以创建带有突出显示的单元格的 ggplot 以及显示分类比例的 hextri 图。我想将这两个功能结合起来。
我尝试使用 hextri 函数的边界输入来实现所需的结果,但尚未成功。
library(hextri)
library(ggplot2)
n = 100
df = data.frame(x = rnorm(n),
y = rnorm(n),
group = sample(0:1, n, prob = c(0.9, 0.1), replace = TRUE))
# hextri plot
hextri_plot = hextri(
df$x,
df$y,
class = df$group,
colour = c("#00b38a", "#ea324c"),
nbins = 3,
diffuse = FALSE,
sorted = TRUE
)
# GGplot
ggplot(df, aes(x = x, y = y)) +
geom_hex() +
stat_summary_hex(aes(
z = group,
color = after_stat(as.character(value))
), fun = ~ +any(.x == 1), fill = NA) +
scale_color_manual(
values = c("0" = "transparent", "1" = "yellow"),
guide = "none"
)
Run Code Online (Sandbox Code Playgroud)
All*_*ron 17
这不是一个小问题。它需要编写一个新的Geom、一个新的Stat和一个新的Grob(见下文)。我个人并不相信它是一个很好的数据可视化选项,因为它会导致位置扭曲并且涉及显着的舍入误差。然而,它在视觉上很有吸引力并且相当直观,所以我还是继续写了一个geom_hextri。为了让它发挥作用,我们只需将其美学映射到一个分类变量,它的行为就应该与预期的一样。
让我们使用您自己的示例数据:
set.seed(1)
n = 100
df = data.frame(x = rnorm(n),
y = rnorm(n),
group = sample(0:1, n, prob = c(0.9, 0.1), replace = TRUE))
Run Code Online (Sandbox Code Playgroud)
并geom_hextri使用您选择的配色方案绘制它。我们将覆盖点,以便确保段填充的逻辑与点匹配。
ggplot(df, aes(x, y, fill = factor(group), color = factor(group))) +
geom_hextri(linewidth = 0.3, bins = 4) +
geom_point(shape = 21, size = 3, color = "black") +
coord_equal() +
theme_classic(base_size = 16) +
theme(aspect.ratio = 1) +
scale_fill_manual("Group", values = c("#00b38a", "#ea324c")) +
scale_color_manual("Group", values = c("#00b38a", "#ea324c"))
Run Code Online (Sandbox Code Playgroud)
请注意,如果我们愿意,可以轻松更改垃圾箱的大小和美观。为了在三角形周围获得实心六边形,我们只需添加一个geom_hex图层:
ggplot(df, aes(x, y, fill = factor(group))) +
geom_hextri(color = "black", linewidth = 0.1, bins = 5) +
geom_point(shape = 21, size = 3) +
geom_hex(fill = NA, color = "black", linewidth = 1, bins = 5) +
coord_equal() +
theme_classic(base_size = 16) +
theme(aspect.ratio = 1) +
scale_fill_manual("Group", values = c("gray", "red"))
Run Code Online (Sandbox Code Playgroud)
应用于另一个数据集我们得到:
ggplot(df, aes(x, y, fill = factor(group))) +
geom_hextri(color = "black", linewidth = 0.1, bins = 5) +
geom_point(shape = 21, size = 3) +
geom_hex(fill = NA, color = "black", linewidth = 1, bins = 5) +
coord_equal() +
theme_classic(base_size = 16) +
theme(aspect.ratio = 1) +
scale_fill_manual("Group", values = c("gray", "red"))
Run Code Online (Sandbox Code Playgroud)
另请注意,我们不需要使用填充美学。例如,我们可以简单地更改轮廓颜色:
ggplot(iris, aes(Sepal.Width, Sepal.Length, fill = Species)) +
geom_hextri(color = "white", linewidth = 0.1, bins = 5) +
geom_point(shape = 21, size = 3, position = position_jitter(0.03, 0.03),
color = "white") +
geom_hex(fill = NA, colour = NA, linewidth = 1, bins = 5) +
coord_equal() +
theme_minimal(base_size = 20) +
theme(aspect.ratio = 1) +
scale_fill_brewer(palette = "Set2")
Run Code Online (Sandbox Code Playgroud)
代码为geom_hextri
现在困难的部分是实施geom_hextri。我试图将其分解为几块,但代码必然很长,而且很难详细解释。我还必须牺牲一点间距,以使其适合不需要滚动的代码框。
最终,ggplot 必须在绘图设备上将对象绘制为图形对象 ( grobs),但是没有现成的现成工具grob可以绘制这些六边形线段,因此我们需要定义一个函数,在grid::polygonGrob给定适当的 x 的情况下使用 来绘制它们、y 坐标、高度、宽度、图形参数以及我们正在处理的段。这需要接受矢量化数据才能与 ggplot 一起使用:
ggplot(iris, aes(Sepal.Width, Sepal.Length, colour = Species)) +
geom_hextri(fill = NA, linewidth = 2, bins = 5, alpha = 1) +
geom_hex(fill = NA, colour = NA, linewidth = 1, bins = 5) +
coord_equal() +
theme_minimal(base_size = 20) +
theme(aspect.ratio = 1) +
scale_colour_brewer(palette = "Set1")
Run Code Online (Sandbox Code Playgroud)
但这本身还不够。我们还需要定义一个geom继承自 的GeomHex,但有它自己的compute_group方法来适当地调用我们的hextriGrob函数。其工作的一部分是确保美学被正确地分割成多个部分,由于技术原因,这不能在一个Stat层内轻松完成。
hextriGrob <- function(x, y, seg, height, width, gp = grid::gpar()) {
gp <- lapply(seq_along(x), function(i) structure(gp[i], class = "gpar"))
xl <- x - width
xr <- x + width
y1 <- y + 2 * height
y2 <- y + height
y3 <- y - height
y4 <- y - 2 * height
pg <- grid::polygonGrob
do.call(grid::gList,
Map(function(x, y, xl, xr, y1, y2, y3, y4, seg, gp) {
if(seg == 1) return(pg(x = c(x, x, xr, x), y = c(y, y1, y2, y), gp = gp))
if(seg == 2) return(pg(x = c(x, xr, xr, x), y = c(y, y2, y3, y), gp = gp))
if(seg == 3) return(pg(x = c(x, xr, x, x), y = c(y, y3, y4, y), gp = gp))
if(seg == 4) return(pg(x = c(x, x, xl, x), y = c(y, y4, y3, y), gp = gp))
if(seg == 5) return(pg(x = c(x, xl, xl, x), y = c(y, y3, y2, y), gp = gp))
if(seg == 6) return(pg(x = c(x, xl, x, x), y = c(y, y2, y1, y), gp = gp))
}, x = x, y = y, xl = xl, xr = xr, y1 = y1,
y2 = y2, y3 = y3, y4 = y4, seg = seg, gp = gp))
}
Run Code Online (Sandbox Code Playgroud)
在我们的数据到达这个几何图形之前,需要将其装箱为六边形。不幸的是,现有的StatBinhex无法做到这一点,同时保留我们需要的各个段级美学细节,因此我们必须编写自己的分箱函数:
GeomHextri <- ggproto("GeomHextri", GeomHex,
draw_group = function (self, data, panel_params, coord, lineend = "butt",
linejoin = "mitre", linemitre = 10) {
table_six <- function(vec) {
if(!is.factor(vec)) vec <- factor(vec)
tab <- round(6 * table(vec, useNA = "always")/length(vec))
n <- diff(c(0, findInterval(cumsum(tab) / sum(tab), 1:6/6)))
rep(names(tab), times = n)
}
num_cols <- sapply(data, is.numeric)
non_num_cols <- names(data)[!num_cols]
num_cols <- names(data)[num_cols]
datasplit <- split(data, interaction(data$x, data$y, drop = TRUE))
data <- do.call("rbind", lapply(seq_along(datasplit), function(i) {
num_list <- lapply(datasplit[[i]][num_cols], function(x) rep(mean(x), 6))
non_num_list <- lapply(datasplit[[i]][non_num_cols], function(x) {
table_six(rep(x, times = datasplit[[i]]$count))})
d <- datasplit[[i]][rep(1, 6),]
d[num_cols] <- num_list
d[non_num_cols] <- non_num_list
d$tri <- 1:6
d$group <- i
d}))
data <- ggplot2:::check_linewidth(data, snake_class(self))
if (ggplot2:::empty(data)) return(zeroGrob())
coords <- coord$transform(data, panel_params)
hw <- c(min(diff(unique(sort(coords$x)))),
min(diff(unique(sort(coords$y))))/3)
hextriGrob(coords$x, coords$y, data$tri, hw[2], hw[1],
gp = grid::gpar(col = data$colour, fill = alpha(data$fill, data$alpha),
lwd = data$linewidth * .pt, lty = data$linetype,
lineend = lineend, linejoin = linejoin,
linemitre = linemitre))})
Run Code Online (Sandbox Code Playgroud)
然后必须在自定义内部使用它Stat:
hexify <- function (x, y, z, xbnds, ybnds, xbins, ybins, binwidth,
fun = mean, fun.args = list(),
drop = TRUE) {
hb <- hexbin::hexbin(x, xbnds = xbnds, xbins = xbins, y,
ybnds = ybnds, shape = ybins/xbins, IDs = TRUE)
value <- rlang::inject(tapply(z, hb@cID, fun, !!!fun.args))
out <- hexbin::hcell2xy(hb)
out <- ggplot2:::data_frame0(!!!out)
out$value <- as.vector(value)
out$width <- binwidth[1]
out$height <- binwidth[2]
if (drop) out <- stats::na.omit(out)
out
}
Run Code Online (Sandbox Code Playgroud)
最后,我们需要编写一个 geom 函数,以便我们可以在 ggplot 调用中轻松调用上述所有内容:
StatHextri <- ggproto("StatBinhex", StatBinhex,
default_aes = aes(weight = 1, alpha = after_stat(count)),
compute_panel = function (self, data, scales, ...) {
if (ggplot2:::empty(data)) return(ggplot2:::data_frame0())
data$group <- 1
self$compute_group(data = data, scales = scales, ...)},
compute_group = function (data, scales, binwidth = NULL, bins = 30,
na.rm = FALSE){
`%||%` <- rlang::`%||%`
rlang::check_installed("hexbin", reason = "for `stat_binhex()`")
binwidth <- binwidth %||% ggplot2:::hex_binwidth(bins, scales)
if (length(binwidth) == 1) binwidth <- rep(binwidth, 2)
wt <- data$weight %||% rep(1L, nrow(data))
non_pos <- !names(data) %in% c("x", "y", "PANEL", "group")
is_num <- sapply(data, is.numeric)
aes_vars <- names(data)[non_pos & !is_num]
grps <- do.call("interaction", c(as.list(data[aes_vars]), drop = TRUE))
xbnds <- ggplot2:::hex_bounds(data$x, binwidth[1])
xbins <- diff(xbnds)/binwidth[1]
ybnds <- ggplot2:::hex_bounds(data$y, binwidth[2])
ybins <- diff(ybnds)/binwidth[2]
do.call("rbind", Map(function(data, wt) {
out <- hexify(data$x, data$y, wt, xbnds, ybnds, xbins,
ybins, binwidth, sum)
for(var in aes_vars) out[[var]] <- data[[var]][1]
out$density <- as.vector(out$value/sum(out$value, na.rm = TRUE))
out$ndensity <- out$density/max(out$density, na.rm = TRUE)
out$count <- out$value
out$ncount <- out$count/max(out$count, na.rm = TRUE)
out$value <- NULL
out$group <- 1
out}, split(data, grps), split(wt, grps)))})
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
378 次 |
| 最近记录: |