geom_density2d的等效重量

Tyl*_*ker 12 r ggplot2

请考虑以下数据:

   contesto       x       y perc
1       M01  81.370 255.659   22
2       M02  85.814 242.688   16
3       M03  73.204 240.526   33
4       M04  66.478 227.916   46
5      M04a  67.679 218.668   15
6       M05  59.632 239.325   35
7       M06  64.316 252.777   23
8       M08  90.258 227.676   45
9       M09 100.707 217.828   58
10      M10  89.829 205.278   53
11      M11 114.998 216.747   15
12      M12 119.922 235.482   18
13      M13 129.170 239.205   36
14      M14 142.501 229.717   24
15      M15  76.206 213.144   24
16      M16  30.090 166.785   33
17      M17 130.731 219.989   56
18      M18  74.885 192.336   36
19      M19  48.823 142.645   32
20      M20  48.463 186.361   24
21      M21  74.765 205.698   16
Run Code Online (Sandbox Code Playgroud)

我想为perc加权的点x和y创建一个二维密度图.我能做到这一点(虽然我不认为正确)通过如下rep:

library(ggplot2)

dataset2 <- with(dataset, dataset[rep(1:nrow(dataset), perc),])

ggplot(dataset2, aes(x, y)) + 
    stat_density2d(aes(alpha=..level.., fill=..level..), size=2, 
        bins=10, geom="polygon") + 
    scale_fill_gradient(low = "yellow", high = "red") +
    scale_alpha(range = c(0.00, 0.5), guide = FALSE) +
    geom_density2d(colour="black", bins=10) +
    geom_point(data = dataset) +
    guides(alpha=FALSE) + xlim(c(10, 160)) + ylim(c(120, 280))
Run Code Online (Sandbox Code Playgroud)

在此输入图像描述

这似乎不是正确的方法,因为其他geoms允许加权,如:

dat <- as.data.frame(ftable(mtcars$cyl))
ggplot(dat, aes(x=Var1)) + geom_bar(aes(weight=Freq))
Run Code Online (Sandbox Code Playgroud)

但是,如果我在这里尝试使用权重,则该图与数据不匹配(desc被忽略):

ggplot(dataset, aes(x, y)) + 
    stat_density2d(aes(alpha=..level.., fill=..level.., weight=perc), 
        size=2, bins=10, geom="polygon") + 
    scale_fill_gradient(low = "yellow", high = "red") +
    scale_alpha(range = c(0.00, 0.5), guide = FALSE) +
    geom_density2d(colour="black", bins=10, aes(weight=perc)) +
    geom_point(data = dataset) +
    guides(alpha=FALSE) + xlim(c(10, 160)) + ylim(c(120, 280))
Run Code Online (Sandbox Code Playgroud)

在此输入图像描述

这是使用rep正确的方法来加权密度还是有更好的方法类似于weight论证geom_bar

这个rep方法看起来像用基本R做的内核密度所以我假设这是它应该看起来的样子:

在此输入图像描述

dataset <- structure(list(contesto = structure(1:21, .Label = c("M01", "M02", 
"M03", "M04", "M04a", "M05", "M06", "M08", "M09", "M10", "M11", 
"M12", "M13", "M14", "M15", "M16", "M17", "M18", "M19", "M20", 
"M21"), class = "factor"), x = c(81.37, 85.814, 73.204, 66.478, 
67.679, 59.632, 64.316, 90.258, 100.707, 89.829, 114.998, 119.922, 
129.17, 142.501, 76.206, 30.09, 130.731, 74.885, 48.823, 48.463, 
74.765), y = c(255.659, 242.688, 240.526, 227.916, 218.668, 239.325, 
252.777, 227.676, 217.828, 205.278, 216.747, 235.482, 239.205, 
229.717, 213.144, 166.785, 219.989, 192.336, 142.645, 186.361, 
205.698), perc = c(22, 16, 33, 46, 15, 35, 23, 45, 58, 53, 15, 
18, 36, 24, 24, 33, 56, 36, 32, 24, 16)), .Names = c("contesto", 
"x", "y", "perc"), row.names = c(NA, -21L), class = "data.frame")
Run Code Online (Sandbox Code Playgroud)

Tro*_*roy 4

如果你的权重是每个坐标(或按比例)的#个观察值,我认为你做得对。该函数似乎期望所有观察结果,并且如果您在原始数据集上调用它,则无法动态更新 ggplot 对象,因为它已经对密度进行了建模,并且包含派生的绘图数据。

如果您的实际数据集很大,您可能想使用data.table而不是,它的速度大约快 70 倍。with()例如,请参阅此处了解 1m 坐标,具有 1-20 次重复(在本例中>10m 观察值)。不过,与 660 个观测值没有性能相关性(无论如何,该图可能会成为大型数据集的性能瓶颈)。

bigtable<-data.frame(x=runif(10e5),y=runif(10e5),perc=sample(1:20,10e5,T))

system.time(rep.with.by<-with(bigtable, bigtable[rep(1:nrow(bigtable), perc),]))
#user  system elapsed 
#11.67    0.18   11.92

system.time(rep.with.dt<-data.table(bigtable)[,list(x=rep(x,perc),y=rep(y,perc))])
#user  system elapsed 
#0.12    0.05    0.18

# CHECK THEY'RE THE SAME
sum(rep.with.dt$x)==sum(rep.with.by$x)
#[1] TRUE    

# OUTPUT ROWS
nrow(rep.with.dt)
#[1] 10497966
Run Code Online (Sandbox Code Playgroud)