多个组的密度图之间的交集

Question

多个组的密度图之间的交集

我正在使用ggplot/easyGgplot2创建两组的密度图。我想要一个指标或指示两条曲线之间有多少交点。我什至可以使用没有曲线的任何其他解决方案，只要它允许我衡量哪些组更不同（几个不同的数据组）。

在 R 中有什么简单的方法可以做到这一点吗？

例如使用此示例，它生成此图

我如何估计两者共有的面积百分比？

ggplot2.density(data=weight, xName='weight', groupName='sex',
    legendPosition="top",
    alpha=0.5, fillGroupDensity=TRUE )

Run Code Online (Sandbox Code Playgroud)

Answer 1

pic*_*ick 6

首先，制作一些要使用的数据。在这里，我们将查看内置iris数据集中两种植物物种的花瓣宽度。

## Some sample data from iris
dat <- droplevels(with(iris, iris[Species %in% c("versicolor", "virginica"), ]))

## make a similar graph
library(ggplot2)
ggplot(dat, aes(Petal.Width, fill=Species)) +
  geom_density(alpha=0.5)

Run Code Online (Sandbox Code Playgroud)

要找到相交的面积，您可以使用approxfun来近似描述重叠的函数。然后，将其积分以获得该区域。由于这些是密度曲线，它们的面积为 1 (ish)，因此积分将是百分比重叠。

## Get density curves for each species
ps <- lapply(split(dat, dat$Species), function(x) {
    dens <- density(x$Petal.Width)
    data.frame(x=dens$x, y=dens$y)
})

## Approximate the functions and find intersection
fs <- sapply(ps, function(x) approxfun(x$x, x$y, yleft=0, yright=0))
f <- function(x) fs[[1]](x) - fs[[2]](x)   # function to minimize (difference b/w curves)
meet <- uniroot(f, interval=c(1, 2))$root  # intersection of the two curves

## Find overlapping x, y values
ps1 <- is.na(cut(ps[[1]]$x, c(-Inf, meet)))
ps2 <- is.na(cut(ps[[2]]$x, c(Inf, meet)))
shared <- rbind(ps[[1]][ps1,], ps[[2]][ps2,])

## Approximate function of intersection
f <- with(shared, approxfun(x, y, yleft=0, yright=0))

## have a look
xs <- seq(0, 3, len=1000)
plot(xs, f(xs), type="l", col="blue", ylim=c(0, 2))

points(ps[[1]], col="red", type="l", lty=2, lwd=2)
points(ps[[2]], col="blue", type="l", lty=2, lwd=2)

polygon(c(xs, rev(xs)), y=c(f(xs), rep(0, length(xs))), col="orange", density=40)

Run Code Online (Sandbox Code Playgroud)

## Integrate it to get the value
integrate(f, lower=0, upper=3)$value
# [1] 0.1548127

Run Code Online (Sandbox Code Playgroud)

归档时间：	10 年，7 月前
查看次数：	1545 次
最近记录：	10 年，7 月前