Showing raw values and the weighted mean for each factor level in ggplot2

Question

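I am trying to plot a variable (allele-specific expression) across the levels of a factor (samples), together with its weighted mean per level (weights = coverage).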

I have made some sample data:

set.seed(2)
x <- sample(c("A","B","C"), 100, replace=TRUE)
y <- rnorm(100)
w <- ceiling(rnorm(100,200,200))
df <- data.frame(x, y, w)

library(ggplot2)
ggplot(df, aes(x=factor(x), y=y, weight=w)) +
  geom_point(aes(size=w)) +
  stat_summary(fun=mean, colour="red", geom="point", size=5)  # fun.y in ggplot2 < 3.3.0

(I also tried to post the plot, but I do not have enough reputation yet.)

This works, but the red points show the unweighted mean...

library(plyr)
means <- ddply(df, "x", function(x) data.frame(wm=weighted.mean(x$y, x$w),
                                               m=mean(x$y)))
means

  x          wm           m
1 A  0.00878432  0.11027454
2 B -0.07283770 -0.13605530
3 C -0.14233389  0.08116117

So I am simply trying to show the "wm" values as the red points using ggplot2. I assume that "weight = .." has to be used correctly somehow, but I am giving up for now...

I really hope someone can help.

Answer 1

Aru*_*run 5

I would first create a summary data.frame with the mean and the weighted mean, as follows:

require(plyr)
dd <- ddply(df, .(x), summarise, m=mean(y), wm=weighted.mean(y, w))
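
For reference, the per-group weighted mean is just sum(w * y) / sum(w) within each level of x. The following is a minimal base-R sketch (no plyr) that checks this against weighted.mean, assuming the sample data from the question. The helper name wm_by_group is made up for illustration. Note that the sample weights are drawn from a normal distribution, so some of them can be zero or negative, which real coverage counts would not be.

```r
# Recreate the sample data from the question
set.seed(2)
x <- sample(c("A", "B", "C"), 100, replace = TRUE)
y <- rnorm(100)
w <- ceiling(rnorm(100, 200, 200))
df <- data.frame(x, y, w)

# Hypothetical helper: weighted mean of y per level of x,
# computed by hand as sum(w * y) / sum(w)
wm_by_group <- function(d) {
  sapply(split(d, d$x), function(g) sum(g$w * g$y) / sum(g$w))
}

manual  <- wm_by_group(df)
builtin <- sapply(split(df, df$x), function(g) weighted.mean(g$y, g$w))

# The hand-computed values should match weighted.mean() for every group
print(all.equal(manual, builtin))
```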

Then I plot the raw data and add this summary as a second layer, so that both the mean and the weighted mean are shown.

require(reshape2) # for melt
require(ggplot2)
# first layer: raw values, sized by weight
# second layer: one point per group for each of m and wm
p <- ggplot() +
  geom_point(data = df, aes(x = factor(x), y = y, size = w)) +
  geom_point(data = melt(dd, id.vars = "x"),
             aes(x = x, y = value, colour = variable), size = 5)
p

# if you want to remove the legend "variable"
p + scale_colour_discrete(breaks = NULL)

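You may also want to consider using scale_size_area(), which makes the point area proportional to the value and so gives a less biased mapping of the weights to point sizes.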
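
As a rough sketch of how that could look when only the weighted mean is wanted as the red point, the version below computes the summary with base R instead of plyr and reshape2 and adds scale_size_area(). It assumes the sample data from the question; the summary data frame name dd and its column wm are reused here only for illustration.

```r
library(ggplot2)

# Recreate the sample data from the question
set.seed(2)
x <- sample(c("A", "B", "C"), 100, replace = TRUE)
y <- rnorm(100)
w <- ceiling(rnorm(100, 200, 200))
df <- data.frame(x, y, w)

# Per-group weighted mean, one row per level of x
wm <- sapply(split(df, df$x), function(g) weighted.mean(g$y, g$w))
dd <- data.frame(x = names(wm), wm = as.numeric(wm))

p <- ggplot(df, aes(x = factor(x), y = y)) +
  geom_point(aes(size = w)) +                       # raw values, sized by weight
  geom_point(data = dd, aes(x = x, y = wm),
             colour = "red", size = 5) +            # weighted mean per group
  scale_size_area()                                 # point area proportional to w

print(p)
```

Because the red layer gets its own data frame, the weights are applied in the summary step rather than inside ggplot2.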
