ggplot2: plot the mean of each facet's subset instead of the global mean

Question by S12*_*000, score 6. Tags: r, facet, ggplot2

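I want ggplot to draw the mean of each facet's subset (on both the x and y axes). However, I get the mean of the whole dataset rather than of the subset, and I don't know how to fix this.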

library(ggplot2)

hsb2 <- read.table("http://www.ats.ucla.edu/stat/data/hsb2.csv", sep = ",", header = TRUE)
head(hsb2)
hsb2$gender <- as.factor(hsb2$female)  # 0/1 indicator -> factor for colour and faceting

ggplot() +
  geom_point(aes(y = read, x = write, colour = gender), data = hsb2, size = 2.2, alpha = 0.9) +
  scale_colour_brewer(guide = guide_legend(), palette = 'Set1') +
  stat_smooth(aes(x = write, y = read), data = hsb2, colour = '#000000',
              size = 0.8, method = lm, formula = 'y ~ x') +
  # mean(write) and mean(read) are computed over all of hsb2,
  # so both facets get the same (global) lines
  geom_vline(aes(xintercept = mean(write)), data = hsb2, linetype = 3) +
  geom_hline(aes(yintercept = mean(read)), data = hsb2, linetype = 3) +
  facet_wrap(facets = ~gender)

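To see why the lines do not move between facets, it can help to inspect the data ggplot2 actually builds for the vline layer. The sketch below is an assumption-laden illustration: it uses a small hypothetical data frame `toy` (columns `g`, `x`, `y`) instead of hsb2, and `layer_data()` to pull out the computed layer. As far as current ggplot2 behaviour goes, the expression inside `aes()` is evaluated once over the layer's whole data set before anything is split by panel, so every panel receives the same global mean.

```r
library(ggplot2)

# Hypothetical toy data: two groups with very different x values
toy <- data.frame(
  g = factor(c("a", "a", "b", "b")),
  x = c(1, 3, 10, 12),
  y = c(1, 2, 3, 4)
)

p <- ggplot(toy, aes(x, y)) +
  geom_point() +
  # mean(x) is evaluated over all rows of toy, not separately per facet
  geom_vline(aes(xintercept = mean(x)), linetype = 3) +
  facet_wrap(~g)

# Computed data of layer 2 (the vline): one row per data row, tagged with its PANEL
vline_data <- layer_data(p, 2)
print(vline_data[, c("PANEL", "xintercept")])
```

Both panels end up with xintercept equal to the overall mean of `x` (6.5 here), which is exactly the symptom in the question.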

Answer by Ram*_*han, score 7.

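One approach is to compute the means (of x and y) for each gender explicitly and store them as new columns in the original data frame. When the facets split the data by gender, each panel then only sees its own gender's means, so the lines are drawn where you want them.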

Using tapply

#compute the read and write means for each gender 
read_means <- tapply(hsb2$read, hsb2$gender, mean)
write_means <- tapply(hsb2$write, hsb2$gender, mean)

#store it in the data frame
hsb2$read_mean <- ifelse(hsb2$gender==0, read_means[1], read_means[2])
hsb2$write_mean <- ifelse(hsb2$gender==0, write_means[1], write_means[2])
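The ifelse() lines above hard-code exactly two gender levels. A possible base-R alternative, swapped in here and not part of the original answer, is ave(), which returns a vector of the same length as its input with each element replaced by its group's mean (mean is ave()'s default FUN). The sketch uses a small hypothetical data frame `toy` with the same column names as hsb2.

```r
# Hypothetical toy data standing in for hsb2
toy <- data.frame(
  gender = factor(c(0, 0, 1, 1)),
  read   = c(2, 4, 20, 22),
  write  = c(1, 3, 10, 12)
)

# Each row gets the mean of its own gender group; works for any number of levels
toy$read_mean  <- ave(toy$read,  toy$gender)
toy$write_mean <- ave(toy$write, toy$gender)

print(toy)
```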

An alternative to the tapply approach is ddply.

Using ddply from the plyr package

The new columns can be created in a single call.

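library(plyr)
# ddply returns a new data frame, so assign the result back to hsb2
# (rows come back ordered by gender)
hsb2 <- ddply(hsb2, "gender", transform,
              read_mean  = mean(read),
              write_mean = mean(write))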
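plyr is no longer actively developed, and its usual successor is dplyr. The following is a sketch of the same per-group columns with dplyr's group_by() and mutate(), a swapped-in technique rather than the original answer's. It assumes dplyr is installed and uses a small hypothetical data frame `toy` in place of hsb2.

```r
library(dplyr)

# Hypothetical toy data standing in for hsb2
toy <- data.frame(
  gender = factor(c(0, 0, 1, 1)),
  read   = c(2, 4, 20, 22),
  write  = c(1, 3, 10, 12)
)

# mutate() inside group_by() computes the means per gender but keeps every row
toy <- toy %>%
  group_by(gender) %>%
  mutate(read_mean = mean(read), write_mean = mean(write)) %>%
  ungroup()

print(toy)
```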

Now pass the two new mean columns to the geom_vline and geom_hline calls.

ggplot() +
  geom_point(aes(y = read,x = write,colour = gender),data=hsb2,size = 2.2,alpha = 0.9) +
  scale_colour_brewer(guide = guide_legend(),palette = 'Set1') +
  stat_smooth(aes(x = write,y = read),data=hsb2,colour = '#000000',
              size = 0.8,method = lm,formula = 'y ~ x') +
  geom_vline(aes(xintercept = write_mean),data=hsb2,linetype = 3) +
  geom_hline(aes(yintercept = read_mean),data=hsb2,linetype = 3) +
  facet_wrap(facets = ~gender)

This produces the faceted plot with the dotted lines drawn at each gender's own means (image not included).
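Passing the full hsb2 data to geom_vline and geom_hline draws one (identical) line per row in each panel. A possible variation, not from the original answer, is to summarise the means into a small data frame with one row per gender and give only that to the reference-line layers; because it still contains the faceting variable `gender`, each row is placed in its own panel. The sketch assumes a hypothetical toy data frame `toy` shaped like hsb2.

```r
library(ggplot2)

# Hypothetical toy data standing in for hsb2
toy <- data.frame(
  gender = factor(c(0, 0, 1, 1)),
  read   = c(2, 4, 20, 22),
  write  = c(1, 3, 10, 12)
)

# One row per gender holding that gender's mean read and write scores
means <- aggregate(toy[c("read", "write")], by = list(gender = toy$gender), FUN = mean)

p <- ggplot(toy, aes(x = write, y = read)) +
  geom_point(aes(colour = gender)) +
  # `means` contains the faceting column, so each line lands in its own panel
  geom_vline(aes(xintercept = write), data = means, linetype = 3) +
  geom_hline(aes(yintercept = read), data = means, linetype = 3) +
  facet_wrap(~gender)

vl <- layer_data(p, 2)  # computed vline layer
hl <- layer_data(p, 3)  # computed hline layer
```

The design choice here is that the summary is computed once, outside ggplot, so the plot layers only describe what to draw rather than how to aggregate.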