Plot geom_smooth only for significant fits

Ale*_*nov 5 statistics r ggplot2

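How can I make a ggplot plot with geom_smooth(method="lm"), but only draw the fit when it meets certain criteria? For example, I want the line drawn only if the slope is statistically significant (i.e. the p-value of the slope in the lm fit is less than 0.01).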

Edit: updated to a more complex example involving facets. Instead of generating data from scratch, I modified the diamonds dataset.

library(ggplot2)
library(data.table)

data(diamonds)

set.seed(777)
d <- data.table(diamonds)
d[color %in% c("D","E"), c("x","y") := list(x + runif(.N, -5, 5),
                                            y + runif(.N, -5, 5))]
plt <- ggplot(d) + aes(x=x, y=y, color=color) + 
    geom_point() + facet_grid(clarity ~ cut, scales="free")
plt + geom_smooth(method="lm")


What I want is a way to draw all the lines except those whose slope is not statistically significant (i.e. D and E).

eip*_*i10 6

You can compute the p-values by group and then subset the data passed to geom_smooth (as a commenter suggested):

# Determine p-values of regression
# (d is assumed to have a grouping column z, as in the question's earlier example)
groups = as.character(unique(d$z))  # character input so that sapply names the result
p.vals = sapply(groups, function(i) {
  coef(summary(lm(y ~ x, data=d[d$z==i, ])))[2,4]
})

plt <- ggplot(d) + aes(x=x, y=y, color=z) + geom_point()

# Select only values of z for which regression p-value is < 0.05
plt + geom_smooth(data=d[d$z %in% names(p.vals)[p.vals < 0.05],],
                  aes(x, y, colour=z), method='lm')

Update: based on your comment, try something like this:

p1 = ggplot(mtcars, aes(wt, mpg)) +
  geom_point() + facet_grid(am ~ carb)

dat = data.frame(x=1:5, y=26:30, carb=1:5)

p1 + geom_point(data=dat, aes(x,y), colour="red", size=5)

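Note that because dat has no am column, ggplot simply draws the same dat values in the panel for every value of am. Of course, you can add an am column to dat to control which facets the points are drawn in.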
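As a minimal sketch of that last point (the name dat_am and the specific am and carb values are made up for illustration), giving the extra data frame its own am column pins each red point to a single panel:

```r
library(ggplot2)

p1 <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() + facet_grid(am ~ carb)

# Hypothetical layer data: each row names both facet variables,
# so each point lands in exactly one am x carb panel
dat_am <- data.frame(x = 1:3, y = 26:28,
                     carb = c(1, 2, 4),
                     am   = c(0, 1, 1))

p2 <- p1 + geom_point(data = dat_am, aes(x, y), colour = "red", size = 5)
p2
```

The same idea applies to geom_smooth: if the data frame passed to the layer carries all the faceting columns, only the matching panels get a line.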

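Update 2: I think this takes care of the faceted case. Note, however, that most of the regressions have p-values below 0.05, probably because with a large amount of data even tiny coefficients are statistically significant.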

## Create a list holding the p-values for regressions on each 
## combination of color, cut, and clarity
pvals = lapply(levels(d$color), function(i) {
  lapply(levels(d$cut), function(j) {
    lapply(levels(d$clarity), function(k) {
      if(nrow(d[color==i & cut==j & clarity==k, ]) > 1) {
        data.frame(color=i, cut=j, clarity=k, 
                   p.val=coef(summary(lm(y ~ x, data = d[color==i & cut==j & clarity==k, ])))[2,4])
      }
    })
  })
})

# Flatten pvals to a single list level
pvals = unlist(unlist(pvals, recursive=FALSE), recursive=FALSE)

# Turn pvals into a data frame
pvals = do.call(rbind, pvals)

# Keep only rows with p.val < 0.05
pvals = pvals[pvals$p.val < 0.05, ]

plt <- ggplot(d) + aes(x=x, y=y, color=color) + 
  geom_point() + facet_grid(clarity ~ cut, scales="free")

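# Create a subset of data frame d containing only combinations of 
# color, cut, and clarity for which we want to plot regression lines
# (you could subset right in the call to geom_smooth, but I thought this would be more clear)
# Match on the full combination: separate %in% tests on each column would
# keep any row whose color, cut, and clarity each appear somewhere in pvals
keep = paste(pvals$color, pvals$cut, pvals$clarity)
d.subset = d[paste(color, cut, clarity) %in% keep, ]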

# Plot regression lines only for groups in d.subset
plt + geom_smooth(data=d.subset, method="lm")
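The remark above about large samples can be checked with a small simulation. This is only a sketch on made-up data (not the diamonds set): the true slope of 0.1 is small relative to noise with standard deviation 1, yet with 100,000 points the slope's p-value falls far below 0.05 while R-squared stays close to zero.

```r
set.seed(42)

n <- 100000
x <- seq(0, 1, length.out = n)
y <- 0.1 * x + rnorm(n)   # weak signal buried in noise (sd = 1)

fit <- summary(lm(y ~ x))
p_slope   <- coef(fit)[2, 4]   # p-value of the slope, same [2,4] cell used above
r_squared <- fit$r.squared     # share of variance explained by x

print(c(p.value = p_slope, r.squared = r_squared))
```

If the goal is to hide lines that are practically negligible rather than merely non-significant, filtering on an effect-size measure such as R-squared or the slope itself may be a more useful criterion than the p-value alone.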

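  • Then you can create a data frame with "z" and "p.vals" columns plus another column holding the facet variable, and feed it to "geom_smooth" with the appropriate subset. `ggplot` will then know which regression lines to draw in each facet. If you post a reproducible example in your question, I'll take a crack at it. (2 upvotes)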
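One way to build such a table, sketched here on made-up data with data.table's grouped `by` (the names toy, g, f, slope_p, pv, and toy_sig are all hypothetical, and the 0.01 cutoff is the one from the question), is to compute one p-value per colour/facet combination and merge the significant combinations back onto the data:

```r
library(data.table)
library(ggplot2)

set.seed(1)
# Toy data: colour group g, facet variable f; only g == "a" has a real slope
toy <- CJ(g = c("a", "b"), f = c("F1", "F2"), i = 1:100)
toy[, x := i / 10]
toy[, y := ifelse(g == "a", 2 * x, 0) + rnorm(.N)]

# p-value of the slope, or NA if the slope could not be estimated
slope_p <- function(x, y) {
  cf <- coef(summary(lm(y ~ x)))
  if (nrow(cf) < 2) NA_real_ else cf[2, 4]
}

# One row per (g, f) combination
pv <- toy[, .(p.val = slope_p(x, y)), by = .(g, f)]

# Keep significant combinations and merge them back onto the data
sig <- pv[!is.na(p.val) & p.val < 0.01]
toy_sig <- merge(toy, sig, by = c("g", "f"))

ggplot(toy, aes(x, y, colour = g)) +
  geom_point() +
  facet_grid(. ~ f) +
  geom_smooth(data = toy_sig, method = "lm", formula = y ~ x)
```

Because the merge is on both keys, a line is drawn only in panels where that particular combination passed the threshold. The pure-noise group "b" will usually, though not always, be filtered out.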