How to plot logit and probit in ggplot2

Joh*_*hnK 11 statistics r ggplot2

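This is almost certainly a newbie question.

For the dataset below, I have been trying to plot both the logit and the probit curves in ggplot2, without success.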

    Ft  Temp  TD
     1    66   0
     6    72   0
    11    70   1
    16    75   0
    21    75   1
     2    70   1
     7    73   0
    12    78   0
    17    70   0
    22    76   0
     3    69   0
     8    70   0
    13    67   0
    18    81   0
    23    58   1
     4    68   0
     9    57   1
    14    53   1
    19    76   0
     5    67   0
    10    63   1
    15    67   0
    20    79   0

The code I naively used is:

    library(ggplot2)
    TD<-mydata$TD
    Temp<-mydata$Temp
    g<-    qplot(Temp,TD)+geom_point()+stat_smooth(method="glm",family="binomial",formula=y~x,col="red")
    g1<-g+labs(x="Temperature",y="Thermal Distress")
    g1
    g2<-g1+stat_smooth(method="glm",family="binomial",link="probit",formula=y~x,add=T)
    g2

Could you tell me how to improve my code so that both curves are plotted on the same chart?

Thanks

And*_*rew 18

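Another approach is to generate your own predicted values and plot them with ggplot. That gives you more control over the final plot than relying on stat_smooth for the calculations, which is especially useful if you have multiple covariates and need to hold some of them at their means or modes when plotting.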

library(ggplot2)

# Generate data
mydata <- data.frame(Ft = c(1, 6, 11, 16, 21, 2, 7, 12, 17, 22, 3, 8, 
                            13, 18, 23, 4, 9, 14, 19, 5, 10, 15, 20),
                     Temp = c(66, 72, 70, 75, 75, 70, 73, 78, 70, 76, 69, 70, 
                              67, 81, 58, 68, 57, 53, 76, 67, 63, 67, 79),
                     TD = c(0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 
                            0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0))

# Run logistic regression model
model <- glm(TD ~ Temp, data=mydata, family=binomial(link="logit"))

# Create a temporary data frame of hypothetical values
temp.data <- data.frame(Temp = seq(53, 81, 0.5))

# Predict the fitted values given the model and hypothetical data
predicted.data <- as.data.frame(predict(model, newdata = temp.data,
                                        type="link", se.fit=TRUE))

# Combine the hypothetical data and predicted values
new.data <- cbind(temp.data, predicted.data)

# Calculate confidence intervals
std <- qnorm(0.95 / 2 + 0.5)
new.data$ymin <- model$family$linkinv(new.data$fit - std * new.data$se.fit)
new.data$ymax <- model$family$linkinv(new.data$fit + std * new.data$se.fit)
new.data$fit <- model$family$linkinv(new.data$fit)  # Rescale to 0-1

# Plot everything
p <- ggplot(mydata, aes(x=Temp, y=TD)) 
p + geom_point() + 
  geom_ribbon(data=new.data, aes(y=fit, ymin=ymin, ymax=ymax), alpha=0.5) + 
  geom_line(data=new.data, aes(y=fit)) + 
  labs(x="Temperature", y="Thermal Distress") 

Figure: nicer single line
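The question asked for the logit and probit curves on the same chart, while the code above fits only the logit model. As a rough sketch of how the same approach might be extended, one could fit a second model with `link="probit"` and stack the predictions from both. The helper `predict_curve()` and the `link` column are names made up here for illustration; they are not part of the original answer, and confidence bands are left out to keep it short.

```r
library(ggplot2)

# Same Temp/TD values as in the question
mydata <- data.frame(
  Temp = c(66, 72, 70, 75, 75, 70, 73, 78, 70, 76, 69, 70,
           67, 81, 58, 68, 57, 53, 76, 67, 63, 67, 79),
  TD   = c(0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0,
           0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0))

# Hypothetical helper: fit a binomial GLM with the given link and return
# fitted probabilities over a grid of temperatures
predict_curve <- function(link) {
  model <- glm(TD ~ Temp, data = mydata, family = binomial(link = link))
  grid <- data.frame(Temp = seq(53, 81, 0.5))
  grid$fit <- predict(model, newdata = grid, type = "response")
  grid$link <- link
  grid
}

# One long data frame with a column identifying the link function
curves <- rbind(predict_curve("logit"), predict_curve("probit"))

# Points from the raw data, one coloured line per link function
p_both <- ggplot(mydata, aes(x = Temp, y = TD)) +
  geom_point() +
  geom_line(data = curves, aes(y = fit, colour = link)) +
  labs(x = "Temperature", y = "Thermal Distress", colour = "Link")
p_both
```

Mapping `link` to `colour` gives one legend entry per curve, so no second `geom_line()` call is needed.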

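Bonus, just for fun: if you do the prediction yourself, you can go crazy with covariates, for example showing how the model fits at different levels of Ft: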

# Alternative, if you want to go crazy
# Run logistic regression model with two covariates
model <- glm(TD ~ Temp + Ft, data=mydata, family=binomial(link="logit"))

# Create a temporary data frame of hypothetical values
temp.data <- data.frame(Temp = rep(seq(53, 81, 0.5), 2),
                        Ft = c(rep(3, 57), rep(18, 57)))

# Predict the fitted values given the model and hypothetical data
predicted.data <- as.data.frame(predict(model, newdata = temp.data,
                                        type="link", se.fit=TRUE))

# Combine the hypothetical data and predicted values
new.data <- cbind(temp.data, predicted.data)

# Calculate confidence intervals
std <- qnorm(0.95 / 2 + 0.5)
new.data$ymin <- model$family$linkinv(new.data$fit - std * new.data$se.fit)
new.data$ymax <- model$family$linkinv(new.data$fit + std * new.data$se.fit)
new.data$fit <- model$family$linkinv(new.data$fit)  # Rescale to 0-1

# Plot everything
p <- ggplot(mydata, aes(x=Temp, y=TD)) 
p + geom_point() + 
  geom_ribbon(data=new.data, aes(y=fit, ymin=ymin, ymax=ymax, 
                                       fill=as.factor(Ft)), alpha=0.5) + 
  geom_line(data=new.data, aes(y=fit, colour=as.factor(Ft))) + 
  labs(x="Temperature", y="Thermal Distress") 

Figure: nicer multiple lines
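For comparison, the `stat_smooth` route that this answer is contrasted with might look like the sketch below. It assumes a reasonably recent ggplot2, where arguments for the fitting function are passed through `method.args` rather than directly as `family=` or `link=` (which is what the code in the question tried). Setting `se = FALSE` is my own choice to keep the two curves readable; it is not something the question asked for.

```r
library(ggplot2)

# Same Temp/TD values as in the question
mydata <- data.frame(
  Temp = c(66, 72, 70, 75, 75, 70, 73, 78, 70, 76, 69, 70,
           67, 81, 58, 68, 57, 53, 76, 67, 63, 67, 79),
  TD   = c(0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0,
           0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0))

# One smoother per link function; the constant colour labels create a legend
p_smooth <- ggplot(mydata, aes(x = Temp, y = TD)) +
  geom_point() +
  stat_smooth(aes(colour = "logit"), method = "glm", formula = y ~ x,
              method.args = list(family = binomial(link = "logit")),
              se = FALSE) +
  stat_smooth(aes(colour = "probit"), method = "glm", formula = y ~ x,
              method.args = list(family = binomial(link = "probit")),
              se = FALSE) +
  labs(x = "Temperature", y = "Thermal Distress", colour = "Link")
p_smooth
```

This is shorter than predicting by hand, at the cost of the control over covariates described above.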

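  • This is quite elegant, but by constructing your own (normal-based) confidence intervals instead of using `glm`, you get confidence intervals that exceed the (0,1) range, which is probably *not* what the OP wants... (6 upvotes)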
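One way to check this concern is to compare the two ways of building a normal-based interval. In the sketch below (the names `grid`, `z`, `lower_link`, `lower_resp` and so on are mine), the first interval is computed on the link scale and then passed through the inverse link, as the code in the answer does, so it cannot leave [0, 1]. The second is computed directly on the response scale, where nothing prevents it from crossing 0 or 1.

```r
# Same Temp/TD values as in the question
mydata <- data.frame(
  Temp = c(66, 72, 70, 75, 75, 70, 73, 78, 70, 76, 69, 70,
           67, 81, 58, 68, 57, 53, 76, 67, 63, 67, 79),
  TD   = c(0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0,
           0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0))

model <- glm(TD ~ Temp, data = mydata, family = binomial(link = "logit"))
grid <- data.frame(Temp = seq(53, 81, 0.5))
z <- qnorm(0.975)  # two-sided 95% interval

# Interval on the link scale, back-transformed with the inverse link
link_pred <- predict(model, newdata = grid, type = "link", se.fit = TRUE)
lower_link <- model$family$linkinv(link_pred$fit - z * link_pred$se.fit)
upper_link <- model$family$linkinv(link_pred$fit + z * link_pred$se.fit)

# Interval built directly on the response (probability) scale
resp_pred <- predict(model, newdata = grid, type = "response", se.fit = TRUE)
lower_resp <- resp_pred$fit - z * resp_pred$se.fit
upper_resp <- resp_pred$fit + z * resp_pred$se.fit

range(lower_link, upper_link)  # stays inside [0, 1] by construction
range(lower_resp, upper_resp)  # not constrained to [0, 1]
```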