Visualizing a three-way interaction between two continuous variables and one categorical variable in R

Question


Sar*_*rah 5 interaction r predict ggplot2

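I have a model in R that includes a significant three-way interaction between two continuous independent variables, IVContinuousA and IVContinuousB, and one categorical variable, IVCategorical (two levels: Control and Treatment). The dependent variable (DV) is continuous.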

model <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical)


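You can find the data here.

I am trying to find a way to visualize this in R to make it easier for me to interpret (perhaps in ggplot2?).

Somewhat inspired by this blog post, I thought I could split IVContinuousB into high and low values (so that it becomes a two-level factor itself):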

IVContinuousBHigh <- mean(IVContinuousB) + sd (IVContinuousB) 
IVContinuousBLow <- mean(IVContinuousB) - sd (IVContinuousB)


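I then planned to plot the relationship between DV and IVContinuousA, with fitted lines showing the slope of that relationship for each combination of IVCategorical and my new dichotomized IVContinuousB:

IVCategoricalControl and IVContinuousBHigh
IVCategoricalControl and IVContinuousBLow
IVCategoricalTreatment and IVContinuousBHigh
IVCategoricalTreatment and IVContinuousBLow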

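My first question: does this sound like a viable way to produce an interpretable plot of this three-way interaction? I want to avoid 3D plots if possible, as I don't find them intuitive... Or is there another way to go about it? Maybe facet plots of the different combinations above?

If it is a reasonable solution, my second question is how to generate the data for predicting the fitted lines that represent the different combinations above.

Third question: does anyone have any advice on how to code this in ggplot2?

I posted a very similar question on Cross Validated, but because this one is more code-related, I thought I would try here (I will remove the CV post if this one is more relevant to the community :))

Thanks so much in advance,

Sarah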

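Note that there are NAs in the DV column (left as blank) and the design is unbalanced: the Control and Treatment groups of IVCategorical have slightly different numbers of data points.

FYI, I already have code for plotting the two-way interaction between IVContinuousA and IVCategorical: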

ggplot(data = data,
       aes(x = AOTAverage, y = SciconC, group = MisinfoCondition,
           shape = MisinfoCondition, col = MisinfoCondition)) +
  geom_point(size = 2) +
  geom_smooth(method = 'lm', formula = y ~ x)

But what I want is to plot this relationship conditional on IVContinuousB...

Answer 1

eip*_*i10 9

Here are a couple of options for visualizing the model output in two dimensions. I'm assuming that the goal here is to compare Treatment to Control.

library(tidyverse)
library(readxl)  # read_excel() is not attached by library(tidyverse)
theme_set(theme_classic() +
          theme(panel.background=element_rect(colour="grey40", fill=NA)))

dat = read_excel("Some Data.xlsx")  # I downloaded your data file

mod <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data=dat)

# Function to create prediction grid data frame:
# nA values of IVContinuousA x nB values of IVContinuousB x each level of IVCategorical
make_pred_dat = function(data=dat, nA=20, nB=5) {
  nCat = length(unique(data$IVCategorical))
  d = with(data, 
           data.frame(IVContinuousA=rep(seq(min(IVContinuousA), max(IVContinuousA), length=nA), nB*nCat),
                      IVContinuousB=rep(rep(seq(min(IVContinuousB), max(IVContinuousB), length=nB), each=nA), nCat),
                      IVCategorical=rep(unique(IVCategorical), each=nA*nB)))

  # Predictions come from the model `mod` fitted above
  d$DV = predict(mod, newdata=d)

  return(d)
}
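
As a cross-check on the manual rep() bookkeeping above, the same kind of grid can be built with base R's expand.grid(), which enumerates every combination for you. This is only a sketch on simulated data: sim_dat, sim_mod, make_grid and the coefficients used to simulate DV are made up for illustration and are not the asker's data.

```r
# Simulated stand-in for the asker's data (hypothetical values)
set.seed(1)
n <- 200
sim_dat <- data.frame(
  IVContinuousA = rnorm(n),
  IVContinuousB = rnorm(n),
  IVCategorical = factor(sample(c("Control", "Treatment"), n, replace = TRUE))
)
sim_dat$DV <- with(sim_dat,
  1 + 0.5 * IVContinuousA - 0.3 * IVContinuousB +
    0.8 * IVContinuousA * IVContinuousB * (IVCategorical == "Treatment") +
    rnorm(n))

sim_mod <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data = sim_dat)

# Hypothetical helper: every combination of nA values of A, nB values of B,
# and each level of the categorical variable, plus the model prediction
make_grid <- function(model, data, nA = 20, nB = 5) {
  grid <- expand.grid(
    IVContinuousA = seq(min(data$IVContinuousA), max(data$IVContinuousA), length.out = nA),
    IVContinuousB = seq(min(data$IVContinuousB), max(data$IVContinuousB), length.out = nB),
    IVCategorical = levels(data$IVCategorical)
  )
  grid$DV <- predict(model, newdata = grid)
  grid
}

pred_grid <- make_grid(sim_mod, sim_dat)
```

Passing the model in as an argument, rather than reading it from the global environment, makes it harder to predict from the wrong fit by accident.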


`IVContinuousA` vs. `DV`, by level of `IVContinuousB`

The roles of IVContinuousA and IVContinuousB can of course be switched here.

ggplot(make_pred_dat(), aes(x=IVContinuousA, y=DV, colour=IVCategorical)) + 
  geom_line() +
  facet_grid(. ~ round(IVContinuousB,2)) +
  ggtitle("IVContinuousA vs. DV, by Level of IVContinuousB") +
  labs(colour="")
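
If you also want a sense of uncertainty around the fitted lines, predict() for lm objects can return confidence limits via interval = "confidence". The sketch below is an optional extension, not part of the plot above: it uses simulated data (sim_dat and its coefficients are made up) and places IVContinuousB at the mean and at one SD either side, as suggested in the question.

```r
library(ggplot2)

# Simulated stand-in for the asker's data (hypothetical values)
set.seed(2)
n <- 200
sim_dat <- data.frame(
  IVContinuousA = rnorm(n),
  IVContinuousB = rnorm(n),
  IVCategorical = factor(sample(c("Control", "Treatment"), n, replace = TRUE))
)
sim_dat$DV <- with(sim_dat,
  1 + 0.5 * IVContinuousA - 0.3 * IVContinuousB +
    0.8 * IVContinuousA * IVContinuousB * (IVCategorical == "Treatment") +
    rnorm(n))
sim_mod <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data = sim_dat)

# IVContinuousB fixed at mean - 1 SD, mean, mean + 1 SD
b_mean <- mean(sim_dat$IVContinuousB)
b_sd   <- sd(sim_dat$IVContinuousB)
grid <- expand.grid(
  IVContinuousA = seq(min(sim_dat$IVContinuousA), max(sim_dat$IVContinuousA), length.out = 50),
  IVContinuousB = c(b_mean - b_sd, b_mean, b_mean + b_sd),
  IVCategorical = levels(sim_dat$IVCategorical)
)

# predict() returns a matrix with columns fit, lwr, upr
grid <- cbind(grid, predict(sim_mod, newdata = grid, interval = "confidence"))

# Readable facet labels; factor() sorts the three numeric values ascending
grid$B_level <- factor(grid$IVContinuousB,
                       labels = c("B: mean - 1 SD", "B: mean", "B: mean + 1 SD"))

p <- ggplot(grid, aes(x = IVContinuousA, y = fit,
                      colour = IVCategorical, fill = IVCategorical)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.2, colour = NA) +
  geom_line() +
  facet_grid(. ~ B_level) +
  labs(y = "Predicted DV", colour = "", fill = "")
p
```

The ribbons are pointwise 95% confidence bands for the fitted mean, not prediction intervals for new observations.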


You can make a similar plot without faceting, but it becomes hard to interpret as the number of IVContinuousB levels increases:

ggplot(make_pred_dat(nB=3), 
       aes(x=IVContinuousA, y=DV, colour=IVCategorical, linetype=factor(round(IVContinuousB,2)))) + 
  geom_line() +
  #facet_grid(. ~ round(IVContinuousB,2)) +
  ggtitle("IVContinuousA vs. DV, by Level of IVContinuousB") +
  labs(colour="", linetype="IVContinuousB") +
  scale_linetype_manual(values=c("1434","11","62")) +
  guides(linetype=guide_legend(reverse=TRUE))


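Heatmap of the model-predicted difference, DV treatment - DV control, on a grid of `IVContinuousA` and `IVContinuousB` values

Below, we look at the difference between treatment and control at each pair of IVContinuousA and IVContinuousB values.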

ggplot(make_pred_dat(nA=100, nB=100) %>% 
         group_by(IVContinuousA, IVContinuousB) %>% 
         arrange(IVCategorical) %>% 
         summarise(DV = diff(DV)), 
       aes(x=IVContinuousA, y=IVContinuousB)) + 
  geom_tile(aes(fill=DV)) +
  scale_fill_gradient2(low="red", mid="white", high="blue") +
  labs(fill=expression(Delta*DV~(Treatment - Control)))
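
For a linear model like this one, the quantity in the heatmap can also be read directly off the coefficients: with R's default treatment coding and Control as the reference level, Treatment - Control at a given (A, B) is b_T + b_AT * A + b_BT * B + b_ABT * A * B. The sketch below checks that identity against predict() on simulated data; sim_dat, pred_for and the simulated coefficients are made up for illustration, and the coefficient names assume the factor levels are exactly "Control" and "Treatment".

```r
# Simulated stand-in for the asker's data (hypothetical values)
set.seed(3)
n <- 200
sim_dat <- data.frame(
  IVContinuousA = rnorm(n),
  IVContinuousB = rnorm(n),
  IVCategorical = factor(sample(c("Control", "Treatment"), n, replace = TRUE))
)
sim_dat$DV <- with(sim_dat,
  1 + 0.5 * IVContinuousA - 0.3 * IVContinuousB +
    0.8 * IVContinuousA * IVContinuousB * (IVCategorical == "Treatment") +
    rnorm(n))
sim_mod <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data = sim_dat)

grid <- expand.grid(
  IVContinuousA = seq(-2, 2, length.out = 25),
  IVContinuousB = seq(-2, 2, length.out = 25)
)

# Difference computed from the four coefficients that involve Treatment
b <- coef(sim_mod)
grid$diff_coef <- unname(
  b["IVCategoricalTreatment"] +
  b["IVContinuousA:IVCategoricalTreatment"] * grid$IVContinuousA +
  b["IVContinuousB:IVCategoricalTreatment"] * grid$IVContinuousB +
  b["IVContinuousA:IVContinuousB:IVCategoricalTreatment"] *
    grid$IVContinuousA * grid$IVContinuousB
)

# Same difference computed from two sets of predictions
pred_for <- function(level) {
  nd <- grid[, c("IVContinuousA", "IVContinuousB")]
  nd$IVCategorical <- factor(level, levels = levels(sim_dat$IVCategorical))
  unname(predict(sim_mod, newdata = nd))
}
grid$diff_pred <- pred_for("Treatment") - pred_for("Control")

# TRUE if the two routes agree up to floating-point error
isTRUE(all.equal(grid$diff_coef, grid$diff_pred))
```

This makes explicit why the three-way term matters: it is the A * B curvature of the treatment effect surface, so a heatmap with non-parallel contours reflects a non-zero b_ABT.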


Answer 2

Dan*_*ann 5

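If you really want to avoid 3D plotting, you can indeed turn one of the continuous variables into a categorical one for visualization purposes.

For the purposes of this answer, I use the Duncan data set from the car package, since it has the same form as the one you described.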

library(car)
library(ggplot2)  # needed for the plot at the end
# the data
data("Duncan")

# the fitted model; education and income are continuous, type is categorical
lm0 <- lm(prestige ~ education * income * type, data = Duncan)

# turning education into high and low values (you can extend this to more 
# levels)
edu_high <- mean(Duncan$education)  + sd(Duncan$education)
edu_low <- mean(Duncan$education)  - sd(Duncan$education)

# the values below should be used for predictions, each combination of the 
# categories must be represented:
prediction_mat <- data.frame(income = Duncan$income, 
                         education = rep(c(edu_high, edu_low),each = 
                         nrow(Duncan)),
                         type = rep(levels(Duncan$type), each = 
                         nrow(Duncan)*2))


predicted <- predict(lm0, newdata = prediction_mat)


# rearranging the fitted values and the values used for predictions
df <- data.frame(predicted,
             income = Duncan$income,
             edu_group =rep(c("edu_high", "edu_low"),each = nrow(Duncan)),
             type = rep(levels(Duncan$type), each = nrow(Duncan)*2))


# plotting the fitted regression lines
ggplot(df, aes(x = income, y = predicted, group = type, col = type)) + 
geom_line() + 
facet_grid(. ~ edu_group)
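
Note that the data.frame() calls above work because R recycles the shorter vectors (lengths 45 and 90) up to the longest one (270). A more explicit variant, sketched here as an alternative rather than a correction, builds the same kind of grid with expand.grid(). It assumes the carData package (which car loads to provide Duncan) is installed; grid and edu_group are illustrative names.

```r
library(carData)  # provides the Duncan data set
library(ggplot2)

lm0 <- lm(prestige ~ education * income * type, data = Duncan)

# Low and high education: mean -/+ 1 SD
edu_low  <- mean(Duncan$education) - sd(Duncan$education)
edu_high <- mean(Duncan$education) + sd(Duncan$education)

# Every combination of income value, education level and occupation type
grid <- expand.grid(
  income    = seq(min(Duncan$income), max(Duncan$income), length.out = 50),
  education = c(edu_low, edu_high),
  type      = levels(Duncan$type)
)
grid$edu_group <- factor(grid$education, labels = c("edu_low", "edu_high"))
grid$predicted <- predict(lm0, newdata = grid)

p <- ggplot(grid, aes(x = income, y = predicted, colour = type)) +
  geom_line() +
  facet_grid(. ~ edu_group)
p
```

Because the income range here is taken across all occupation types, some lines may extend beyond the incomes actually observed for a given type, so the ends of the lines should be read with care.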


