混合效应模型的 ggpredict() 和 ggemmeans() 计算的预测不同：为什么？

Question

混合效应模型的 ggpredict() 和 ggemmeans() 计算的预测不同：为什么？

Mar*_*ani 8 r predict emmeans glmmtmb ggeffects

我使用1.3.0包中的函数ggpredict()和函数来计算混合效应模型的平均估计值和置信区间（以下简称：CI）。这些功能依赖于ggemmeans()ggeffectspredict()emmeans()使其输出对 ggplot 友好。这两个函数预测/估计的值的平均值和 CI 均不同。为什么？

\n

以下可重现的示例基于数据集 RIKZ（Janssen e Mulder 2005；Zuur 等人 2007），该数据集着眼于与平均潮汐水位（NAP，以米为单位）相比，物种丰富度（物种数量）如何随采样站高度变化）和暴露水平（具有三个级别的因素：低、中、高）：

\n

rm(list=ls())\nif (!require(pacman)) install.packages(\'pacman\'); library(pacman)\np_load(emmeans)\np_load(ggplot2)\np_load(ggpubr)\np_load(ggeffects)\np_load(lme4, lmerTest, glmmTMB)\np_load(RCurl)\n# get data:\nRIKZ <- read.csv(text = RCurl::getURL(\n"https://raw.githubusercontent.com/marcoplebani85/datasets/master/RIKZ.csv"))\nstr(RIKZ)\n# "Exposure" is a factor:\nRIKZ$Exposure <- as.factor(RIKZ$Exposure)\n

Run Code Online (Sandbox Code Playgroud)\n

在这里，我使用泊松分布残差将广义混合效应模型拟合到数据中glmmTMB()：

\n

mem1 <- glmmTMB(Richness ~ NAP+Exposure + (1 | Beach),\n                family="poisson",\n                data = RIKZ, REML=T)\n

Run Code Online (Sandbox Code Playgroud)\n

模型预测和 CI 根据ggeffects::ggpredict()，不考虑随机效应的不确定性（参见本页了解为何考虑或不考虑）：

\n

richness.predicted <- ggpredict(mem1, \nterms=c("NAP", "Exposure"), type="fixed")\n

Run Code Online (Sandbox Code Playgroud)\n

同一模型的预测和 CI根据ggeffects::ggemmeans()，没有考虑随机效应的不确定性：

\n

richness.emmeans <- ggemmeans(mem1, \nterms=c("NAP", "Exposure"), type="fixed")\n

Run Code Online (Sandbox Code Playgroud)\n

代表两组模型预测和 CI 以及观测数据的图表：

\n

p1 <- plot(richness.predicted, add.data=T) + \n    labs(title="Predictions by ggpredict()") + ylim(0,45) +\n    theme(text = element_text(size = 15))\np2 <- plot(richness.emmeans, add.data=T) + \n    labs(title="Predictions by ggemmeans()") + ylim(0,45) +\n    theme(text = element_text(size = 15))\nggarrange(p1, p2, ncol=2, labels=c("(a)", "(b)"), \n    common.legend=T, legend="bottom")\n

Run Code Online (Sandbox Code Playgroud)\n

\n

为什么两组均值估计值和置信区间不同？

\n

更新：

\n

这一页解释了这两个函数之间的差异：

\n

\n
[...]返回的效果ggpredict()是条件效果（即这些\n以某些（参考）因素水平为条件），而\nggemmeans()返回边际平均值，因为效果是\n\xe2\x80\x9cm边缘化\xe2 \x80\x9d （或 \xe2\x80\x9caverged\xe2\x80\x9d）超过因子水平。然而，这些差异仅适用于非焦点术语，即术语参数中未指定的剩余变量。
\n

\n

它继续说：

\n

\n
当所有分类预测变量均在项中指定且进一步\n（非焦点）项仅是数字时，结果也是相同的（因为\n两者均ggpredict()默认ggemmeans()使用平均值来保持\n非焦点数字变量不变）。
\n

\n

我怀疑混合效应模型的两个函数获得的预测存在差异的原因在于它们如何处理随机效应，但我不知道如何处理。

\n

参考：

\n

Janssen, G. 和 Mulder, S. (2005)，荷兰海岸沿沙滩和冲浪区的大型动物区划。海洋学 47(2)：265-282。
Zuur, AF、Ieno, EN 和 Smith, GM (2007)，分析生态数据。施普林格科学与商业媒体
https://strengejacke.github.io/ggeffects/articles/introduction_randomeffects.html
https://strengejacke.github.io/ggeffects/articles/technical_differencepredictemmeans.html

\n

Answer 1

dip*_*kov 2

这确实是一个评论，不是一个完整的答案，但也许它可以指出正确的方向来理解ggpredict和之间的这种微妙差异，这实际上是和之间ggemmeans的差异。predict.glmmTMBemmeans

glmmTMB使用两种估计方法：ML（最大似然）和 REML（限制最大似然）。通过适当设置参数来选择一种或另一种方法REML。

如果估计方法是ML，则ggpredict和ggemeans给出相同的结果。当估计方法为 REML 并且模型具有随机效应时，会出现差异。

fit_glmmTMB <- glmmTMB(
  Richness ~ NAP + Exposure + (1 | Beach),
  family = poisson,
  data = RIKZ,
  REML = FALSE # Use ML to fit the model.
)

ggpredict(
  fit_glmmTMB, terms = fixed_terms, type = "fixed"
)
#> # Predicted counts of Richness
#> 
#> # Exposure = medium
#> 
#>   NAP | Predicted |         95% CI
#> ----------------------------------
#> -1.40 |     13.97 | [10.64, 18.35]
#> -0.80 |     10.30 | [ 8.22, 12.91]
#> -0.20 |      7.60 | [ 6.19,  9.32]
#>  0.40 |      5.60 | [ 4.51,  6.95]
#>  1.00 |      4.13 | [ 3.20,  5.34]
#>  2.20 |      2.25 | [ 1.53,  3.29]
#>
#> [The rest of the output is cut.]

Run Code Online (Sandbox Code Playgroud)

ggemmeans(
  fit_glmmTMB, terms = fixed_terms, type = "fixed"
)
#> # Predicted counts of Richness
#> 
#> # Exposure = medium
#> 
#>   NAP | Predicted |         95% CI
#> ----------------------------------
#> -1.40 |     13.97 | [10.64, 18.35]
#> -0.80 |     10.30 | [ 8.22, 12.91]
#> -0.20 |      7.60 | [ 6.19,  9.32]
#>  0.40 |      5.60 | [ 4.51,  6.95]
#>  1.00 |      4.13 | [ 3.20,  5.34]
#>  2.20 |      2.25 | [ 1.53,  3.29]
#> 
#> [The rest of the output is cut.]

Run Code Online (Sandbox Code Playgroud)

为了进行比较，我还安装了 Poisson GLMM lme4::glmer。此函数没有选项method并使用 ML。再说一遍，没有区别。

fit_glmer <- glmer(
  Richness ~ NAP + Exposure + (1 | Beach),
  family = poisson,
  data = RIKZ
)

ggpredict(
  fit_glmer, terms = fixed_terms, type = "fixed"
)
#> # Predicted counts of Richness
#> 
#> # Exposure = medium
#> 
#>   NAP | Predicted |         95% CI
#> ----------------------------------
#> -1.40 |     13.97 | [10.64, 18.34]
#> -0.80 |     10.30 | [ 8.22, 12.91]
#> -0.20 |      7.60 | [ 6.19,  9.32]
#>  0.40 |      5.60 | [ 4.51,  6.95]
#>  1.00 |      4.13 | [ 3.20,  5.34]
#>  2.20 |      2.25 | [ 1.53,  3.29]
#> 
#> [The rest of the output is cut.]

Run Code Online (Sandbox Code Playgroud)

ggemmeans(
  fit_glmer, terms = fixed_terms, type = "fixed"
)
#> # Predicted counts of Richness
#> 
#> # Exposure = medium
#> 
#>   NAP | Predicted |         95% CI
#> ----------------------------------
#> -1.40 |     13.97 | [10.64, 18.34]
#> -0.80 |     10.30 | [ 8.22, 12.91]
#> -0.20 |      7.60 | [ 6.19,  9.32]
#>  0.40 |      5.60 | [ 4.51,  6.95]
#>  1.00 |      4.13 | [ 3.20,  5.34]
#>  2.20 |      2.25 | [ 1.53,  3.29]
#>
#> [The rest of the output is cut.]

Run Code Online (Sandbox Code Playgroud)

归档时间：	2 年，2 月前
查看次数：	721 次
最近记录：	2 年，1 月前