在 gamm4 R 封装中进行 GAM-GEE？

Question

在 gamm4 R 封装中进行 GAM-GEE？

我正在尝试分析生物体的一些视觉断面数据以生成栖息地分布模型。一旦发现生物体，就会按照给定的时间间隔收集点数据来跟踪它们。由于这些“跟随”之间的自相关性，我希望使用类似于 Pirotta 等人的 GAM-GEE 方法。2011，使用包“yags”和“splines”（http://www.int-res.com/abstracts/meps/v436/p257-272/）。他们的 R 脚本如下所示 (http://www.int-res.com/articles/suppl/m436p257_supp/m436p257_supp1-code.r)。我使用此代码的成功有限，并且存在多个模型无法收敛的问题。

\n\n

以下是我的数据结构：

\n\n

> str(dat2)\n\n\'data.frame\':   10792 obs. of  4 variables:\n\n $ dist_slag       : num  26475 26340 25886 25400 24934 ...\n $ Depth           : num  -10.1 -10.5 -16.6 -22.2 -29.7 ...\n$ dolphin_presence: int  0 0 0 0 0 0 0 0 0 0 ...\n\n\n $ block           : int  1 1 1 1 1 1 1 1 1 1 ...\n\n\n> head(dat2)\n\n  dist_slag    Depth dolphin_presence block\n1  26475.47 -10.0934                0     1\n2  26340.47 -10.4870                0     1\n3  25886.33 -16.5752                0     1\n4  25399.88 -22.2474                0     1\n\n\n\n5  24934.29 -29.6797                0     1\n6  24519.90 -26.2370                0     1\n

Run Code Online (Sandbox Code Playgroud)\n\n

这是我的块变量的摘要（指示每个块中存在自相关的组数

\n\n

> summary(dat2$block)\n   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. \n   1.00   39.00   76.00   73.52  111.00  148.00\n

Run Code Online (Sandbox Code Playgroud)\n\n

不过，我想使用“gamm4”包，因为我更熟悉Simon Wood教授的包和功能，看来gamm4可能是最合适的。值得注意的是，这些模型具有二元响应（沿横断面有机体存在或不存在），因此我认为 gamm4 比 gamm 更合适。在 gamm 帮助中，它提供了以下因子内自相关的示例：

\n\n

## more complicated autocorrelation example - AR errors\n## only within groups defined by `fac\'\ne <- rnorm(n,0,sig)\nfor (i in 2:n) e[i] <- 0.6*e[i-1]*(fac[i-1]==fac[i]) + e[i]\ny <- f + e\nb <- gamm(y~s(x,k=20),correlation=corAR1(form=~1|fac))\n

Run Code Online (Sandbox Code Playgroud)\n\n

按照此示例，以下是我用于数据集的代码

\n\n

b <- gamm4(dolphin_presence~s(dist_slag)+s(Depth),random=(form=~1|block),  family=binomial(),data=dat)\n

Run Code Online (Sandbox Code Playgroud)\n\n

然而，通过检查输出 (summary(b$gam)) 特别是 summary(b$mer))，我要么不确定如何解释结果，要么不相信正在考虑组内的自相关。

\n\n

> summary(b$gam)\n\nFamily: binomial \nLink function: logit \n\nFormula:\ndolphin_presence ~ s(dist_slag) + s(Depth)\n\nParametric coefficients:\n\n\n            Estimate Std. Error z value Pr(>|z|)   \n    (Intercept)  -13.968      5.145  -2.715  0.00663 **\n---\nSignif. codes:  0 \xe2\x80\x98***\xe2\x80\x99 0.001 \xe2\x80\x98**\xe2\x80\x99 0.01 \xe2\x80\x98*\xe2\x80\x99 0.05 \xe2\x80\x98.\xe2\x80\x99 0.1 \xe2\x80\x98 \xe2\x80\x99 1 \n\nApproximate significance of smooth terms:\n\n\n               edf Ref.df Chi.sq  p-value    \ns(dist_slag) 4.943  4.943  70.67 6.85e-14 ***\ns(Depth)     6.869  6.869 115.59  < 2e-16 ***\n---\nSignif. codes:  0 \xe2\x80\x98***\xe2\x80\x99 0.001 \xe2\x80\x98**\xe2\x80\x99 0.01 \xe2\x80\x98*\xe2\x80\x99 0.05 \xe2\x80\x98.\xe2\x80\x99 0.1 \xe2\x80\x98 \xe2\x80\x99 1 \n\n\n\nR-sq.(adj) =  0.317glmer.ML score =  10504  Scale est. = 1         n = 10792\n> \n\n> summary(b$mer)\nGeneralized linear mixed model fit by the Laplace approximation \n\n\n   AIC   BIC logLik deviance\n 10514 10551  -5252    10504\nRandom effects:\n Groups Name         Variance Std.Dev.\n Xr     s(dist_slag) 1611344  1269.39 \n Xr.0   s(Depth)       98622   314.04 \nNumber of obs: 10792, groups: Xr, 8; Xr.0, 8\n\n\n\nFixed effects:\n                 Estimate Std. Error z value Pr(>|z|)   \nX(Intercept)      -13.968      5.145  -2.715  0.00663 **\nXs(dist_slag)Fx1  -35.871     33.944  -1.057  0.29063   \nXs(Depth)Fx1        3.971      3.740   1.062  0.28823   \n\n\n---\nSignif. codes:  0 \xe2\x80\x98***\xe2\x80\x99 0.001 \xe2\x80\x98**\xe2\x80\x99 0.01 \xe2\x80\x98*\xe2\x80\x99 0.05 \xe2\x80\x98.\xe2\x80\x99 0.1 \xe2\x80\x98 \xe2\x80\x99 1 \n\nCorrelation of Fixed Effects:\n            X(Int) X(_)F1\nXs(dst_s)F1  0.654       \nXs(Dpth)Fx1 -0.030  0.000\n> \n

Run Code Online (Sandbox Code Playgroud)\n\n

如何确保在“块”变量的每个唯一值中确实考虑了自相关？解释“summary(b$mer)”输出的最简单方法是什么？

\n\n

结果确实与使用相同变量和参数但没有“correlation=...”术语的普通 gam（mgcv 包）不同，表明发生了不同的情况。

\n\n

但是，当我对相关项（季节）使用不同的变量时，我得到相同的输出：

\n\n

> dat2 <- data.frame(dist_slag = dat$dist_slag, Depth = dat$Depth, dolphin_presence = dat$dolphin_presence,\n\n+ block = dat$block, season=dat$season)\n > head(dat2)\n      dist_slag    Depth dolphin_presence block season\n1  26475.47 -10.0934                0     1      F\n2  26340.47 -10.4870                0     1      F\n\n3  25886.33 -16.5752                0     1      F\n4  25399.88 -22.2474                0     1      F\n5  24934.29 -29.6797                0     1      F\n6  24519.90 -26.2370                0     1      F\n\n> summary(dat2$season)\n\n   F    S \n3224 7568 \n\n\n> b <- gamm4(dolphin_presence~s(dist_slag)+s(Depth),correlation=corAR1(1, form=~1 |   season), family=binomial(),data=dat2)\n> summary(b$gam)\n\nFamily: binomial \nLink function: logit \n\n\nFormula:\ndolphin_presence ~ s(dist_slag) + s(Depth)\n\nParametric coefficients:\n            Estimate Std. Error z value Pr(>|z|)   \n    (Intercept)  -13.968      5.145  -2.715  0.00663 **\n---\nSignif. codes:  0 \xe2\x80\x98***\xe2\x80\x99 0.001 \xe2\x80\x98**\xe2\x80\x99 0.01 \xe2\x80\x98*\xe2\x80\x99 0.05 \xe2\x80\x98.\xe2\x80\x99 0.1 \xe2\x80\x98 \xe2\x80\x99 1 \n\n\nApproximate significance of smooth terms:\n               edf Ref.df Chi.sq  p-value    \ns(dist_slag) 4.943  4.943  70.67 6.85e-14 ***\ns(Depth)     6.869  6.869 115.59  < 2e-16 ***\n---\nSignif. codes:  0 \xe2\x80\x98***\xe2\x80\x99 0.001 \xe2\x80\x98**\xe2\x80\x99 0.01 \xe2\x80\x98*\xe2\x80\x99 0.05 \xe2\x80\x98.\xe2\x80\x99 0.1 \xe2\x80\x98 \xe2\x80\x99 1 \n\n\nR-sq.(adj) =  0.317glmer.ML score =  10504  Scale est. = 1         n = 10792\n> summary(b$mer)\nGeneralized linear mixed model fit by the Laplace approximation \n   AIC   BIC logLik deviance\n\n 10514 10551  -5252    10504\nRandom effects:\n Groups Name         Variance Std.Dev.\n Xr     s(dist_slag) 1611344  1269.39 \n Xr.0   s(Depth)       98622   314.04 \nNumber of obs: 10792, groups: Xr, 8; Xr.0, 8\n\n\nFixed effects:\n                 Estimate Std. Error z value Pr(>|z|)   \nX(Intercept)      -13.968      5.145  -2.715  0.00663 **\nXs(dist_slag)Fx1  -35.871     33.944  -1.057  0.29063   \nXs(Depth)Fx1        3.971      3.740   1.062  0.28823   \n---\nSignif. codes:  0 \xe2\x80\x98***\xe2\x80\x99 0.001 \xe2\x80\x98**\xe2\x80\x99 0.01 \xe2\x80\x98*\xe2\x80\x99 0.05 \xe2\x80\x98.\xe2\x80\x99 0.1 \xe2\x80\x98 \xe2\x80\x99 1 \n\nCorrelation of Fixed Effects:\n            X(Int) X(_)F1\nXs(dst_s)F1  0.654       \nXs(Dpth)Fx1 -0.030  0.000\n> \n

Run Code Online (Sandbox Code Playgroud)\n\n

我只是想确保它正确地允许“块”变量的每个值之间的相关性。我如何制定模型来表明自相关可以存在于块的每个单个值中，但假设块之间是独立的？

\n\n

另一方面，在较大模型（变量多于 2 个）的模型完成后，我还收到以下警告消息：

\n\n

Warning message:\n In mer_finalize(ans) : false convergence (8)\n

Run Code Online (Sandbox Code Playgroud)\n

Answer 1

Ben*_*ker 5

gamm4构建于之上lme4，它不允许使用参数（与位于之下的 , 包correlation相反）。确实处理二进制数据，尽管它使用 PQL，它通常不如 Laplace/GHQ 近似准确，如. 不幸的是（!!），您没有收到警告，告诉您该参数被忽略（当我尝试使用带有参数的简单示例时，我确实收到了警告，但额外的参数可能会被忽略）吞入内部某处）。nlmemgcv::gammmgcv::gamm gamm4/lme4correlationcorrelationlme4gamm4

nlme您所需的自相关结构（“自相关可以存在于块的每个单个值中，但假设块之间独立”）正是相关结构在（因此在）中编码的方式mgcv::gamm。

我会使用mcgv::gamm，并建议如果可能的话，您可以在一些具有已知结构的模拟数据上尝试一下（或者使用上面补充材料中提供的数据集，看看是否可以用您的替代方法重现他们的定性结论）。

StackOverflow 很好，但可能有更多混合模型专业知识r-sig-mixed-models@r-project.org

归档时间：	14 年前
查看次数：	2549 次
最近记录：	14 年前