Why does using "mgcv::s" in "gam(y ~ mgcv::s ...)" result in an error?

Question

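I wanted to be explicit and use the :: notation on every line when fitting an mgcv::gam. I stumbled over one thing when using that notation for mgcv::s inside the model call. The code with a reproducible example and the error is shown below.

The cause is probably that I am using this notation inside a model formula, but I cannot figure out why it does not work or is not allowed. This may be something very specific about syntax (and probably not specific to mgcv), but maybe someone can help me understand it and improve my understanding of R. Thank you in advance.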

library(mgcv)
dat <- data.frame(x = 1:10, y = 101:110)
# this results in an error: invalid type (list)...
mgcv::gam(y ~ mgcv::s(x, bs = "cs", k = -1), data = dat)
# after removing the mgcv:: in front of s everything works fine
mgcv::gam(y ~ s(x, bs = "cs", k = -1), data = dat)

# outside of the model call, both calls return the desired function
class(s)
# [1] "function"
class(mgcv::s)
# [1] "function"

Answer 1

李哲源*_*李哲源 5

Explanation

library(mgcv)
#Loading required package: nlme
#This is mgcv 1.8-24. For overview type 'help("mgcv-package")'.

f1 <- ~ s(x, bs = 'cr', k = -1)
f2 <- ~ mgcv::s(x, bs = 'cr', k = -1)

OK <- mgcv:::interpret.gam0(f1)$smooth.spec
FAIL <- mgcv:::interpret.gam0(f2)$smooth.spec

str(OK)
# $ :List of 10
#  ..$ term   : chr "x"
#  ..$ bs.dim : num -1
#  ..$ fixed  : logi FALSE
#  ..$ dim    : int 1
#  ..$ p.order: logi NA
#  ..$ by     : chr "NA"
#  ..$ label  : chr "s(x)"
#  ..$ xt     : NULL
#  ..$ id     : NULL
#  ..$ sp     : NULL
#  ..- attr(*, "class")= chr "cr.smooth.spec"

str(FAIL)
# list()

Line 4 of the source code of interpret.gam0 reveals the problem:

head(mgcv:::interpret.gam0)

1 function (gf, textra = NULL, extra.special = NULL)              
2 {                                                               
3     p.env <- environment(gf)                                    
4     tf <- terms.formula(gf, specials = c("s", "te", "ti", "t2", 
5         extra.special))                                         
6     terms <- attr(tf, "term.labels")

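Because "mgcv::s" does not match any of the specials listed there, the smooth term is not recognised and you get the error. mgcv does, however, let you work around this by passing "mgcv::s" through the extra.special argument: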

FIX <- mgcv:::interpret.gam0(f2, extra.special = "mgcv::s")$smooth.spec
all.equal(FIX, OK)
# [1] TRUE
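
The matching of specials described above can be seen with base R alone, without calling any mgcv internals. The following is a minimal sketch, not mgcv's own code: it calls terms() with specials = "s" on two formulas and prints where, if anywhere, a special was found. The names f_plain, f_prefix, spec_plain and spec_prefix are made up for this illustration. The result for the prefixed formula is deliberately not annotated, because how namespaced calls are treated may depend on the R version.

```r
# Base R only: terms() marks "special" functions by their literal name.
f_plain  <- y ~ s(x, bs = "cr")
f_prefix <- y ~ mgcv::s(x, bs = "cr")

# Index of the s() call among the formula's variables
# (the response y is variable 1, the s() call is variable 2).
spec_plain <- attr(terms(f_plain, specials = "s"), "specials")$s
print(spec_plain)

# Here the head of the call is `mgcv::s`, not the bare symbol `s`,
# so it is not guaranteed to be recognised as the special "s".
spec_prefix <- attr(terms(f_prefix, specials = "s"), "specials")$s
print(spec_prefix)
```

If spec_prefix comes back as NULL, that is the same mismatch that leaves smooth.spec empty in the FAIL case above.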

It is just that this is not user-controllable from the high-level routines:

head(mgcv::gam, n = 10)

#1  function (formula, family = gaussian(), data = list(), weights = NULL, 
#2      subset = NULL, na.action, offset = NULL, method = "GCV.Cp",        
#3      optimizer = c("outer", "newton"), control = list(), scale = 0,     
#4      select = FALSE, knots = NULL, sp = NULL, min.sp = NULL, H = NULL,  
#5      gamma = 1, fit = TRUE, paraPen = NULL, G = NULL, in.out = NULL,    
#6      drop.unused.levels = TRUE, drop.intercept = NULL, ...)             
#7  {                                                                      
#8      control <- do.call("gam.control", control)                         
#9      if (is.null(G)) {                                                  
#10         gp <- interpret.gam(formula)  ## <- default to extra.special = NULL

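I agree with Ben Bolker. Digging into what happens internally is a good exercise, but treating this as a bug and fixing it would be an overreaction.

More insight: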

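s, te, etc. in mgcv do not follow the same logic as stats::poly and splines::bs.

For example, when you do X <- splines::bs(x, df = 10, degree = 3), it evaluates x directly and constructs the design matrix X.
When you do s(x, bs = 'cr', k = 10), no evaluation of x takes place; the term is only parsed.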
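
A small sketch of this difference, assuming the splines and mgcv packages are installed. The variable name not_defined_anywhere is hypothetical and intentionally never created, to show that s() only records the term rather than evaluating it.

```r
library(splines)
library(mgcv)

x <- seq(0, 1, length.out = 50)

# bs() evaluates x immediately and returns a numeric design matrix.
X <- splines::bs(x, df = 10, degree = 3)
is.matrix(X)
dim(X)        # 50 rows, 10 columns

# s() does not touch the data: it returns a specification object that
# stores the variable name as text, even though no such variable exists.
spec <- mgcv::s(not_defined_anywhere, bs = "cr", k = 10)
class(spec)
spec$term
spec$bs.dim
```

The specification is the same kind of object as the entry of smooth.spec shown in the str(OK) output earlier.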

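Smooth construction in mgcv takes several stages:

parsing / interpretation by mgcv::interpret.gam, which generates a specification for each smoother;
initial construction by mgcv::smooth.construct, which sets up the basis / design matrix and the penalty matrix (mostly done at C level);
secondary construction by mgcv::smoothCon, which picks up the "by" variable (for example, replicating the smooth for a factor "by"), linear functional terms, the null space penalty (if select = TRUE is used), penalty rescaling, the centering constraint, etc.;
final integration by mgcv:::gam.setup, which combines all smoothers together and returns the model matrix, etc.

So it is a considerably more involved process.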
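
The middle stages can be tried by hand. The sketch below is only an illustration under simple assumptions (one covariate, no "by" variable, default arguments) and is not how gam() itself sequences the calls: it builds a specification with s() and passes it to mgcv::smoothCon, which calls the appropriate smooth.construct method and returns a list of constructed smooths. The data frame dat is made up for the example.

```r
library(mgcv)

# Made-up data: a single covariate with 50 distinct values.
dat <- data.frame(x = seq(0, 1, length.out = 50))

# Specification only; this is what interpret.gam would produce for the term.
spec <- s(x, bs = "cr", k = 10)

# Construct the smooth: basis (design matrix X) and penalty matrix list S.
# absorb.cons is left at its default, so no centering constraint is absorbed.
sm <- smoothCon(spec, data = dat)

length(sm)            # one smooth object, since there is no factor "by"
dim(sm[[1]]$X)        # design matrix of the basis
length(sm[[1]]$S)     # number of penalty matrices
```

Calling smoothCon with absorb.cons = TRUE instead would absorb the centering constraint into the basis, which is one of the steps listed in the secondary construction stage.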
