从 geom_smooth 中提取模型拟合

Mel*_*ker 5 r ggplot2 gam

我有一个 ggplot,其中使用了 geom_smooth(method=\'gam\') 函数。我想知道是否有一种方法可以提取模型参数,例如解释的系数和偏差。

\n

类似于 mgcv::gam() 输出的 摘要():

\n
> summary(gam)\n\nFamily: gaussian \nLink function: identity \n\nFormula:\nmAODscale ~ s(numDate, bs = "cr")\n\nParametric coefficients:\n            Estimate Std. Error t value Pr(>|t|)    \n(Intercept) 0.041461   0.002198   18.86   <2e-16 ***\n---\nSignif. codes:  0 \xe2\x80\x98***\xe2\x80\x99 0.001 \xe2\x80\x98**\xe2\x80\x99 0.01 \xe2\x80\x98*\xe2\x80\x99 0.05 \xe2\x80\x98.\xe2\x80\x99 0.1 \xe2\x80\x98 \xe2\x80\x99 1\n\nApproximate significance of smooth terms:\n             edf Ref.df     F p-value    \ns(numDate) 8.731  8.979 74.54  <2e-16 ***\n---\nSignif. codes:  0 \xe2\x80\x98***\xe2\x80\x99 0.001 \xe2\x80\x98**\xe2\x80\x99 0.01 \xe2\x80\x98*\xe2\x80\x99 0.05 \xe2\x80\x98.\xe2\x80\x99 0.1 \xe2\x80\x98 \xe2\x80\x99 1\n\nR-sq.(adj) =  0.379   Deviance explained = 38.4%\nGCV = 0.0053239  Scale est. = 0.0052765  n = 1092\n
Run Code Online (Sandbox Code Playgroud)\n

数据

\n
test <- structure(list(numDate = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, \n5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, \n11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 15, 16, 16, \n16, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, \n22, 22, 22, 23, 23, 23, 24, 24, 24, 25, 25, 25, 26, 26, 26, 27, \n27, 27, 28, 28, 28, 29, 29, 29, 30, 30, 30, 31, 31, 31, 32, 32, \n32, 33, 33, 33, 34), mAODscale = c(0.0388813764134284, 0.0461877433707656, \n0.096886995092774, 0.0371438764134382, 0.0394419100374392, 0.0893994950927635, \n0.0413855430800965, 0.0418085767040992, 0.101328661759439, 0.0428313764134316, \n0.0403835767041016, 0.0978328284261067, 0.0444813764134295, 0.0566460767040979, \n0.136811995092771, 0.0404647097467716, 0.0461127433707702, 0.104611995092768, \n0.0391105430800991, 0.0388294100374367, 0.0883828284261057, 0.0377355430801032, \n0.0334335767040983, 0.0744786617594428, 0.0346647097467638, 0.0316752433707705, \n0.0691911617594343, 0.0365730430800966, 0.0329127433707725, 0.0721203284261094, \n0.0337897097467703, 0.0271960767041008, 0.0568703284261005, 0.0321188764134348, \n0.0226835767040967, 0.0467286617594311, 0.0389522097467676, 0.0317169100374315, \n0.0724911617594444, 0.0374147097467699, 0.0301335767041024, 0.066378661759444, \n0.0359688764134347, 0.0245710767041061, 0.0492828284261009, 0.0340355430801083, \n0.0208835767040938, 0.0401828284261114, 0.033193876413435, 0.0170627433707722, \n0.0363119950927739, 0.0320272097467722, 0.0147294100374324, 0.0298161617594417, \n0.0305563764134433, 0.0123752433707693, 0.0200744950927714, 0.0294522097467649, \n0.00966691003743847, 0.0144453284261061, 0.029439709746768, 0.00845441003743019, \n0.0127494950927769, 0.029218876413438, 0.00767941003742578, 0.0118203284260971, \n0.0283438764134303, 0.00608774337077023, 0.00807032842610056, \n0.027393235387791, 0.00498582029383954, 0.00612513611841337, \n0.0261313764134314, 0.004612743370771, 0.00541616175944171, 0.0260813764134298, \n0.00472524337077118, 0.00372449509276862, 0.0256522097467666, \n0.00535024337077061, 0.00356616175943714, 0.0250313764134376, \n0.0161044100374284, 0.00060366175944182, 0.0241147097467689, \n0.025246076704093, -0.00540050490722876, 0.0233022097467739, \n0.022454410037426, -0.00630883824055672, 0.0208772097467715, \n0.0152252433707645, -0.00819217157389573, 0.0194897097467646, \n-0.00307475662923196, -0.00896300490722979, 0.0178938764134386, \n0.00323774337076088, -0.00836717157389444, 0.0153022097467641\n)), row.names = c(NA, -100L), class = c("data.table", "data.frame"\n), .internal.selfref = <pointer: 0x000002cacef91f60>)\n
Run Code Online (Sandbox Code Playgroud)\n

使用 geom_smooth() 进行编码和绘图

\n

注意:这仅使用我的数据集的子集。

\n
p <- ggplot(test50, aes(x=numDate, y=mAODscale)) +\n  geom_point() +\n  stat_smooth(method=\'gam\',\n              formula=y~s(x,bs="cs",fx=TRUE,k=10))\n
Run Code Online (Sandbox Code Playgroud)\n

在此输入图像描述

\n

All*_*ron 5

geom_smooth在 中指定模型时ggplot,您可以在公式中使用符号xy来表示映射到 x 和 y 美学的变量(如 中指定的aes)。在你的情况下,xnumDate并且ymAODscale

由于在运行模型之前数据中没有进行任何转换(除了变量名称的更改),因此您所需要做的就是将公式中的符号xy替换为原始数据框中适当的变量名称。然后使用该公式运行gam. 该模型的摘要就是您正在寻找的结果:

library(mgcv)

mod <- gam(mAODscale ~ s(numDate, bs = "cs", fx = TRUE, k = 10), data = test)
summary(mod)
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> mAODscale ~ s(numDate, bs = "cs", fx = TRUE, k = 10)
#> 
#> Parametric coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 0.031871   0.001839   17.33   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Approximate significance of smooth terms:
#>            edf Ref.df     F  p-value    
#> s(numDate)   9      9 13.25 3.01e-13 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.527   Deviance explained =   57%
#> GCV = 0.00037584  Scale est. = 0.00033826  n = 100
Run Code Online (Sandbox Code Playgroud)

模型本身不存储在 ggplot 对象中。它是即时构建的,并在此ggplot_build过程中被丢弃。我们可以看到,如果我们设置on并让它在调用时将其输出对象的摘要打印到控制台,则直接调用会gam给出与所使用的模型相同的结果:ggplottracegam

trace(gam, quote(print(summary(object))), at = 40)
#> Tracing function "gam" in package "mgcv"
#> [1] "gam"
Run Code Online (Sandbox Code Playgroud)

gam现在,当我们打印 ggplot 时,我们应该得到用于ggplot创建平滑的摘要:

ggplot(test, aes(x=numDate, y=mAODscale)) +
  geom_point() +
  stat_smooth(method='gam',
              formula=y ~ s(x, bs = "cs", fx = TRUE, k = 10))
Run Code Online (Sandbox Code Playgroud)

#> Tracing method(formula, data = data, weights = weight, method = "REML") step 40 
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> y ~ s(x, bs = "cs", fx = TRUE, k = 10)
#> 
#> Parametric coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 0.031871   0.001839   17.33   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Approximate significance of smooth terms:
#>      edf Ref.df     F  p-value    
#> s(x)   9      9 13.25 3.01e-13 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.527   Deviance explained =   57%
#> -REML = -220.56  Scale est. = 0.00033826  n = 100
Run Code Online (Sandbox Code Playgroud)

可以看到这和我们直接调用是一样的,只是变量名变成了xand y

untrace(gam)完成后不要忘记。

创建于 2023-10-03,使用reprex v2.0.2