smooth.spline(): fitted model does not match the user-specified degrees of freedom

Question

cya*_*yan 5 regression r spline smoothing

Here is the code I ran:

fun <- function(x) {1 + 3*sin(4*pi*x-pi)}
set.seed(1)
num.samples <- 1000
x <- runif(num.samples)
y <- fun(x) + rnorm(num.samples) * 1.5
fit <- smooth.spline(x, y, all.knots=TRUE, df=3)


Even though I set df=3, when I inspect the fitted model the output is

Call:
smooth.spline(x = x, y = y, df = 3, all.knots = TRUE)
Smoothing Parameter  spar= 1.499954  lambda= 0.002508571 (26 iterations)
Equivalent Degrees of Freedom (Df): 9.86422


Can anyone help? Thanks!

Answer 1

李哲源*_*李哲源 5

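Note that from R-3.4.0 (2017-04-21), smooth.spline accepts a direct specification of λ through the newly added argument lambda. It is still converted to the internal spar during estimation, so the answer below is unaffected.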
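As a minimal sketch of that newer interface, the fit below passes lambda directly. It assumes R >= 3.4.0 and reuses the question's data; the value 0.9855336 is the λ reported in the fit with df = 3 later in this answer, so the resulting df should come out close to 3.

```r
# Data from the question
fun <- function(x) 1 + 3 * sin(4 * pi * x - pi)
set.seed(1)
x <- runif(1000)
y <- fun(x) + rnorm(1000) * 1.5

# Specify lambda directly (needs R >= 3.4.0); no search over spar is performed
fit_lambda <- smooth.spline(x, y, all.knots = TRUE, lambda = 0.9855336)
fit_lambda$df
```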

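The smoothing parameter λ / spar is at the centre of smoothness control

Smoothness is controlled by the smoothing parameter λ. smooth.spline() uses an internal smoothing parameter spar rather than λ:

spar = s0 + 0.0601 * log(λ)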


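This logarithmic transform is needed so that the minimization (of GCV/CV, for example) can be done without constraints: λ must be positive, whereas spar can be any real number. The user can specify spar to set λ indirectly. When spar grows linearly, λ grows exponentially, so a large spar value is rarely needed.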
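The constant 0.0601 can be checked numerically. The help page ?smooth.spline states λ = r * 256^(3*spar - 1), which implies a slope of 1/(3*log(256)) for spar against log(λ). The sketch below fits the question's data at two fixed spar values (1.0 and 2.0, chosen arbitrarily for illustration) and compares the empirical slope with that constant.

```r
# Data from the question
fun <- function(x) 1 + 3 * sin(4 * pi * x - pi)
set.seed(1)
x <- runif(1000)
y <- fun(x) + rnorm(1000) * 1.5

# Fit at two fixed spar values; with spar given, no search is performed
fit_a <- smooth.spline(x, y, all.knots = TRUE, spar = 1.0)
fit_b <- smooth.spline(x, y, all.knots = TRUE, spar = 2.0)

# Empirical slope of spar against log(lambda)
slope <- (fit_b$spar - fit_a$spar) / (log(fit_b$lambda) - log(fit_a$lambda))

# Compare with the constant implied by the documented formula
c(empirical = slope, documented = 1 / (3 * log(256)))
```

One unit of spar therefore multiplies λ by 256^3, which is why moderate spar values already give very heavy smoothing.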

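The degrees of freedom df are also defined in terms of λ:

$$\mathrm{df}(\lambda) = \operatorname{tr}\left( X \left( X^\top X + \lambda S \right)^{-1} X^\top \right)$$

where X is the model matrix with the B-spline basis and S is the penalty matrix.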
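To make the trace definition concrete, here is a small self-contained sketch. It is not what smooth.spline() does internally: the basis size (20 columns), the sample size (200) and the second-order difference penalty used for S are assumptions chosen for illustration, standing in for the integrated squared second derivative penalty of a smoothing spline.

```r
library(splines)

set.seed(1)
x <- sort(runif(200))

# B-spline model matrix with 20 columns (illustrative size)
X <- unclass(bs(x, df = 20, intercept = TRUE))

# Assumed stand-in penalty: second-order differences of the coefficients
D <- diff(diag(ncol(X)), differences = 2)
S <- crossprod(D)

# df(lambda) = tr( X (X'X + lambda S)^{-1} X' )
edf <- function(lambda) {
  H <- X %*% solve(crossprod(X) + lambda * S, t(X))
  sum(diag(H))
}

lambdas <- c(0, 10^seq(-2, 6, by = 2))
df_vals <- sapply(lambdas, edf)
cbind(lambda = lambdas, df = df_vals)
```

With λ = 0 the trace equals the number of basis functions, and it decreases monotonically as λ grows, approaching the dimension of the penalty's null space (2 for this difference penalty).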

You can check these relationships on your dataset:

spar <- seq(1, 2.5, by = 0.1)
a <- sapply(spar, function (spar_i) unlist(smooth.spline(x, y, all.knots=TRUE, spar = spar_i)[c("df","lambda")]))


Let's sketch df ~ spar, λ ~ spar and log(λ) ~ spar:

par(mfrow = c(1,3))
plot(spar, a[1, ], type = "b", main = "df ~ spar",
     xlab = "spar", ylab = "df")
plot(spar, a[2, ], type = "b", main = "lambda ~ spar",
     xlab = "spar", ylab = "lambda")
plot(spar, log(a[2,]), type = "b", main = "log(lambda) ~ spar",
     xlab = "spar", ylab = "log(lambda)")


Note the rapid growth of λ with spar, the linear relationship between log(λ) and spar, and the relatively smooth relationship between df and spar.

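smooth.spline() fitting iterations for spar

If we specify the value of spar manually, as we did inside sapply(), no fitting iterations are needed to select spar; otherwise smooth.spline() has to iterate through a number of spar values. If we

specify cv = TRUE / FALSE, the fitting iterations aim to minimize the CV / GCV score;
specify df = mydf, the fitting iterations aim to minimize (df(spar) - mydf) ^ 2.

Minimizing GCV is easy to follow. We do not care about the GCV score itself, only about the corresponding spar. In contrast, when minimizing (df(spar) - mydf)^2, we usually care about the df value at the end of the iterations rather than about spar! But bear in mind that this is a minimization problem, so we are never guaranteed that the final df matches our target value mydf.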
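The df-matching criterion can be imitated by hand. The sketch below minimizes (df(spar) - mydf)^2 with optimize() over the default search interval and compares the result with the built-in search. The target mydf = 20 is an arbitrary value chosen for illustration because it is reachable inside that interval, and optimize() is a stand-in for the internal search routine, not the same code.

```r
# Data from the question
fun <- function(x) 1 + 3 * sin(4 * pi * x - pi)
set.seed(1)
x <- runif(1000)
y <- fun(x) + rnorm(1000) * 1.5

mydf <- 20  # illustrative target, reachable within [-1.5, 1.5]

# Objective: squared distance between the fitted df and the target
obj <- function(s) {
  (smooth.spline(x, y, all.knots = TRUE, spar = s)$df - mydf)^2
}

# One-dimensional minimization over the default spar search interval
opt <- optimize(obj, interval = c(-1.5, 1.5), tol = 1e-6)

# Built-in search for the same target
fit <- smooth.spline(x, y, all.knots = TRUE, df = mydf)

c(by_hand = opt$minimum, built_in = fit$spar)
```

When the target is reachable, both searches land on essentially the same spar and the fitted df matches mydf closely.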

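Why do you set df = 3 but get df = 9.864?

The iterations can end because a minimum was reached, because the search boundary was reached, or because the maximum number of iterations was reached.

We are far from the maximum iteration limit (500 by default; the fit above used 26), yet we did not reach the minimum. So we have probably hit the boundary.

Do not focus on df; think about spar.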

smooth.spline(x, y, all.knots=TRUE, df=3)$spar   # 1.4999


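According to ?smooth.spline, smooth.spline() by default searches for spar in [-1.5, 1.5]. That is, when you set df = 3, the minimization terminates at the search boundary rather than reaching df = 3.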
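A quick programmatic check for this situation is sketched below. hit_boundary() is a hypothetical helper, not part of R; it assumes the default search limits of -1.5 and 1.5 and uses an arbitrary tolerance of 1e-3.

```r
# Data from the question
fun <- function(x) 1 + 3 * sin(4 * pi * x - pi)
set.seed(1)
x <- runif(1000)
y <- fun(x) + rnorm(1000) * 1.5

# Hypothetical helper: did the spar search stop at one end of its interval?
hit_boundary <- function(fit, low = -1.5, high = 1.5, tol = 1e-3) {
  abs(fit$spar - low) < tol || abs(fit$spar - high) < tol
}

fit3  <- smooth.spline(x, y, all.knots = TRUE, df = 3)   # target out of reach
fit20 <- smooth.spline(x, y, all.knots = TRUE, df = 20)  # illustrative reachable target

c(df3_at_boundary = hit_boundary(fit3), df20_at_boundary = hit_boundary(fit20))
```

If the helper returns TRUE, the reported df should not be trusted to match the requested one.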

Have another look at the plot of df against spar. From the figure, it looks like we need a spar value somewhere near 2 to get df = 3.
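Instead of reading the value off the plot, the crossing point can be located numerically. This is a sketch using uniroot() on df(spar) - 3; the bracket [1.5, 2.5] is an assumption taken from the range shown in the plot.

```r
# Data from the question
fun <- function(x) 1 + 3 * sin(4 * pi * x - pi)
set.seed(1)
x <- runif(1000)
y <- fun(x) + rnorm(1000) * 1.5

# df as a function of spar (spar fixed, so no internal search)
df_of_spar <- function(s) smooth.spline(x, y, all.knots = TRUE, spar = s)$df

# Find the spar at which df crosses 3, searching beyond the default upper limit
root <- uniroot(function(s) df_of_spar(s) - 3, interval = c(1.5, 2.5))
root$root
```

The root lies above 1.5, which confirms that the default search interval cannot reach df = 3 for this fit.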

Let's use the control.spar argument to widen the search interval:

fit <- smooth.spline(x, y, all.knots=TRUE, df=3, control.spar = list(high = 2.5))
# Smoothing Parameter  spar= 1.859066  lambda= 0.9855336 (14 iterations)
# Equivalent Degrees of Freedom (Df): 3.000305


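Now you do end up with df = 3, and it takes spar = 1.86.

A better suggestion: do not use all.knots = TRUE

Look, you have 1000 data points. With all.knots = TRUE you are using 1000 parameters. Wanting to end up with df = 3 means that 997 of those 1000 parameters have to be suppressed. Imagine how large a λ, and hence spar, that requires!

Try a penalized regression spline instead. Suppressing 200 parameters down to 3 is definitely much easier: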

fit <- smooth.spline(x, y, nknots = 200, df=3)  ## using 200 knots
# Smoothing Parameter  spar= 1.317883  lambda= 0.9853648 (16 iterations)
# Equivalent Degrees of Freedom (Df): 3.000386


Now you end up with df = 3 without any spar control.
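To see the difference in model size directly, the sketch below compares the number of spline coefficients in the two fits. It relies on the coefficient vector stored in the fit component of the returned object; the all-knots count is roughly the number of data points (it can be slightly lower if some x values are merged as near-duplicates).

```r
# Data from the question
fun <- function(x) 1 + 3 * sin(4 * pi * x - pi)
set.seed(1)
x <- runif(1000)
y <- fun(x) + rnorm(1000) * 1.5

fit_all <- smooth.spline(x, y, all.knots = TRUE, df = 3)  # stops at the spar boundary
fit_200 <- smooth.spline(x, y, nknots = 200, df = 3)      # reaches the target

# Number of B-spline coefficients in each fit
c(all_knots = length(fit_all$fit$coef), nknots_200 = length(fit_200$fit$coef))

# The 200-knot fit reaches df = 3 inside the default spar interval
c(spar = fit_200$spar, df = fit_200$df)
```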
