Consider the following code:
library(ISLR)
row_list <- structure(list(`1` = 1:40, `2` = 41:79, `3` = 80:118, `4` = 119:157,
`5` = 158:196, `6` = 197:235, `7` = 236:274, `8` = 275:313,
`9` = 314:352, `10` = 353:392),
.Names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
test <- row_list[[1]]
train <- setdiff(unlist(row_list), row_list[[1]])
> glm(mpg ~ poly(horsepower, 1), data = Auto, subset = train)
Call: glm(formula = mpg ~ poly(horsepower, 1), data = Auto, subset = train)
Coefficients:
(Intercept)  poly(horsepower, 1)
      23.37              -133.05
Degrees of Freedom: 351 Total (i.e. Null); 350 Residual
Null Deviance: 21460
Residual Deviance: 8421 AIC: 2122
> glm(mpg ~ poly(horsepower, 1), data = Auto[train,])
Call: glm(formula = mpg ~ poly(horsepower, 1), data = Auto[train, ])
Coefficients:
(Intercept)  poly(horsepower, 1)
      24.05              -114.19
Degrees of Freedom: 351 Total (i.e. Null); 350 Residual
Null Deviance: 21460
Residual Deviance: 8421 AIC: 2122
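As shown above, the (Intercept) and poly(horsepower, 1) values differ between the two outputs. Why is that?
At least for lm(), An Introduction to Statistical Learning suggests (see p. 191) that row indices can be passed to the subset argument. Is that not the case for glm(), or is subset simply being used incorrectly here?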
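This comes down to how poly constructs orthogonal polynomials.
In the first example the polynomial basis is built from all rows of the data before subsetting, whereas in the second the subsetting happens first (because you pass already-subsetted data to glm), so the basis is built from the training rows only. The residual deviance and AIC are identical in both outputs, so the fitted model is the same; only the parameterization of the coefficients differs.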
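To make the difference concrete, here is a minimal sketch that builds the degree-1 basis both ways by hand. It uses the built-in mtcars data rather than Auto so that it runs without ISLR; the names rows, basis_full and basis_sub are illustrative and do not appear in the question.

```r
# mtcars ships with base R, so no extra packages are needed.
rows <- 10:32

# Basis built on all 32 rows, then restricted to the chosen rows.
# This mirrors what happens when `subset =` is used.
basis_full <- unclass(poly(mtcars$hp, 1))[rows, 1]

# Basis built only from the chosen rows.
# This mirrors what happens when `data = mtcars[rows, ]` is used.
basis_sub <- unclass(poly(mtcars$hp[rows], 1))[, 1]

# The two columns contain different numbers ...
isTRUE(all.equal(basis_full, basis_sub))   # FALSE

# ... but each is a linear rescaling (shift and scale) of the other,
# so their correlation is 1 up to floating-point rounding.
cor(basis_full, basis_sub)
```

Because the two bases differ only by a shift and a scale, a regression on either one spans the same set of fitted lines, which is why only the reported coefficients change.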
With raw polynomials, both approaches give identical coefficients:
> coef(glm(mpg ~ poly(hp, 1), data = mtcars, subset = 10:32))
(Intercept) poly(hp, 1)
   20.63307   -28.66876
> coef(glm(mpg ~ poly(hp, 1), data = mtcars[10:32, ]))
(Intercept) poly(hp, 1)
   19.93043   -25.43935
> coef(glm(mpg ~ poly(hp, 1, raw = TRUE), data = mtcars, subset = 10:32))
            (Intercept) poly(hp, 1, raw = TRUE)
            31.64927851             -0.07509986
> coef(glm(mpg ~ poly(hp, 1, raw = TRUE), data = mtcars[10:32, ]))
            (Intercept) poly(hp, 1, raw = TRUE)
            31.64927851             -0.07509986
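As a quick check that the two orthogonal-polynomial fits really are the same model, the sketch below compares their fitted values and their predictions on held-out rows. The object names (fit_subset, fit_sliced, held_out, pred_subset, pred_sliced) are illustrative only, and treating rows 1:9 as the held-out set is an assumption made for this example.

```r
# Same model, fitted with the two subsetting styles from the answer.
fit_subset <- glm(mpg ~ poly(hp, 1), data = mtcars, subset = 10:32)
fit_sliced <- glm(mpg ~ poly(hp, 1), data = mtcars[10:32, ])

# The coefficients differ because the poly() bases differ ...
coef(fit_subset)
coef(fit_sliced)

# ... but the fitted values on the training rows agree.
all.equal(fitted(fit_subset), fitted(fit_sliced))   # TRUE

# Predictions on rows that were not used for fitting agree as well,
# because each fit remembers the basis it was built with.
held_out    <- mtcars[1:9, ]
pred_subset <- predict(fit_subset, newdata = held_out)
pred_sliced <- predict(fit_sliced, newdata = held_out)
all.equal(pred_subset, pred_sliced)                 # TRUE
```

So subset is being used correctly in the question; either style should be fine for prediction, and raw = TRUE is only needed if the coefficients themselves have to match.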