Predicting on the "response" scale with earth (MARS) and caret in R

Mr.*_*cos 3 regression r r-caret

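I hope this isn't a naive question. I'm running a series of binomial regressions with different models through the caret package in R. All of them work except earth (MARS). Normally the family is passed through the earth function to glm as glm=list(family=binomial), and that seems to work fine (as shown below). With the general predict() function I would use type="response" to put the predictions on the right scale. In the example below, fit1 is the non-caret approach and pred1 is its correct prediction. pred1a is the incorrectly scaled prediction made without type="response". fit2 is the caret approach and pred2 is its prediction; it is identical to the unscaled prediction pred1a. Digging through the fit2 object, the correctly fitted values are present in the glm.list component, so the earth() function itself is behaving as it should.

The question is: since caret's predict() function only accepts type="prob" or type="raw", how do I tell it to predict on the scale of the response?

Thank you very much.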

library(earth)
library(caret)
data(mtcars)

# Non-caret fit: earth with a binomial GLM on top of the MARS terms
fit1 <- earth(am ~ cyl + mpg + wt + disp, data = mtcars,
              degree = 1, glm = list(family = binomial))
pred1 <- predict(fit1, newdata = mtcars, type = "response")
range(pred1)
# [1] 0.0004665284 0.9979135993   # correct: binomial, response scale

pred1a <- predict(fit1, newdata = mtcars)
range(pred1a)
# [1] -7.669725  6.170226   # without type = "response"

# caret fit of the same model
fit2ctrl <- trainControl(method = "cv", number = 5)
fit2 <- train(am ~ cyl + mpg + wt + disp, data = mtcars, method = "earth",
              trControl = fit2ctrl, tuneLength = 3,
              glm = list(family = 'binomial'))
pred2 <- predict(fit2, newdata = mtcars)
range(pred2)
# [1] -7.669725  6.170226   # same as pred1a

# fitted values within the glm.list component of fit2
# [1] 0.0004665284 0.9979135993
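
As a side check on the numbers above: the two ranges look like the same predictions on different scales. Assuming the default predictions are on the logit (link) scale, applying the inverse logit to the endpoints of the unscaled range should give back the response-scale range. A minimal base R sketch using only the values printed above (the variable names are mine):

```r
# Endpoints of the unscaled range reported for pred1a / pred2
link_range <- c(-7.669725, 6.170226)

# plogis() is the inverse logit: 1 / (1 + exp(-x))
resp_range <- plogis(link_range)
resp_range
# approximately 0.00047 and 0.99791, i.e. the range reported for pred1
```

So the caret predictions are not wrong as such; they are just on the link scale rather than the probability scale.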

top*_*epo 8

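A few things:

  • The outcome (mtcars$am) is numeric 0/1, so train treats this as a regression model.
  • When the outcome is a factor, train assumes classification and automatically adds glm=list(family=binomial).
  • For classification with train, you need to add classProbs = TRUE to trainControl so that the model produces class probabilities.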
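
Applied to the mtcars example from the question, those three points might look like the sketch below. The names cars, ctrl, fit and prob, the level labels "manual"/"automatic" and the seed are my own choices for illustration; the level order (event first) simply follows the same convention as the example further down, and I have not checked that the tuned model matches fit1 from the question.

```r
library(earth)
library(caret)

data(mtcars)

# Work on a copy so the numeric 0/1 column in mtcars is left untouched.
cars <- mtcars

# Recode the outcome as a factor so train() does classification.
# Level names must be valid R names when classProbs = TRUE.
cars$am <- factor(ifelse(cars$am == 1, "manual", "automatic"),
                  levels = c("manual", "automatic"))

set.seed(1)  # arbitrary seed, only to make the CV folds reproducible
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

# No glm argument is passed here: with a factor outcome,
# train() is expected to add the binomial GLM itself.
fit <- train(am ~ cyl + mpg + wt + disp, data = cars,
             method = "earth", trControl = ctrl, tuneLength = 3)

# Class probabilities, one column per factor level
prob <- predict(fit, newdata = cars, type = "prob")
range(prob$manual)
```

With a small data set like this, glm may warn about fitted probabilities of 0 or 1 during resampling; the predictions from type = "prob" should still lie between 0 and 1.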

Here is an example with a different data set from the earth package:

library(earth)
library(caret)

data(etitanic)

a1 <- earth(survived ~ ., 
            data = etitanic,
            glm=list(family=binomial),
            degree = 2,       
            nprune = 5)

etitanic$survived <- factor(ifelse(etitanic$survived == 1, "yes", "no"),
                            levels = c("yes", "no"))

a2 <- train(survived ~ ., 
            data = etitanic, 
            method = "earth",
            tuneGrid = data.frame(degree = 2, nprune = 5),
            trControl = trainControl(method = "none", 
                                     classProbs = TRUE))

Then:

> predict(a1, head(etitanic), type = "response")
      survived
[1,] 0.8846552
[2,] 0.9281010
[3,] 0.8846552
[4,] 0.4135716
[5,] 0.8846552
[6,] 0.4135716
> 
> predict(a2, head(etitanic), type = "prob")
        yes         no
1 0.8846552 0.11534481
2 0.9281010 0.07189895
3 0.8846552 0.11534481
4 0.4135716 0.58642840
5 0.8846552 0.11534481
6 0.4135716 0.58642840

Max

  • @Max, thank you for the clear answer and example. Very helpful! (2 upvotes)