How to compute the error rate from a decision tree?

teo*_*389 31 r classification decision-tree rpart

Does anyone know how to compute the error rate for a decision tree in R? I am using the rpart() function.

chl*_*chl 51

Assuming you mean computing the error rate on the sample used to fit the model, you can use printcp(). For example, using the on-line example,

> library(rpart)
> fit <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)
> printcp(fit)

Classification tree:
rpart(formula = Kyphosis ~ Age + Number + Start, data = kyphosis)

Variables actually used in tree construction:
[1] Age   Start

Root node error: 17/81 = 0.20988

n= 81 

        CP nsplit rel error  xerror    xstd
1 0.176471      0   1.00000 1.00000 0.21559
2 0.019608      1   0.82353 0.82353 0.20018
3 0.010000      4   0.76471 0.82353 0.20018
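
Incidentally, the Root node error reported here is simply the misclassification rate of the trivial rule that always predicts the majority class. A minimal check (a sketch, assuming the kyphosis data loaded above):

    table(kyphosis$Kyphosis)   # absent: 64, present: 17
    17/81                      # = 0.20988, the Root node error above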

The Root node error is used to compute two measures of predictive performance, when considering the values displayed in the rel error and xerror columns, and depending on the complexity parameter (first column):

  • 0.76471 x 0.20988 = 0.1604973 (16.0%) is the resubstitution error rate (i.e., the error rate computed on the training sample) -- this is roughly

    class.pred <- table(predict(fit, type="class"), kyphosis$Kyphosis)  # confusion matrix: predicted vs. observed
    1-sum(diag(class.pred))/sum(class.pred)                             # proportion misclassified on the training sample
    
  • 0.82353 x 0.20988 = 0.1728425 (17.2%) is the cross-validated error rate (using 10-fold CV, see xval in rpart.control(); but see also xpred.rpart() and plotcp(), which relies on this measure). This measure is a more objective indicator of predictive accuracy. Both figures can also be pulled directly from the fitted object, as sketched below.
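
A minimal sketch of extracting both numbers programmatically, reusing the fit object from above (this assumes that, for a classification rpart fit, frame$dev in the first row holds the number of misclassified observations at the root node, which you can check against print(fit)):

    root.err  <- fit$frame$dev[1] / fit$frame$n[1]             # 17/81 = 0.20988, the root node error
    cp.tab    <- fit$cptable                                   # same table as printed by printcp(fit)
    resub.err <- cp.tab[nrow(cp.tab), "rel error"] * root.err  # ~ 0.160, resubstitution error rate
    cv.err    <- cp.tab[nrow(cp.tab), "xerror"] * root.err     # ~ 0.173, cross-validated error rate

Note that the cross-validated figure will vary slightly from run to run, since the CV folds are drawn at random.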

Note that it more or less agrees with the classification accuracy reported by tree:

> library(tree)
> summary(tree(Kyphosis ~ Age + Number + Start, data=kyphosis))

Classification tree:
tree(formula = Kyphosis ~ Age + Number + Start, data = kyphosis)
Number of terminal nodes:  10 
Residual mean deviance:  0.5809 = 41.24 / 71 
Misclassification error rate: 0.1235 = 10 / 81 

where the Misclassification error rate is computed on the training sample.
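
For completeness, a small sketch of reproducing that figure by hand, mirroring the confusion-table computation used for rpart above (assuming the same tree fit):

    library(rpart)   # provides the kyphosis data
    library(tree)
    tfit <- tree(Kyphosis ~ Age + Number + Start, data = kyphosis)
    tab  <- table(predict(tfit, type = "class"), kyphosis$Kyphosis)  # predicted vs. observed classes
    1 - sum(diag(tab)) / sum(tab)                                    # about 0.1235 = 10/81, as reported by summary()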