Decision trees with continuous variables

BSK*_*Kim 7 r machine-learning cart decision-tree rpart

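I have a question about decision trees that use continuous variables.

I have heard that when the output variable is continuous and the input variable is categorical, the split criterion is variance reduction or something similar. But I don't know how it works when the input variable is continuous.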

  1. Input variable: continuous / output variable: categorical

  2. Input variable: continuous / output variable: continuous

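For these two cases, how do we obtain a split criterion such as the Gini index or information gain?

When I use rpart in R, it works fine whatever the types of the input and output variables are, but I don't know the detailed algorithm.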

Vit*_*ang 7

1) Input variable: continuous / output variable: categorical
The C4.5 algorithm handles this case.

In order to handle continuous attributes, C4.5 creates a threshold and then splits the list into those whose attribute value is above the threshold and those that are less than or equal to it.
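The threshold search described above can be sketched roughly as follows. This is a minimal illustration, not C4.5 itself: it scores candidates with plain information gain (C4.5 proper uses the gain ratio), and the names `entropy` and `best_threshold` are made up for this example. The threshold is taken to be the lower of two adjacent distinct values, so that "less than or equal to" goes left; using the midpoint is another common choice.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information_gain) for one continuous attribute.

    Hypothetical helper: tries a cut between every pair of adjacent
    distinct values and keeps the one with the largest information gain.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]    # samples with value <= lo
        right = [y for _, y in pairs[i:]]   # samples with value > lo
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best_gain:
            best_t, best_gain = lo, gain
    return best_t, best_gain
```

For example, `best_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"])` picks the cut at 2.0, which separates the two classes perfectly. Replacing `entropy` with a Gini impurity function would give the Gini-based variant of the same search.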

2) Input variable: continuous / output variable: continuous
The CART (classification and regression trees) algorithm handles this case.

Case 2 is the regression problem. Enumerate each attribute j and each candidate split value s of that attribute, and split the samples into those whose value of attribute j is less than or equal to s and those whose value is above it. This gives two regions: $$R_1(j,s) = \{x \mid x^{(j)} \le s\}, \qquad R_2(j,s) = \{x \mid x^{(j)} > s\}$$

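Find the best attribute j and the best split value s, that is, the pair that minimizes the total squared error over the two regions R_1(j,s) and R_2(j,s) produced by the split:

$$\min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right]$$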

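c_1 and c_2 can be solved in closed form: each is the mean of the outputs y_i of the training samples that fall in the corresponding region,

$$c_m = \frac{1}{N_m}\sum_{x_i \in R_m(j,s)} y_i, \qquad m = 1, 2,$$

where N_m is the number of training samples in region R_m(j,s).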
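A brute-force version of this search over j and s might look like the sketch below. It assumes the data is a list of numeric rows `X` with a parallel list of outputs `y`; the function names are illustrative only, and real implementations such as rpart sort each attribute once and update the sums incrementally instead of rescanning the data for every candidate.

```python
def sse(ys):
    """Sum of squared errors around the mean (the optimal constant c)."""
    if not ys:
        return 0.0
    c = sum(ys) / len(ys)
    return sum((y - c) ** 2 for y in ys)


def best_regression_split(X, y):
    """Try every attribute j and every observed value s as a split point.

    Returns (j, s, c1, c2, loss) for the split with the smallest total
    squared error, or None if no split leaves both sides non-empty.
    """
    best = None
    for j in range(len(X[0])):
        for s in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= s]   # R_1(j, s)
            right = [yi for row, yi in zip(X, y) if row[j] > s]   # R_2(j, s)
            if not left or not right:
                continue  # skip splits that leave one region empty
            loss = sse(left) + sse(right)
            if best is None or loss < best[4]:
                c1 = sum(left) / len(left)
                c2 = sum(right) / len(right)
                best = (j, s, c1, c2, loss)
    return best
```

Minimizing the summed squared error of the two children is the same thing as maximizing the variance reduction mentioned in the question, since the parent's squared error is fixed for a given node.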

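Splitting is repeated recursively inside each region until a stopping rule is met, which leaves M regions R_1, ..., R_M. To predict for a new input x, the tree returns the constant of the region that x falls into:

$$f(x) = \sum_{m=1}^{M} c_m \, I(x \in R_m)$$

where I is the indicator function:

$$I(x \in R_m) = \begin{cases} 1, & x \in R_m \\ 0, & \text{otherwise} \end{cases}$$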
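Prediction with such a piecewise-constant tree can be illustrated as below. The nested-tuple representation is an assumption made for this sketch (it is not how rpart stores trees): an internal node is `(j, s, left, right)` and a leaf is simply the constant c_m of its region.

```python
def predict(tree, x):
    """Walk down the tree and return the constant c_m of x's region."""
    while isinstance(tree, tuple):
        j, s, left, right = tree
        # Go left when x^(j) <= s, otherwise go right.
        tree = left if x[j] <= s else right
    return tree


# Hand-built example with three regions:
#   x[0] <= 2.0                 -> 1.0
#   x[0] >  2.0 and x[1] <= 4.0 -> 5.0
#   x[0] >  2.0 and x[1] >  4.0 -> 9.0
tree = (0, 2.0, 1.0, (1, 4.0, 5.0, 9.0))
```

Because the regions do not overlap, exactly one indicator in the sum is 1 for any x, so following the branches gives the same value as evaluating the sum directly.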