BSK*_*Kim 7 r machine-learning cart decision-tree rpart
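A question about decision trees with continuous variables
I have heard that when the output variable is continuous and the input variable is categorical, the split criterion is variance reduction or something similar. But I don't know how it works when the input variable is continuous.
Input variable: continuous / output variable: categorical
Input variable: continuous / output variable: continuous
For these two cases, how do we obtain a split criterion such as the Gini index or information gain?
When I use rpart in R, it runs fine whatever the types of the input and output variables, but I don't know the algorithm in detail.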
1) input variable : continuous / output variable : categorical
The C4.5 algorithm handles this case:
In order to handle continuous attributes, C4.5 creates a threshold and then splits the list into those whose attribute value is above the threshold and those that are less than or equal to it.
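To make the threshold idea concrete, here is a minimal sketch in plain Python that scores every candidate threshold on one continuous attribute by information gain. It is a simplification and not C4.5 itself: it uses plain information gain rather than C4.5's gain ratio, it takes midpoints between adjacent sorted values as candidate thresholds (an assumption), and the function names and toy data are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, information_gain) for the best binary split
    `value <= threshold` vs `value > threshold` on one continuous attribute.
    Returns (None, 0.0) if no split improves on the unsplit node."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        t = (lo + hi) / 2  # assumption: midpoint between neighbours as candidate
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Information gain = parent entropy minus weighted child entropies
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Toy data, invented for this example
petal_len = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9]
species = ["a", "a", "a", "b", "b", "b"]
threshold, gain = best_threshold(petal_len, species)
```

Swapping `entropy` for a Gini impurity function would give the Gini-based variant of the same search; the threshold enumeration stays the same.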
2) input variable : continuous / output variable : continuous
The CART (classification and regression trees) algorithm handles this case.
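Case 2 is a regression problem. Enumerate each attribute $j$ and each candidate split value $s$ of that attribute, then split the samples into those whose value of attribute $j$ is less than or equal to $s$ and those whose value is above it. This gives two regions:

$$R_1(j,s) = \{x \mid x^{(j)} \le s\}, \qquad R_2(j,s) = \{x \mid x^{(j)} > s\}$$

Find the best attribute $j$ and the best split value $s$, which minimize

$$\min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right]$$

$c_1$ and $c_2$ can be solved as follows:

$$\hat{c}_m = \frac{1}{N_m}\sum_{x_i \in R_m(j,s)} y_i, \qquad m = 1, 2$$

where $x^{(j)}$ is the value of attribute $j$ for sample $x$ and $N_m$ is the number of training samples that fall in $R_m(j,s)$.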
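The search over attributes and split values for the regression case can be sketched in a few lines of plain Python. This is a brute-force illustration under stated assumptions, not how rpart is implemented: candidate split values are the observed values of each attribute, each side is predicted by its mean, and the function names and toy data are hypothetical.

```python
def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_regression_split(X, y):
    """Exhaustively search attribute index j and split value s that minimise
    SSE(left) + SSE(right), where left is x[j] <= s and right is x[j] > s.
    Returns (j, s, c1, c2, loss), or None if no valid split exists."""
    best = None
    for j in range(len(X[0])):
        # Assumption: candidate split values are the observed attribute values
        for s in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            if not left or not right:
                continue  # a split must put samples on both sides
            loss = sse(left) + sse(right)
            if best is None or loss < best[4]:
                c1 = sum(left) / len(left)    # mean response in the left region
                c2 = sum(right) / len(right)  # mean response in the right region
                best = (j, s, c1, c2, loss)
    return best

# Toy data, invented for this example: attribute 0 separates the targets cleanly
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [10.0, 4.0], [11.0, 7.0], [12.0, 6.0]]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
j, s, c1, c2, loss = best_regression_split(X, y)
```

Minimising the summed squared error of the two sides is the same as maximising the variance reduction mentioned in the question, since the parent node's squared error is fixed for a given node. A full tree applies this search recursively to each resulting region until a stopping rule is met.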