Decision trees with continuous variables

Question

BSK*_*Kim 7 r machine-learning cart decision-tree rpart

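I have a question about decision trees that use continuous variables.

I have heard that when the output variable is continuous and the input variable is categorical, the split criterion is variance reduction or something similar. But I don't know how it works when the input variable is continuous.

Input variable: continuous / output variable: categorical
Input variable: continuous / output variable: continuous

For these two cases, how do we obtain a split criterion such as the Gini index or information gain?

When I use rpart in R, it runs fine whatever the types of the input and output variables are, but I don't know the algorithm in detail.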

Answer 1

Vit*_*ang 7

1) input variable: continuous / output variable: categorical
The C4.5 algorithm handles this case.

In order to handle continuous attributes, C4.5 creates a threshold and then splits the list into those whose attribute value is above the threshold and those that are less than or equal to it.
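
The thresholding idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not C4.5 itself: it scores splits by plain information gain (C4.5 uses gain ratio), it takes midpoints between consecutive distinct values as candidate thresholds, and the names `entropy` and `best_threshold` are made up for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information_gain) for the best binary split
    'value <= threshold' vs. 'value > threshold' on one continuous attribute."""
    n = len(labels)
    base = entropy(labels)
    distinct = sorted(set(values))
    best = (None, 0.0)
    # Assumption: candidate thresholds are midpoints between consecutive
    # distinct attribute values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best[1]:
            best = (t, gain)
    return best


# Toy data: the classes separate cleanly between 3.0 and 10.0.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
threshold, gain = best_threshold(values, labels)
print(threshold, gain)
```

Swapping `entropy` for a Gini impurity function would give a CART-style classification criterion with the same search loop.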

2) input variable: continuous / output variable: continuous
The CART (classification and regression trees) algorithm handles this case.

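Case 2 is the regression problem. Enumerate each attribute j and each candidate value s of that attribute, and split the data into the samples whose attribute value is less than or equal to s and those whose value is above it. This gives two regions:

$$R_1(j, s) = \{x \mid x^{(j)} \le s\}, \qquad R_2(j, s) = \{x \mid x^{(j)} > s\}$$

Find the best attribute j and the best split value s, which minimize the total squared error over the two regions:

$$\min_{j,\, s} \left[ \min_{c_1} \sum_{x_i \in R_1(j, s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j, s)} (y_i - c_2)^2 \right]$$

c_1 and c_2 can be solved as follows (each is the mean of the outputs in its region):

$$\hat{c}_m = \frac{1}{N_m} \sum_{x_i \in R_m(j, s)} y_i, \qquad m = 1, 2$$

Then, when doing regression, the prediction is

$$f(x) = \sum_{m} \hat{c}_m \, I(x \in R_m)$$

where N_m is the number of training samples in R_m and I is the indicator function (1 if x lies in R_m, 0 otherwise).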
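
A minimal Python sketch of this search for a single split may help. It assumes a least-squares criterion (the usual CART regression choice), uses the observed attribute values as candidate split points s, and the names `sse` and `best_regression_split` are hypothetical, chosen only for this example. It is not how rpart is implemented internally.

```python
def sse(ys):
    """Sum of squared deviations of ys from their mean (0.0 if empty)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_regression_split(X, y):
    """Search every attribute j and every observed value s for the split
    'x[j] <= s' vs. 'x[j] > s' that minimizes the total squared error.
    Returns (j, s, c1, c2, loss), or None if no attribute can be split."""
    best = None
    for j in range(len(X[0])):
        distinct = sorted({row[j] for row in X})
        # The largest value is skipped: it would leave the right region empty.
        for s in distinct[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= s]
            right = [yi for row, yi in zip(X, y) if row[j] > s]
            loss = sse(left) + sse(right)
            if best is None or loss < best[4]:
                c1 = sum(left) / len(left)     # mean output in region 1
                c2 = sum(right) / len(right)   # mean output in region 2
                best = (j, s, c1, c2, loss)
    return best


# Toy data: attribute 0 separates low and high outputs, attribute 1 is noise.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [10.0, 4.0], [11.0, 7.0], [12.0, 6.0]]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
j, s, c1, c2, loss = best_regression_split(X, y)
print(j, s, c1, c2, loss)
```

A full regression tree would apply the same search recursively to each of the two regions until a stopping rule (for example a minimum number of samples per node) is met.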
