I have a dataset with 14 features, a few of which are shown below. Sex and marital status are categorical variables.
height,sex,maritalStatus,age,edu,homeType
SEX
1. Male
2. Female
MARITAL STATUS
1. Married
2. Living together, not married
3. Divorced or separated
4. Widowed
5. Single, never married
I am now using the rpart library in R to build a classification tree with the following call:
rfit = rpart(homeType ~., data = trainingData, method = "class", cp = 0.0001)
This gives me a decision tree that does not treat sex and marital status as factors.
I am thinking of using as.factor:
sex = as.factor(trainingData$sex)
ms = as.factor(trainingData$maritalStatus)
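But I don't know how to pass this information to rpart. Because the data argument of rpart() takes the "trainingData" data frame, it will always use the values in that data frame. I am not new to R, but I would appreciate any help.
You can make the changes directly in the trainingData data frame and then run rpart().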
trainingData$sex = as.factor(trainingData$sex)
trainingData$maritalStatus = as.factor(trainingData$maritalStatus)
rfit = rpart(homeType ~., data = trainingData, method = "class", cp = 0.0001)
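To check that the conversion took effect, here is a minimal self-contained sketch. The toy data frame below is made up purely for illustration and only stands in for the real trainingData (which is not shown in the question). The use of factor() with explicit labels is an optional variation on as.factor(), and it assumes the integer codes 1-2 and 1-5 match the codebook above.

```r
library(rpart)

# Hypothetical toy data standing in for the real trainingData; values are random.
set.seed(1)
n <- 200
trainingData <- data.frame(
  sex           = sample(1:2, n, replace = TRUE),
  maritalStatus = sample(1:5, n, replace = TRUE),
  age           = sample(1:7, n, replace = TRUE),
  homeType      = sample(1:3, n, replace = TRUE)
)

# Convert the integer-coded columns in place, attaching the codebook labels.
trainingData$sex <- factor(trainingData$sex, levels = 1:2,
                           labels = c("Male", "Female"))
trainingData$maritalStatus <- factor(
  trainingData$maritalStatus, levels = 1:5,
  labels = c("Married", "Living together, not married",
             "Divorced or separated", "Widowed", "Single, never married")
)

# Confirm the column types before fitting.
str(trainingData)

rfit <- rpart(homeType ~ ., data = trainingData, method = "class", cp = 0.0001)

# Factor predictors seen by the fit are recorded in the "xlevels" attribute.
names(attr(rfit, "xlevels"))
```

Once the columns are factors, splits on them are printed as sets of levels (for example a subset of marital statuses) rather than as numeric thresholds such as maritalStatus < 2.5.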