Dis*_*tty 5 r classification machine-learning
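I am trying to assess the accuracy of a RandomForest model's predictions, but I get the following error.
Error: `data` and `reference` should be factors with the same levels.
Here is the code: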
rfModel <- randomForest(Churn ~., data = training)
print(rfModel)
pred_rf <- predict(rfModel, testing)
caret::confusionMatrix(pred_rf, testing$Churn)
testing$Churn
The data was split into training and testing sets in a 7:3 ratio.
I also get the following warnings when running the code:
Warning messages:
1: In get(results[[i]], pos = which(search() == packages[[i]])) :
restarting interrupted promise evaluation
2: In get(results[[i]], pos = which(search() == packages[[i]])) :
internal error -3 in R_decompress1
Structure of the testing data:
str(testing)
'data.frame': 999 obs. of 18 variables:
$ account_length : int 84 75 147 141 65 62 85 93 76 73 ...
$ International.plan : Factor w/ 2 levels "No","Yes": 2 2 2 2 1 1 1 1 1 1 ...
$ Voice.mail.plan : Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 2 1 2 1 ...
$ Number.vmail.messages : int 0 0 0 37 0 0 27 0 33 0 ...
$ Total.day.minutes : num 299 167 157 259 129 ...
$ Total.day.calls : int 71 113 79 84 137 70 139 114 66 90 ...
$ Total.day.charge : num 50.9 28.3 26.7 44 21.9 ...
$ Total.eve.minutes : num 61.9 148.3 103.1 222 228.5 ...
$ Total.eve.calls : int 88 122 94 111 83 76 90 111 65 88 ...
$ Total.eve.charge : num 5.26 12.61 8.76 18.87 19.42 ...
$ Total.night.minutes : num 197 187 212 326 209 ...
$ Total.night.calls : int 89 121 96 97 111 99 75 121 108 74 ...
$ Total.night.charge : num 8.86 8.41 9.53 14.69 9.4 ...
$ Total.intl.minutes : num 6.6 10.1 7.1 11.2 12.7 13.1 13.8 8.1 10 13 ...
$ Total.intl.calls : int 7 3 6 5 6 6 4 3 5 2 ...
$ Total.intl.charge : num 1.78 2.73 1.92 3.02 3.43 3.54 3.73 2.19 2.7 3.51 ...
$ Customer.service.calls: int 2 3 0 0 4 4 1 3 1 1 ...
$ Churn : chr "0" "0" "0" "0" ...
The training set has the same structure, with 2334 observations.
Structure of pred_rf:
str(pred_rf)
Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 2 2 1 1 1 1 ...
- attr(*, "names")= chr [1:999] "4" "5" "8" "10" ...
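Please help.
Well, I just ran into the same problem and solved it.
Look at your str(testing): note that your Churn is not a factor but a character vector (chr).
First, you need to convert Churn to a factor: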
Churn <- as.factor(testing$Churn)
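Check again with str(Churn) to confirm it is now a factor. Note that str(testing) will still show Churn as chr, because the result was assigned to a new variable rather than back to testing$Churn.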
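As a quick way to see what confusionMatrix is complaining about, here is a minimal base R sketch. The vectors truth_chr and pred are made-up stand-ins for testing$Churn and pred_rf, not the real data:

```r
# Hypothetical stand-ins for testing$Churn (chr) and pred_rf (factor)
truth_chr <- c("0", "0", "1", "0", "1")
pred <- factor(c("FALSE", "FALSE", "TRUE", "TRUE", "TRUE"),
               levels = c("FALSE", "TRUE"))

# A character vector is not a factor, which is what the error is about
truth_is_factor_before <- is.factor(truth_chr)

# Convert and check again
truth <- as.factor(truth_chr)
truth_is_factor_after <- is.factor(truth)

# Compare the level sets of the reference and the predictions
print(levels(truth))
print(levels(pred))
same_levels <- identical(levels(truth), levels(pred))
print(same_levels)
```

If same_levels comes out FALSE on your real data, converting to a factor alone will probably not be enough and the labels need to be aligned as well.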
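Now you can use the following (rf_model and testing_set are the names from my own code; they presumably correspond to rfModel and testing in the question):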
test_predictions = predict(rf_model, testing_set)
test_predictions
conf_matrix = confusionMatrix(test_predictions, Churn)
conf_matrix
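One thing to watch for: in the question, pred_rf has levels "FALSE"/"TRUE" while Churn holds "0"/"1". The sketch below shows one way to give the reference the same levels as the predictions. It uses made-up vectors and assumes "0" means FALSE and "1" means TRUE, which you would need to verify against your own data. It builds the table with base R; the aligned factors could be passed to caret::confusionMatrix in the same way.

```r
# Hypothetical stand-ins for testing$Churn (chr) and pred_rf (factor)
truth_chr <- c("0", "0", "1", "0", "1")
pred <- factor(c("FALSE", "FALSE", "TRUE", "TRUE", "TRUE"),
               levels = c("FALSE", "TRUE"))

# Assumption: "0" corresponds to FALSE and "1" to TRUE
truth <- factor(truth_chr, levels = c("0", "1"), labels = c("FALSE", "TRUE"))

# Both factors now share the same levels in the same order
stopifnot(identical(levels(truth), levels(pred)))

# Cross-tabulate predictions against the reference
conf_table <- table(Prediction = pred, Reference = truth)
print(conf_table)

# Accuracy is the share of observations on the diagonal
accuracy <- sum(diag(conf_table)) / sum(conf_table)
print(accuracy)
```

A cleaner long-term fix is probably to convert Churn to a factor once, before splitting into training and testing, so the model, the predictions, and the reference all use the same labels.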
See: https://community.rstudio.com/t/how-to-deal-with-rlang-errors/27248