Classification report on the test set is wrong when using SVM for machine learning in Python

Question

Cro*_*sus 3 python machine-learning svm scikit-learn

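I split my data into test and training sets, both of which have target values '0' and '1'. But after fitting and predicting with an SVM, the classification report states that there are zero '0's in the test sample, which is not true.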

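import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
df = pd.DataFrame(data=data['data'], columns=data['feature_names'])
x = df              # feature matrix
y = data['target']  # labels: 0 and 1
# Hold out 30% of the samples as the test set
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.3, random_state=42)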

As shown below, the test set contains both 0s and 1s, but the support in the classification report says there are no 0s at all!

!( https://i.stack.imgur.com/n2uUM.png )

Answer 1

des*_*aut 6

(It is better to include the relevant code in the example itself, rather than in an image.)

"the classification report states that there are zero '0's in the test sample, which is not true."

That is because, judging from the code in the linked image, you have switched the arguments of classification_report. You used:

print(classification_report(pred, ytest)) # wrong order of arguments

which indeed gives:

             precision    recall  f1-score   support

    class 0       0.00      0.00      0.00         0
    class 1       1.00      0.63      0.77       171

avg / total       1.00      0.63      0.77       171

But the correct usage (see the docs) puts the true labels first:

print(classification_report(ytest, pred)) # ytest first

This gives:

             precision    recall  f1-score   support

    class 0       0.00      0.00      0.00        63
    class 1       0.63      1.00      0.77       108

avg / total       0.40      0.63      0.49       171
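
To see why swapping the arguments changes the support column and exchanges precision and recall, the per-class numbers can be recomputed by hand. The following is a minimal sketch in plain Python, not scikit-learn's implementation; `per_class_report` is a hypothetical helper introduced only for illustration, and it assumes that undefined ratios (0/0) are reported as 0.0. The label lists simply reuse the class counts from the reports above (63 zeros, 108 ones, every prediction equal to 1).

```python
from collections import Counter

def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, support)} for every label seen.

    Simplified illustration only: undefined ratios (0/0) become 0.0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)    # how often each label is the TRUE label
    predicted = Counter(y_pred)  # how often each label is PREDICTED
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    report = {}
    for label in labels:
        precision = hits[label] / predicted[label] if predicted[label] else 0.0
        recall = hits[label] / support[label] if support[label] else 0.0
        report[label] = (round(precision, 2), round(recall, 2), support[label])
    return report

# Same class counts as the test set above; the model predicts 1 everywhere.
ytest = [0] * 63 + [1] * 108
pred = [1] * 171

correct = per_class_report(ytest, pred)  # true labels first
swapped = per_class_report(pred, ytest)  # arguments reversed

print("correct:", correct)
print("swapped:", swapped)
```

Support is counted from the first argument only, so passing `pred` first reports the number of predicted 0s (none) rather than the number of actual 0s. With scikit-learn itself, calling `classification_report(y_true=ytest, y_pred=pred)` with keyword arguments makes the order explicit.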

It also produces the following warning message:

C:\Users\Root\Anaconda3\envs\tensorflow1\lib\site-packages\sklearn\metrics\classification.py:1135: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)

This is because, as already pointed out in the comments, you are predicting only 1s:

pred
# result:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

Why that happens is another story, and not part of the current question.
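
A quick way to spot this situation before reading the report is to count the labels on both sides. The snippet below is a small sketch using only the standard library; `label_counts` is a hypothetical helper, and the two lists stand in for `ytest` and `pred` using the counts shown in the reports above.

```python
from collections import Counter

def label_counts(labels):
    """Return {label: count} with labels in sorted order."""
    counts = Counter(int(v) for v in labels)  # int() also accepts NumPy integers
    return dict(sorted(counts.items()))

# Stand-ins for ytest and pred, using the counts from the reports above.
ytest = [0] * 63 + [1] * 108
pred = [1] * 171

true_counts = label_counts(ytest)
pred_counts = label_counts(pred)

# Labels that occur in the test set but are never predicted.
missing = sorted(set(true_counts) - set(pred_counts))

print("true labels:     ", true_counts)
print("predicted labels:", pred_counts)
print("never predicted: ", missing)
```

Any label listed as never predicted will get precision 0.0 and trigger the warning shown earlier. For NumPy arrays, `np.unique(pred, return_counts=True)` gives the same information.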

Here is the complete reproducible code for the above:

from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
xtrain,xtest,ytrain,ytest = train_test_split(X,y,test_size=0.3,random_state=42)

from sklearn.svm import SVC
svc=SVC()
svc.fit(xtrain, ytrain)
pred = svc.predict(xtest)

print(classification_report(ytest, pred))
