How do I compute a confusion matrix for multiclass classification in scikit-learn?

YNr*_*YNr 8 python classification confusion-matrix scikit-learn

I have a multiclass classification task. When I run my script, which is based on the scikit-learn examples, as follows:

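from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.multiclass import OneVsRestClassifier

classifier = OneVsRestClassifier(GradientBoostingClassifier(n_estimators=70, max_depth=3, learning_rate=.02))

y_pred = classifier.fit(X_train, y_train).predict(X_test)
cnf_matrix = confusion_matrix(y_test, y_pred)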

I get this error:

File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 242, in confusion_matrix
    raise ValueError("%s is not supported" % y_type)
ValueError: multilabel-indicator is not supported

I tried passing labels=classifier.classes_ to confusion_matrix(), but it didn't help.

y_test and y_pred look like this:

y_test =
array([[0, 0, 0, 1, 0, 0],
   [0, 0, 0, 0, 1, 0],
   [0, 1, 0, 0, 0, 0],
   ..., 
   [0, 0, 0, 0, 0, 1],
   [0, 0, 0, 1, 0, 0],
   [0, 0, 0, 0, 1, 0]])


y_pred = 
array([[0, 0, 0, 0, 0, 0],
   [0, 0, 0, 0, 0, 0],
   [0, 0, 0, 0, 0, 0],
   ..., 
   [0, 0, 0, 0, 0, 1],
   [0, 0, 0, 0, 0, 1],
   [0, 0, 0, 0, 0, 0]])

Nao*_*man 7

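First, you need to create the label output array. Say you have 3 classes, 'cat', 'dog', 'house', indexed 0, 1, 2, and the predictions for 2 samples are 'dog' and 'house'. Your output will be: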

y_pred = np.array([[0, 1, 0], [0, 0, 1]])

Running y_pred.argmax(1) gives [1, 2]. This array holds the original label indices, meaning ['dog', 'house'].

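import numpy as np
from keras.utils import np_utils  # older Keras releases; newer ones expose keras.utils.to_categorical

num_classes = 3

# from label indices to categorical (one-hot)
y_prediction = np.array([1, 2])
y_categorical = np_utils.to_categorical(y_prediction, num_classes)

# from categorical (one-hot) back to label indices
y_pred = y_categorical.argmax(1)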
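To tie this back to the question, here is a minimal sketch of the whole path from one-hot arrays to a confusion matrix. The data is made up for illustration (the three classes from the example above), and `type_of_target` is used only to show why 2-D 0/1 arrays trigger the `multilabel-indicator` error.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import type_of_target

# Hypothetical one-hot data for 3 classes: 0='cat', 1='dog', 2='house'
y_test = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0, 1, 0], [0, 0, 1], [0, 1, 0], [0, 1, 0]])

# 2-D 0/1 arrays are treated as multilabel targets,
# which confusion_matrix does not accept
print(type_of_target(y_test))  # multilabel-indicator

# Collapse each one-hot row to its class index
y_test_idx = y_test.argmax(axis=1)  # [1, 2, 0, 1]
y_pred_idx = y_pred.argmax(axis=1)  # [1, 2, 1, 1]

# Rows are true classes, columns are predicted classes
cnf_matrix = confusion_matrix(y_test_idx, y_pred_idx, labels=[0, 1, 2])
print(cnf_matrix)
```

Passing `labels` explicitly keeps the matrix at 3x3 even if some class never appears in a given test split.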


ak2*_*205 7

This worked for me:

import numpy as np
from sklearn.metrics import confusion_matrix

# Convert each one-hot row to its class index
y_test_non_category = [np.argmax(t) for t in y_test]
y_predict_non_category = [np.argmax(t) for t in y_predict]

conf_mat = confusion_matrix(y_test_non_category, y_predict_non_category)

where y_test and y_predict are categorical variables, e.g. one-hot vectors.
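One caveat, sketched here with made-up data shaped like the y_pred in the question: when a prediction row is all zeros, np.argmax still returns 0, so a sample with no predicted class gets counted as class 0. The variable name no_prediction is just for illustration.

```python
import numpy as np

# Hypothetical predictions shaped like the question's y_pred:
# the first row has no class switched on at all.
y_pred = np.array([[0, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 1]])

# argmax returns the first index of the maximum, so an all-zero row maps to 0
print(np.argmax(y_pred, axis=1))  # [0 5]

# Flag rows with no predicted class so they can be handled separately
no_prediction = y_pred.sum(axis=1) == 0
print(no_prediction)  # [ True False]
```

If this matters for your data, one option (an assumption about your setup, not something the question confirms) is to take the argmax of classifier.predict_proba(X_test) instead of predict, or to fit the classifier on 1-D class labels so that predict returns a single class per sample.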