nir*_*jan 6 python precision machine-learning scikit-learn multilabel-classification
I ran a random forest classifier on my multiclass, multilabel output variable and got the following output.
My y_test values
Degree Nature
762721 1 7
548912 0 6
727126 1 12
14880 1 12
189505 1 12
657486 1 12
461004 1 0
31548 0 6
296674 1 7
121330 0 17
predicted output :
[[ 1. 7.]
[ 0. 6.]
[ 1. 12.]
[ 1. 12.]
[ 1. 12.]
[ 1. 12.]
[ 1. 0.]
[ 0. 6.]
[ 1. 7.]
[ 0. 17.]]
Now I want to check the performance of the classifier. I read that for multiclass, multilabel problems, Hamming loss and jaccard_similarity_score are good metrics. I tried to compute them, but I got a ValueError.
Error:
ValueError: multiclass-multioutput is not supported
I tried the following lines:
from sklearn.metrics import hamming_loss, jaccard_similarity_score
print(hamming_loss(y_test, RF_predicted))
print(jaccard_similarity_score(y_test, RF_predicted))
Thanks,
The Hamming loss is not supported for multiclass-multioutput targets, but you can compute it yourself like this:
import numpy as np
y_true = np.array([[1, 1], [2, 3]])
y_pred = np.array([[0, 1], [1, 2]])
np.sum(np.not_equal(y_true, y_pred))/float(y_true.size)
0.75
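The one-liner above can be wrapped in a small helper. This is only a sketch: the names `multioutput_hamming_loss` and `per_output_error` are made up for illustration and are not part of scikit-learn. It assumes both inputs have the same shape (`np.asarray` also accepts a pandas DataFrame such as `y_test`), and it counts every (sample, output) cell as one prediction.

```python
import numpy as np


def multioutput_hamming_loss(y_true, y_pred):
    """Fraction of individual (sample, output) cells that were predicted wrongly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(y_true != y_pred))


def per_output_error(y_true, y_pred):
    """Error rate of each output column on its own (hypothetical helper)."""
    return np.mean(np.asarray(y_true) != np.asarray(y_pred), axis=0)


# Same toy arrays as in the snippet above
y_true = np.array([[1, 1], [2, 3]])
y_pred = np.array([[0, 1], [1, 2]])

print(multioutput_hamming_loss(y_true, y_pred))  # 0.75
print(per_output_error(y_true, y_pred))          # error rate per column
```

The per-column version can be handy when one output (for example Degree) is much easier to predict than the other (Nature), because a single averaged number would hide that difference.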
You can also get a confusion_matrix for each of the two labels like this:
from sklearn.metrics import confusion_matrix, precision_score
np.random.seed(42)
y_true = np.vstack((np.random.randint(0, 2, 10), np.random.randint(2, 5, 10))).T
[[0 4]
[1 4]
[0 4]
[0 4]
[0 2]
[1 4]
[0 3]
[0 2]
[0 3]
[1 3]]
y_pred = np.vstack((np.random.randint(0, 2, 10), np.random.randint(2, 5, 10))).T
[[1 2]
[1 2]
[1 4]
[1 4]
[0 4]
[0 3]
[1 4]
[1 3]
[1 3]
[0 4]]
confusion_matrix(y_true[:, 0], y_pred[:, 0])
[[1 6]
[2 1]]
confusion_matrix(y_true[:, 1], y_pred[:, 1])
[[0 1 1]
[0 1 2]
[2 1 2]]
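If there are more than two output columns, the same idea can be written as a loop. A minimal sketch, assuming 2-D integer arrays of equal shape; `confusion_matrices_per_output` is a hypothetical helper name, and the arrays are the ones printed above, typed in by hand so that the result does not depend on the random seed.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def confusion_matrices_per_output(y_true, y_pred):
    """Return a list of (labels, confusion matrix) pairs, one per output column."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    results = []
    for col in range(y_true.shape[1]):
        # Use every class seen in either array so rows and columns line up
        labels = np.unique(np.concatenate([y_true[:, col], y_pred[:, col]]))
        cm = confusion_matrix(y_true[:, col], y_pred[:, col], labels=labels)
        results.append((labels, cm))
    return results


y_true = np.array([[0, 4], [1, 4], [0, 4], [0, 4], [0, 2],
                   [1, 4], [0, 3], [0, 2], [0, 3], [1, 3]])
y_pred = np.array([[1, 2], [1, 2], [1, 4], [1, 4], [0, 4],
                   [0, 3], [1, 4], [1, 3], [1, 3], [0, 4]])

for labels, cm in confusion_matrices_per_output(y_true, y_pred):
    print(labels)
    print(cm)
```

Passing `labels` explicitly keeps the matrix layout stable even when a class is missing from one of the two arrays.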
You can also compute precision_score (or recall_score, in a similar way) like this:
precision_score(y_true[:, 0], y_pred[:, 0])
0.142857142857
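Note that the call above works without extra arguments only because the first column is binary. For the second column, which has three classes, `precision_score` and `recall_score` need an `average` argument. The sketch below uses `average="macro"` as one possible choice, not the only reasonable one; the arrays are again the ones printed above.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([[0, 4], [1, 4], [0, 4], [0, 4], [0, 2],
                   [1, 4], [0, 3], [0, 2], [0, 3], [1, 3]])
y_pred = np.array([[1, 2], [1, 2], [1, 4], [1, 4], [0, 4],
                   [0, 3], [1, 4], [1, 3], [1, 3], [0, 4]])

# Column 0 is binary, so the default average="binary" applies
p0 = precision_score(y_true[:, 0], y_pred[:, 0])
r0 = recall_score(y_true[:, 0], y_pred[:, 0])

# Column 1 has classes 2, 3 and 4, so an averaging strategy must be chosen;
# "macro" takes the unweighted mean of the per-class scores
p1 = precision_score(y_true[:, 1], y_pred[:, 1], average="macro")
r1 = recall_score(y_true[:, 1], y_pred[:, 1], average="macro")

print(p0, r0)  # p0 matches the 0.142857... shown above
print(p1, r1)
```

Other values such as `average="micro"` or `average="weighted"` weight the classes differently, so the right choice depends on how much the rare classes matter.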