Cen*_*tAu (python, machine-learning, scikit-learn):
Is there a built-in way to get an accuracy score for each class separately? I know that in sklearn we can get the overall accuracy with metrics.accuracy_score. Is there a way to get per-class accuracy scores, something like what metrics.classification_report provides?
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']
classification_report does not give an accuracy score:
print(classification_report(y_true, y_pred, target_names=target_names, digits=4))
Out[9]:
             precision    recall  f1-score   support

    class 0     0.5000    1.0000    0.6667         1
    class 1     0.0000    0.0000    0.0000         1
    class 2     1.0000    0.6667    0.8000         3

avg / total     0.7000    0.6000    0.6133         5
accuracy_score only gives the overall accuracy:
accuracy_score(y_true, y_pred)
Out[10]: 0.59999999999999998
jav*_*vac answered:
from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
matrix = confusion_matrix(y_true, y_pred)
# Correct predictions per class (diagonal) divided by the number of true samples per class (row sums)
matrix.diagonal() / matrix.sum(axis=1)
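For the toy labels above, this one-liner yields the fraction of each true class that was predicted correctly; note that this quantity is identical to per-class recall. A minimal check of the values, assuming the same toy labels (the expected numbers in the comment are worked out by hand):

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

matrix = confusion_matrix(y_true, y_pred)
per_class = matrix.diagonal() / matrix.sum(axis=1)
print(per_class)  # [1.  0.  0.66666667] -> class 0: 2/2, class 1: 0/1, class 2: 2/3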
I am adding my answer because I could not find an answer to this exact question anywhere online, and because I believe the other calculation methods suggested before mine are incorrect.
Keep in mind that accuracy is defined as:
accuracy = (true_positives + true_negatives) / all_samples
Or, in words: it is the ratio between the number of correctly classified samples (whether positive or negative) and the total number of samples in the test set.
One thing to note is that, for TN and FN, "negative" is class-agnostic: it means "not predicted as the specific class in question". For example, consider the following:
y_true = ['cat', 'dog', 'bird', 'bird']
y_pred = ['cat', 'dog', 'cat', 'dog']
Here, both the second 'cat' prediction and the second 'dog' prediction are false negatives with respect to 'bird', simply because they are not 'bird'.
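To make the definition concrete, here is a minimal hand computation of the accuracy for the 'bird' class in this toy example (the counts in the comments are worked out by hand from the labels above):

y_true = ['cat', 'dog', 'bird', 'bird']
y_pred = ['cat', 'dog', 'cat', 'dog']

cls = 'bird'
tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))  # 0: no 'bird' was predicted as 'bird'
tn = sum(t != cls and p != cls for t, p in zip(y_true, y_pred))  # 2: the 'cat' and 'dog' samples were not predicted as 'bird'
accuracy_bird = (tp + tn) / len(y_true)                          # (0 + 2) / 4 = 0.5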
As for your question:
To the best of my knowledge, no package currently provides a method that does exactly what you want, but following the definition of accuracy, we can compute it ourselves using sklearn's confusion matrix:
from sklearn.metrics import confusion_matrix
import numpy as np

# The distinct classes, in the order used by confusion_matrix (sorted labels)
classes = np.unique(y_true)

# Get the confusion matrix
cm = confusion_matrix(y_true, y_pred)

# We will store the results in a dictionary for easy access later
per_class_accuracies = {}

# Calculate the accuracy for each one of our classes
for idx, cls in enumerate(classes):
    # True negatives are all the samples that are not our current GT class (not the current row)
    # and were not predicted as the current class (not the current column)
    true_negatives = np.sum(np.delete(np.delete(cm, idx, axis=0), idx, axis=1))

    # True positives are all the samples of our current GT class that were predicted as such
    true_positives = cm[idx, idx]

    # The accuracy for the current class is the ratio between correct predictions and all predictions
    per_class_accuracies[cls] = (true_positives + true_negatives) / np.sum(cm)
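Run on the cat/dog/bird example above, this should produce the following dictionary (the values are worked out by hand, so treat this as a sanity check rather than guaranteed output formatting):

print(per_class_accuracies)
# {'bird': 0.5, 'cat': 0.75, 'dog': 0.75}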
The original question was posted a while ago, but this may help anyone who, like me, arrives here via Google.
You can use sklearn's confusion matrix to get the per-class accuracy:
from sklearn.metrics import confusion_matrix
import numpy as np

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']

# Get the confusion matrix
cm = confusion_matrix(y_true, y_pred)
# array([[1, 0, 0],
#        [1, 0, 0],
#        [0, 1, 2]])

# Normalize each row by the number of true samples in that class
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
# array([[1.        , 0.        , 0.        ],
#        [1.        , 0.        , 0.        ],
#        [0.        , 0.33333333, 0.66666667]])

# The diagonal entries are the per-class accuracies
cm.diagonal()
# array([1.        , 0.        , 0.66666667])
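As a side note, newer scikit-learn releases (0.22 and later, if I remember correctly) accept a normalize argument that performs the row normalization for you, so the same result can be obtained more compactly:

from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]

# normalize='true' divides each row by the number of true samples in that class
cm = confusion_matrix(y_true, y_pred, normalize='true')
cm.diagonal()
# array([1.        , 0.        , 0.66666667])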
You can code it yourself: accuracy is nothing more than the ratio between the well-classified samples (true positives and true negatives) and the total number of samples you have.
Then, for a given class, instead of considering all the samples, you only take into account those belonging to your class.
You can then try something like this. Let's first define a convenience function:
def indices(l, val):
    """Return the indices of all occurrences of val in the list l."""
    retval = []
    last = 0
    while val in l[last:]:
        i = l[last:].index(val)
        retval.append(last + i)
        last += i + 1
    return retval
The function above returns the indices of a given value val within the list l. Then:
def class_accuracy(y_pred, y_true, cls):
    # Keep only the samples whose ground-truth label is cls
    # (note: `class` is a reserved word in Python, hence `cls`)
    index = indices(y_true, cls)
    y_pred = [y_pred[k] for k in index]
    y_true = [y_true[k] for k in index]
    # Count the correctly classified samples among them
    tp = sum(1 for k in range(len(y_pred)) if y_true[k] == y_pred[k])
    return tp / float(len(y_pred))
This last function returns the within-class accuracy you are looking for.
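For instance, on the sample data from the question (a hypothetical usage sketch; the expected value is worked out by hand):

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]

class_accuracy(y_pred, y_true, 2)  # 2 of the 3 true class-2 samples are correct -> 0.666...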