Scikit-learn: getting the accuracy score for each class

Question

Cen*_*tAu 15 python machine-learning scikit-learn

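Is there a built-in way to get the accuracy score for each class separately? I know that in sklearn we can get the overall accuracy with metrics.accuracy_score. Is there a way to get a breakdown of accuracy scores for individual classes, something similar to metrics.classification_report?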

from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']

classification_report does not give accuracy scores:

print(classification_report(y_true, y_pred, target_names=target_names, digits=4))

Out[9]:
              precision    recall  f1-score   support

     class 0     0.5000    1.0000    0.6667         1
     class 1     0.0000    0.0000    0.0000         1
     class 2     1.0000    0.6667    0.8000         3

 avg / total     0.7000    0.6000    0.6133         5

accuracy_score gives only the overall accuracy:

accuracy_score(y_true, y_pred)
Out[10]: 0.59999999999999998

Answer 1

jav*_*vac 15

from sklearn.metrics import confusion_matrix
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
matrix = confusion_matrix(y_true, y_pred)
matrix.diagonal()/matrix.sum(axis=1)

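I believe this code may be incorrect. `sklearn.metrics.confusion_matrix` puts the actual classes on the rows, so we should use `axis=0` instead. (8 upvotes)
To elaborate: assuming the columns (hence this answer's `axis=1`) represent the actual classes and the rows represent the predicted classes, the accuracy for class i is the (i, i) element of the confusion matrix divided by the sum of the i-th column. That is how the math works out. (2 upvotes)
This is incorrect, because it computes recall (with `sum(axis=1)`) or precision (with `sum(axis=0)`). Ophir's answer should be the correct one (/sf/answers/4597111151/). (2 upvotes)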
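The disagreement in these comments can be checked numerically. The following is a minimal sketch, assuming the example data from the question; it compares the two diagonal ratios against sklearn's own `recall_score` and `precision_score` with `average=None`, which return one value per class.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Example data taken from the question
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]

# Rows of the confusion matrix are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)

# Diagonal divided by row sums (number of true samples per class)
row_ratio = cm.diagonal() / cm.sum(axis=1)
# Diagonal divided by column sums (number of predictions per class)
col_ratio = cm.diagonal() / cm.sum(axis=0)

recall = recall_score(y_true, y_pred, average=None)
precision = precision_score(y_true, y_pred, average=None)

print(np.allclose(row_ratio, recall))     # row-normalized diagonal vs. recall
print(np.allclose(col_ratio, precision))  # column-normalized diagonal vs. precision
```

On this data both comparisons agree, and the values match the recall and precision columns of the classification_report output in the question. Note that the column ratio divides by zero if some class is never predicted, which does not happen in this example.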

Answer 2

Oph*_*r S 9

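I am adding my answer because I could not find an answer to this exact question anywhere online, and because I think the other calculation methods suggested before mine are incorrect.

Remember that accuracy is defined as: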

accuracy = (true_positives + true_negatives) / all_samples

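Or, put into words: it is the ratio between the number of correctly classified examples (either as positive or as negative) and the total number of examples in the test set.

One thing to note is that for TN and FN, the "negative" is class agnostic: it means "not predicted as the specific class in question". For example, consider the following: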

y_true = ['cat', 'dog', 'bird', 'bird']
y_pred = ['cat', 'dog', 'cat', 'dog']

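Here, both the second 'cat' prediction and the second 'dog' prediction are false negatives with respect to 'bird', simply because they are not 'bird'.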
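To make the definition concrete, here is a small sketch that applies it to the four samples above, treating each class in turn as the "positive" class and everything else as "negative". It uses plain NumPy, and the variable names are illustrative only.

```python
import numpy as np

y_true = np.array(['cat', 'dog', 'bird', 'bird'])
y_pred = np.array(['cat', 'dog', 'cat', 'dog'])

per_class = {}
for cls in np.unique(np.concatenate([y_true, y_pred])):
    is_true = y_true == cls   # samples whose ground truth is cls
    is_pred = y_pred == cls   # samples predicted as cls
    tp = np.sum(is_true & is_pred)     # true positives for cls
    tn = np.sum(~is_true & ~is_pred)   # true negatives for cls
    per_class[str(cls)] = float((tp + tn) / len(y_true))

print(per_class)
# {'bird': 0.5, 'cat': 0.75, 'dog': 0.75}
```

For 'bird' there are no true positives and two true negatives (the correctly handled 'cat' and 'dog' samples), so its accuracy under this definition is 2/4.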

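As for your question:

As far as I know, no package currently provides a method that does exactly what you are looking for, but based on the definition of accuracy we can compute it ourselves using the confusion matrix from sklearn.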

from sklearn.metrics import confusion_matrix
import numpy as np

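# Class labels, in the sorted order used for the rows and columns of the matrix
# (y_true and y_pred are the lists from the example above)
classes = sorted(set(y_true) | set(y_pred))

# Get the confusion matrix
cm = confusion_matrix(y_true, y_pred, labels=classes)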

# We will store the results in a dictionary for easy access later
per_class_accuracies = {}

# Calculate the accuracy for each one of our classes
for idx, cls in enumerate(classes):
    # True negatives are all the samples that are not our current GT class (not the current row) 
    # and were not predicted as the current class (not the current column)
    true_negatives = np.sum(np.delete(np.delete(cm, idx, axis=0), idx, axis=1))
    
    # True positives are all the samples of our current GT class that were predicted as such
    true_positives = cm[idx, idx]
    
    # The accuracy for the current class is the ratio between correct predictions to all predictions
    per_class_accuracies[cls] = (true_positives + true_negatives) / np.sum(cm)

The original question was posted a while ago, but this might help anyone who, like me, gets here through Google.
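As an alternative sketch of the same one-vs-rest calculation, sklearn's `multilabel_confusion_matrix` returns one 2x2 matrix per class, so TP and TN can be read off without deleting rows and columns. This is not part of the answer above; it assumes a sklearn version that ships this function and reuses the example data from the question.

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

# Example data taken from the question
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]

# For multiclass input each class is treated one-vs-rest. The result has
# shape (n_classes, 2, 2), laid out per class as [[TN, FP], [FN, TP]].
mcm = multilabel_confusion_matrix(y_true, y_pred)

tn = mcm[:, 0, 0]
tp = mcm[:, 1, 1]

# (TP + TN) / all samples, computed for every class at once
per_class_accuracy = (tp + tn) / mcm.sum(axis=(1, 2))
print(per_class_accuracy)
```

For this data the result is 0.8, 0.6 and 0.8 for classes 0, 1 and 2, the same values the loop over the full confusion matrix produces.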

Answer 3

Moh*_*hif 7

You can use sklearn's confusion matrix to get the accuracy:

from sklearn.metrics import confusion_matrix
import numpy as np

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']

#Get the confusion matrix
cm = confusion_matrix(y_true, y_pred)
#array([[1, 0, 0],
#   [1, 0, 0],
#   [0, 1, 2]])

#Now normalize each row by the number of true samples in that class
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
#array([[1.        , 0.        , 0.        ],
#      [1.        , 0.        , 0.        ],
#      [0.        , 0.33333333, 0.66666667]])

#The diagonal entries are the accuracies of each class
cm.diagonal()
#array([1.        , 0.        , 0.66666667])

Reference

plot Confusion matrix sklearn

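For multiclass classification, per-class accuracy is the same as per-class recall. (9 upvotes)
This is not accuracy, it is recall. (6 upvotes)
Tip: for sklearn>=0.22, use `confusion_matrix(..., normalize="true").diagonal()` to compute the per-class accuracy directly. (2 upvotes)
@strohne As if the confusion matrix were not confusing enough, let's not make it worse :) The code above correctly computes the per-class accuracy, i.e. the ratio of correctly classified samples for each class. Recall is the per-class accuracy of the positive class, and it should not be confused with the overall accuracy (the ratio of correct predictions across all classes). The overall accuracy can be computed with `confusion_matrix(..., normalize="all").diagonal().sum()`. (2 upvotes)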
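The two `normalize` options mentioned in these comments can be tried side by side. This is a minimal sketch, assuming sklearn>=0.22 (where the `normalize` parameter is available) and the example data from the question.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Example data taken from the question
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]

# normalize="true" divides each row by the number of true samples of that class,
# so the diagonal holds the fraction of each class that was classified correctly
per_class = confusion_matrix(y_true, y_pred, normalize="true").diagonal()

# normalize="all" divides every cell by the total number of samples,
# so the diagonal sums to the overall accuracy
overall = confusion_matrix(y_true, y_pred, normalize="all").diagonal().sum()

print(per_class)
print(overall, accuracy_score(y_true, y_pred))
```

The per-class values are the same as the diagonal computed manually in the answer, and the overall value agrees with `accuracy_score` up to floating-point rounding.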

Answer 4

MMF*_*MMF 5

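You can code it yourself: accuracy is nothing more than the ratio between the well-classified samples (true positives and true negatives) and the total number of samples you have.

Then, for a given class, instead of considering all the samples, you only take into account the samples of that class.

You can then try the following. Let's first define a handy function.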

def indices(l, val):
    retval = []
    last = 0
    while val in l[last:]:
        i = l[last:].index(val)
        retval.append(last + i)
        last += i + 1
    return retval

The function above returns the indices of a given value val in the list l.

def class_accuracy(y_pred, y_true, cls):
    # Positions of the samples whose true label is cls (y_true must be a list)
    index = indices(y_true, cls)
    # Count the samples of this class that were predicted correctly
    tp = sum(1 for k in index if y_pred[k] == y_true[k])
    return tp / float(len(index))

This last function returns the within-class accuracy you are looking for.

Archived:	9 years, 3 months ago
Views:	10,374
Last activity:	6 years, 6 months ago