Specificity in scikit-learn

Question

I need the specificity of my classification, defined as: TN/(TN+FP)

I am writing a custom scorer function:

from sklearn.metrics import make_scorer

def specificity_loss_func(ground_truth, predictions):
    print(predictions)
    tp, tn, fn, fp = 0.0, 0.0, 0.0, 0.0
    for l, m in enumerate(ground_truth):
        if m == predictions[l] and m == 1:
            tp += 1
        if m == predictions[l] and m == 0:
            tn += 1
        if m != predictions[l] and m == 1:
            fn += 1
        if m != predictions[l] and m == 0:
            fp += 1
    return tn / (tn + fp)

score = make_scorer(specificity_loss_func, greater_is_better=True)

Run Code Online (Sandbox Code Playgroud)

Then:

from sklearn.dummy import DummyClassifier
clf_dummy = DummyClassifier(strategy='most_frequent', random_state=0)
ground_truth = [0,0,1,0,1,1,1,0,0,1,0,0,1]
p  = [0,0,0,1,0,1,1,1,1,0,0,1,0]
clf_dummy = clf_dummy.fit(ground_truth, p)
score(clf_dummy, ground_truth, p)

Run Code Online (Sandbox Code Playgroud)

When I run these commands, I get p printed as:

[0 0 0 0 0 0 0 0 0 0 0 0 0]
1.0

Run Code Online (Sandbox Code Playgroud)

Why is p changed to a series of zeros when I input p = [0,0,0,1,0,1,1,1,1,0,0,1,0]?

Answer 1

ker*_*mat 22

As mentioned in other answers, specificity is the recall of the negative class. You can get it simply by setting the pos_label parameter:

from sklearn.metrics import recall_score
y_true = [0, 1, 0, 0, 1, 0]
y_pred = [0, 0, 1, 1, 1, 1]
recall_score(y_true, y_pred, pos_label=0)

Run Code Online (Sandbox Code Playgroud)

This returns 0.25.
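Since specificity is the recall of class 0 and sensitivity is the recall of class 1, both can be read off with recall_score. A small sketch on the same data as above; passing average=None together with labels=[0, 1] returns the per-class recalls in that order.

```python
from sklearn.metrics import recall_score

y_true = [0, 1, 0, 0, 1, 0]
y_pred = [0, 0, 1, 1, 1, 1]

specificity = recall_score(y_true, y_pred, pos_label=0)  # recall of class 0
sensitivity = recall_score(y_true, y_pred, pos_label=1)  # recall of class 1
# One recall per label, in the order given by `labels`.
per_class = recall_score(y_true, y_pred, average=None, labels=[0, 1])

print(specificity)  # 0.25
print(sensitivity)  # 0.5
print(per_class)    # per-class recalls, class 0 first
```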

Answer 2

sed*_*deh 17

You can get the specificity from the confusion matrix. For a binary classification problem, it would be something like:

from sklearn.metrics import confusion_matrix
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn+fp)

Run Code Online (Sandbox Code Playgroud)

Answer 3

Ibr*_*iev 6

First, you need to know that:

DummyClassifier(strategy='most_frequent'...

Run Code Online (Sandbox Code Playgroud)

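gives you a classifier that returns the most frequent label from your training set. It does not even look at the samples in X. In this line, you could pass anything instead of ground_truth: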

clf_dummy = clf_dummy.fit(ground_truth, p)

Run Code Online (Sandbox Code Playgroud)

and the training result and the predictions would stay the same, because most of the labels in p are "0".
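To check that the prediction really does not depend on X, here is a small sketch: two DummyClassifier instances are trained on completely different feature matrices with the same labels p and then asked to predict on new data. The matrices X_a, X_b and X_new are made-up values for illustration.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

p = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0]  # seven 0s, six 1s

# Two unrelated single-feature matrices (made-up data).
X_a = np.arange(13).reshape(-1, 1)
X_b = np.full((13, 1), -5.0)

clf_a = DummyClassifier(strategy="most_frequent").fit(X_a, p)
clf_b = DummyClassifier(strategy="most_frequent").fit(X_b, p)

# New samples: the features are ignored, only the majority label of p matters.
X_new = np.array([[100.0], [-3.0], [7.0]])
pred_a = clf_a.predict(X_new)
pred_b = clf_b.predict(X_new)
print(pred_a, pred_b)  # both all zeros
```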

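The second thing you need to know: make_scorer returns a function with the interface scorer(estimator, X, y). This function calls the estimator's predict method on X and then computes your specificity function between y and the predicted labels.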

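So the scorer calls clf_dummy.predict on whatever dataset you pass (it doesn't matter which; the classifier always returns 0), gets back a vector of 0s, and then computes your specificity function between y (here p, which arrives as the ground_truth argument) and those predictions. Your predictions are 0 because 0 is the majority class in the training set. Your score is 1.0 because there are no false positive predictions.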
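To make this concrete, here is a sketch that does by hand what the scorer does internally and compares the two results. It uses make_scorer(recall_score, pos_label=0) as the specificity scorer (make_scorer forwards extra keyword arguments to the metric) instead of the custom function from the question, and a made-up single-feature X.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import make_scorer, recall_score

X = np.arange(13).reshape(-1, 1)  # made-up features
p = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0])

clf = DummyClassifier(strategy="most_frequent").fit(X, p)
scorer = make_scorer(recall_score, pos_label=0)  # specificity

# What scorer(clf, X, p) does, written out by hand:
y_pred = clf.predict(X)                          # all zeros here
manual = recall_score(p, y_pred, pos_label=0)    # metric(y, predictions)

from_scorer = scorer(clf, X, p)
print(y_pred)
print(from_scorer, manual)  # both 1.0: no false positives
```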

I corrected your code for convenience:

from sklearn.dummy import DummyClassifier
clf_dummy = DummyClassifier(strategy='most_frequent', random_state=0)
X = [[0],[0],[1],[0],[1],[1],[1],[0],[0],[1],[0],[0],[1]]
p = [0,0,0,1,0,1,1,1,1,0,0,1,0]
clf_dummy = clf_dummy.fit(X, p)
score(clf_dummy, X, p)
Run Code Online (Sandbox Code Playgroud)

Answer 4

jtr*_*ans 5

As I understand it, 'specificity' is just a special case of 'recall'. Recall is calculated for the actual positive class ( TP / [TP+FN] ), whereas 'specificity' is the same type of calculation but for the actual negative class ( TN / [TN+FP] ).

It really only makes sense to have such specific terminology for binary classification problems. For a multi-class classification problem it would be more convenient to talk about recall with respect to each class. There is no reason why you can't talk about recall in this way even when dealing with binary classification problems (e.g. recall for class 0, recall for class 1).

For example, in cancer diagnosis, recall tells us the proportion of patients who actually have cancer that are successfully diagnosed as having cancer. To generalize, you could say that Class X recall tells us the proportion of samples actually belonging to Class X that are successfully predicted as belonging to Class X.

Given this, you can use classification_report from sklearn.metrics to produce the precision, recall, f1-score and support for each label/class (as a dictionary if you pass output_dict=True). You can also use precision_recall_fscore_support from sklearn.metrics instead, depending on your preference. See the scikit-learn documentation for details.

import numpy as np
from sklearn.metrics import precision_recall_fscore_support

labels = ['dog', 'cat', 'pig']

y_true = np.array(['cat', 'dog', 'pig', 'cat', 'dog', 'pig'])
y_pred = np.array(['cat', 'pig', 'dog', 'cat', 'cat', 'dog'])

prfs = precision_recall_fscore_support(y_true, y_pred, average=None, labels=labels)
precisions = prfs[0]
recalls = prfs[1]  # per-class recall; in binary classification, the negative class's recall is the specificity
fbeta_scores = prfs[2]
supports = prfs[3]

print(recalls)  # Note: the order of this array follows the order of your labels array

Run Code Online (Sandbox Code Playgroud)

Archived:	10 years, 1 month ago
Views:	9647
Last activity:	6 years, 5 months ago