Understanding ROC curve

Question

blu*_*sky 0 false-positive roc scikit-learn

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc, roc_auc_score
import numpy as np

# Ground-truth labels, and the predicted labels that are passed to roc_curve as scores
correct_classification = np.array([0, 1])
predicted_classification = np.array([1, 1])

# roc_curve returns one (fpr, tpr) pair per threshold
false_positive_rate, true_positive_rate, thresholds = roc_curve(correct_classification, predicted_classification)

print(false_positive_rate)
print(true_positive_rate)

From https://en.wikipedia.org/wiki/Sensitivity_and_specificity:

True positive: Sick people correctly identified as sick 
False positive: Healthy people incorrectly identified as sick 
True negative: Healthy people correctly identified as healthy 
False negative: Sick people incorrectly identified as healthy

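I am using these values: 0 = sick, 1 = healthy.

From https://en.wikipedia.org/wiki/False_positive_rate:

False positive rate = false positives / (false positives + true negatives)

Number of false positives: 0. Number of true negatives: 1.

Therefore the false positive rate = 0 / (0 + 1) = 0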
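
The manual calculation above can be checked with a small sketch. It counts false positives and true negatives for the two samples in the question under two labelling conventions: the asker's (0 = sick is the positive class) and the one that roc_curve assumes by default for {0, 1} labels (1 is the positive class). The helper name false_positive_rate_for is hypothetical and not part of any library.

```python
def false_positive_rate_for(y_true, y_pred, positive_label):
    """Return FP / (FP + TN), treating `positive_label` as the positive class."""
    false_positives = sum(
        1 for t, p in zip(y_true, y_pred)
        if t != positive_label and p == positive_label
    )
    true_negatives = sum(
        1 for t, p in zip(y_true, y_pred)
        if t != positive_label and p != positive_label
    )
    return false_positives / (false_positives + true_negatives)


y_true = [0, 1]  # correct_classification from the question
y_pred = [1, 1]  # predicted_classification from the question

# Asker's convention: 0 (sick) is the positive class.
fpr_sick_positive = false_positive_rate_for(y_true, y_pred, positive_label=0)

# Default convention for {0, 1} labels in roc_curve: 1 is the positive class.
fpr_one_positive = false_positive_rate_for(y_true, y_pred, positive_label=1)

print(fpr_sick_positive)  # 0.0
print(fpr_one_positive)   # 1.0
```

Under the first convention the result matches the hand calculation (0). Under the second it is 1, which suggests that part of the mismatch comes from which label is treated as positive.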

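Reading the return values of roc_curve (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve):

fpr : array, shape = [>2]

Increasing false positive rates such that element i is the false positive rate of predictions with score >= thresholds[i].

tpr : array, shape = [>2]

Increasing true positive rates such that element i is the true positive rate of predictions with score >= thresholds[i].

thresholds : array, shape = [n_thresholds]

Decreasing thresholds on the decision function used to compute fpr and tpr. thresholds[0] represents no instances being predicted and is arbitrarily set to max(y_score) + 1.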
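
To make the documented meaning of fpr, tpr and thresholds concrete, here is a minimal sketch of a threshold sweep written from the description above. It is not scikit-learn's actual implementation. It assumes that label 1 is the positive class and that both classes occur in y_true, and roc_points is a hypothetical helper name. The leading threshold is set to max(y_score) + 1, following the quoted documentation.

```python
def roc_points(y_true, y_score):
    """Sweep decreasing thresholds; predict positive when score >= threshold."""
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    # Leading threshold above every score, so no instance is predicted positive.
    thresholds = [max(y_score) + 1] + sorted(set(y_score), reverse=True)
    fpr, tpr = [], []
    for threshold in thresholds:
        predicted_positive = [s >= threshold for s in y_score]
        tp = sum(1 for t, p in zip(y_true, predicted_positive) if p and t == 1)
        fp = sum(1 for t, p in zip(y_true, predicted_positive) if p and t != 1)
        tpr.append(tp / positives)
        fpr.append(fp / negatives)
    return fpr, tpr, thresholds


# Inputs from the question: true labels [0, 1], scores [1, 1]
fpr, tpr, thresholds = roc_points([0, 1], [1, 1])
print(fpr)         # [0.0, 1.0]
print(tpr)         # [0.0, 1.0]
print(thresholds)  # [2, 1]
```

With the question's inputs both samples receive the same score, so there is only one real threshold and the sweep produces just two points, (0, 0) and (1, 1). Each fpr entry belongs to one threshold, which is why the array does not reduce to the single hand-computed value.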

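Why does this differ from the false positive rate I calculated by hand? How are the thresholds set? Some more information on thresholds is provided here: https://datascience.stackexchange.com/questions/806/advantages-of-auc-vs-standard-accuracy, but I am confused about how it fits with this implementation.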

Answer 1

Ale*_*xis 5

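In the demo above, the threshold is the orange bar. The distribution of class 0 is shown in red (the output of the classifier) and the distribution of class 1 in blue (likewise, the probability distribution of the classifier's output). It works with the probabilities of belonging to one class or the other: if a sample has an output of [0.34, 0.66], then a threshold of 0.25 for class 1 puts it in class 1, even if the probability of 0.66 is higher.

With the ROC curve you are not working with the classes, but with the probabilities of being in a class.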
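
The thresholding described in this answer can be sketched as follows. The sample output [0.34, 0.66] and the threshold 0.25 come from the answer. The second threshold 0.7 and the helper name classify_with_threshold are assumptions added for illustration.

```python
def classify_with_threshold(probabilities, threshold):
    """Assign class 1 when its probability reaches the threshold, else class 0."""
    class_one_probability = probabilities[1]
    return 1 if class_one_probability >= threshold else 0


sample_output = [0.34, 0.66]  # [P(class 0), P(class 1)] from the answer

low_threshold_class = classify_with_threshold(sample_output, 0.25)
high_threshold_class = classify_with_threshold(sample_output, 0.7)  # assumed value

print(low_threshold_class)   # 1
print(high_threshold_class)  # 0, although class 1 has the higher probability
```

Each threshold gives one assignment of samples to classes and therefore one (FPR, TPR) point. Passing class probabilities rather than hard 0/1 predictions as the score typically gives the curve more than one informative threshold.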

I hope this answers the question (sorry if not, I will be more precise if needed).

Archived:	7 years, 5 months ago
Views:	1451
Last recorded:	7 years, 5 months ago