In scikit-learn's precision_recall_curve, why do the thresholds have a different dimension from recall and precision?

Question

sap*_*ico 6 python python-2.7 scikit-learn precision-recall

I want to see how precision and recall vary with the threshold (not just against each other).

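from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve

model = RandomForestClassifier(n_estimators=500, n_jobs=-1)
model.fit(X_train, y_train)
# Column 1 holds the predicted probability of the positive class.
probas = model.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, probas)
print(len(precision))
print(len(thresholds))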

This prints:

283  
282

As a result, I can't plot them together. Any clues as to why this is the case?

Answer 1

小智 8

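For this purpose, the last precision and recall values should be ignored. The last precision and recall values are always 1 and 0 respectively, and they have no corresponding threshold.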
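
To make the off-by-one concrete, here is a minimal pure-Python sketch of how such a curve can be built. It is a simplified illustration, not scikit-learn's actual implementation: the function name `pr_curve_sketch` and the toy data are assumptions made for this example, and it assumes Python 3 division and at least one positive label.

```python
def pr_curve_sketch(y_true, scores):
    """Simplified precision-recall curve (illustrative, not scikit-learn's code)."""
    thresholds = sorted(set(scores))
    total_pos = sum(y_true)  # assumed to be > 0
    precision, recall = [], []
    for t in thresholds:
        # A sample is predicted positive when its score is >= t.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        precision.append(tp / (tp + fp))
        recall.append(tp / total_pos)
    # Extra endpoint with no threshold: nothing is predicted positive.
    precision.append(1.0)
    recall.append(0.0)
    return precision, recall, thresholds


# Toy data, made up for this sketch.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
precision, recall, thresholds = pr_curve_sketch(y_true, scores)
print(len(precision), len(recall), len(thresholds))
```

Each threshold contributes one (precision, recall) pair, and the appended endpoint adds one more, so the precision and recall lists are always one element longer than the threshold list.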

For example, here is one solution:

import matplotlib.pyplot as plt

def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    # Drop the last precision/recall value, which has no matching threshold.
    plt.figure(figsize=(8, 5))
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision")
    plt.plot(thresholds, recalls[:-1], "g-", label="Recall")
    plt.xlabel("Threshold")
    plt.legend()

plot_precision_recall_vs_threshold(precision, recall, thresholds)
plt.show()
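
If you would rather inspect the numbers than plot them, the same alignment can be sketched with `zip`, which stops at the shortest input and so drops the trailing pair automatically. The values below are hypothetical and chosen only for illustration; they are not output from the model in the question.

```python
# Illustrative values only (hypothetical, not real model output).
precision = [0.5, 0.75, 1.0, 1.0]
recall = [1.0, 0.75, 0.5, 0.0]
thresholds = [0.2, 0.5, 0.8]

# zip stops at the shortest sequence, so the final (1.0, 0.0) pair,
# which has no threshold, is left out.
rows = list(zip(thresholds, precision, recall))
for t, p, r in rows:
    print("threshold=%.2f precision=%.2f recall=%.2f" % (t, p, r))
```

This has the same effect as slicing with `[:-1]`, just without spelling out the slice.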

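Those extra values are there so that, when precision is plotted against recall, the curve starts at the y-axis (x = 0, that is, recall = 0).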
