ValueError: Data is not binary and pos_label is not specified for roc_curve

Question

Moh*_*din 2 python machine-learning scikit-learn

I am trying to compute roc_curve, but I get this error message:

Traceback (most recent call last):
  File "script.py", line 94, in <module>
    fpr, tpr, _ = roc_curve(y_validate, status[:,1])
  File "/usr/local/lib/python2.7/site-packages/sklearn/metrics/ranking.py", line 501, in roc_curve
    y_true, y_score, pos_label=pos_label, sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/site-packages/sklearn/metrics/ranking.py", line 308, in _binary_clf_curve
    raise ValueError("Data is not binary and pos_label is not specified")
ValueError: Data is not binary and pos_label is not specified

My code:

from sklearn.metrics import roc_curve, auc

status = rf.predict_proba(x_validate)
fpr, tpr, _ = roc_curve(y_validate, status[:,1])  # error generated here
roc_auc = auc(fpr, tpr)
print(roc_auc)

PS: I did not really understand this solution (ValueError: Data is not binary and pos_label is not specified), since it does not seem to be really relevant.

Answer 1

mpr*_*rat 7

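For the ROC curve calculation to work, you have to specify which label should be treated as the "true" or "positive" label. When pos_label is not given, scikit-learn expects the labels (in your case, the values in y_validate) to be either {0, 1} or {-1, 1}, and it treats 1 as the positive label.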
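To make the default behaviour concrete, here is a minimal sketch. The labels and scores are made up for illustration and are not the asker's data. It shows that 0/1 labels work without pos_label, while the same data with string labels raises a ValueError (the exact wording of the message differs between scikit-learn versions).

```python
import numpy as np
from sklearn.metrics import roc_curve

# Made-up scores for four samples (illustrative only).
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Labels in {0, 1}: pos_label can be omitted and 1 is treated as positive.
fpr, tpr, _ = roc_curve(np.array([0, 0, 1, 1]), scores)

# Same data with string labels and no pos_label: scikit-learn refuses to guess.
try:
    roc_curve(np.array(['F', 'F', 'T', 'T']), scores)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```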

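As the error message says, your data does not have this expected binary format. Even if your data has only two classes, labels such as 'T' and 'F' will still raise this error. So, as described in the documentation of scikit-learn's roc_curve() function, you need to specify exactly which label to use as the "positive class". If the labels in your y_validate variable are 'T' and 'F', you would do: fpr, tpr, _ = roc_curve(y_validate, status[:,1], pos_label='T').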
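A self-contained sketch of the fix, assuming the labels really are 'T' and 'F'. The toy data, the train/validation split and the pos_col variable are hypothetical stand-ins for the asker's setup. Because the columns of predict_proba follow the order of rf.classes_, the sketch looks up the column for 'T' rather than hard-coding index 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, auc

# Hypothetical toy data with string labels 'T' and 'F'.
X = (np.arange(40) / 40.0).reshape(-1, 1)
y = np.where(X[:, 0] >= 0.5, 'T', 'F')

# Even rows for training, odd rows for validation, so both classes appear in each.
x_train, y_train = X[::2], y[::2]
x_validate, y_validate = X[1::2], y[1::2]

rf = RandomForestClassifier(n_estimators=20, random_state=0).fit(x_train, y_train)
status = rf.predict_proba(x_validate)

# Pick the probability column that belongs to the positive class 'T'.
pos_col = list(rf.classes_).index('T')

# Tell roc_curve explicitly which label counts as positive.
fpr, tpr, _ = roc_curve(y_validate, status[:, pos_col], pos_label='T')
roc_auc = auc(fpr, tpr)
print(roc_auc)
```

With labels 'F' and 'T', rf.classes_ is sorted, so 'T' happens to land in column 1 and status[:,1] would also work. Looking the column up guards against a label set where the positive class sorts first.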
