How to get class labels from cross_val_predict used with predict_proba in scikit-learn

Question

gc5*_*gc5 5 python scikit-learn cross-validation

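I need to train a random forest classifier using 3-fold cross-validation. For each sample, I need to retrieve its predicted probabilities from the fold in which it lands in the test set.

I am using scikit-learn version 0.18.dev0.

This new version lets cross_val_predict() take an additional parameter, method, which defines what kind of prediction is requested from the estimator.

In my case, I want to use the predict_proba() method, which returns the probability for each class, in a multiclass scenario.

However, when I run the method, the result is a matrix of predicted probabilities, where each row represents a sample and each column represents the predicted probability for a specific class.

The problem is that the method does not indicate which class corresponds to each column.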

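The value I need is the same one returned by the classes_ attribute (of RandomForestClassifier, in my case), which is documented as:

classes_ : array of shape = [n_classes] or a list of such arrays. The class labels (single-output problem), or a list of arrays of class labels (multi-output problem).

It is needed because the predict_proba() documentation states:

The order of the classes corresponds to that in the attribute classes_.

A minimal example follows: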

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

clf = RandomForestClassifier()

X = np.random.randn(10, 10)
y = np.array([1] * 4 + [0] * 3 + [2] * 3)

# how to get classes from here?
proba = cross_val_predict(estimator=clf, X=X, y=y, cv=3, method="predict_proba")

# using the classifier without cross-validation
# it is possible to get the classes in this way:
clf.fit(X, y)
proba = clf.predict_proba(X)
classes = clf.classes_
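
For comparison, the per-fold class order can be inspected by running the folds by hand instead of through cross_val_predict. The following is only a sketch under stated assumptions: the data set is enlarged to 30 samples so every class appears in every training fold, StratifiedKFold is assumed as the splitter, and the names all_classes, fold_classes and cols are illustrative rather than part of the original example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Illustrative data (assumption): 30 samples, 3 classes, labels deliberately unsorted.
rng = np.random.RandomState(0)
X = rng.randn(30, 5)
y = np.array([1] * 12 + [0] * 9 + [2] * 9)

clf = RandomForestClassifier(n_estimators=10, random_state=0)
cv = StratifiedKFold(n_splits=3)

all_classes = np.unique(y)                      # sorted labels over the whole data set
proba = np.zeros((len(y), len(all_classes)))    # out-of-fold probabilities
fold_classes = []                               # classes_ seen by each fold's model

for train_idx, test_idx in cv.split(X, y):
    model = clone(clf).fit(X[train_idx], y[train_idx])
    fold_classes.append(model.classes_)
    # Map this fold's columns onto the global column order by label.
    cols = np.searchsorted(all_classes, model.classes_)
    proba[np.ix_(test_idx, cols)] = model.predict_proba(X[test_idx])
```

Each entry of fold_classes records the column order the corresponding fold's model actually used, so it can be compared against all_classes directly.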

Answer 1

max*_*moo 4

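Yes, they will be in sorted order. This is because DecisionTreeClassifier (the default base_estimator of RandomForestClassifier) uses np.unique to construct the classes_ attribute, and np.unique returns the sorted unique values of the input array.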
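
Given that ordering, the columns returned by cross_val_predict can be labeled with np.unique(y). This is a minimal sketch, assuming every class is present in every training fold (the data set is enlarged to 30 samples here for that reason); the variable names classes and predicted_labels are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

# Illustrative data (assumption): 30 samples, 3 classes, labels deliberately unsorted.
rng = np.random.RandomState(0)
X = rng.randn(30, 5)
y = np.array([1] * 12 + [0] * 9 + [2] * 9)

clf = RandomForestClassifier(n_estimators=10, random_state=0)
proba = cross_val_predict(clf, X, y, cv=3, method="predict_proba")

# Sorted unique labels: column j of proba corresponds to classes[j].
classes = np.unique(y)

# Turn each row of probabilities into the most likely class label.
predicted_labels = classes[np.argmax(proba, axis=1)]
```

If some class could be missing from a training fold, the column order is worth checking per fold rather than relying on this mapping alone.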

Yes, I think that is how it should work. (2 upvotes)
