SGDClassifier and predict_proba

Question

rat*_*hna 1 python machine-learning scikit-learn

I am using the sklearn library to train and test on my data.

import csv
import pandas as pd

targetDataCsv = pd.read_csv("target.csv")
testNormalizedCsv = csv.reader(open("testdf_new.csv", "rt", encoding="utf-8"))
traningNormalizedCsv = pd.read_csv("traindf_new.csv", skiprows=1, nrows=99999)
df = pd.read_csv("testdf_new.csv", skiprows=1, nrows=9999)

I want to use the partial_fit method of SGDClassifier because my training data has more than 200,000 rows.

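import numpy as np
from sklearn.linear_model import SGDClassifier

X = traningNormalizedCsv.values
y = targetDataCsv.values.ravel()  # flatten to a 1-D label array
clf = SGDClassifier()  # default loss="hinge"
# The first partial_fit call must list every class label
clf.partial_fit(X, y, classes=np.unique(y))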

However, this classifier has no predict_proba method, so I can't get the target probabilities for my test data:

clf.predict_proba(df.values)  # raises AttributeError with the default loss

Any suggestions?

Answer 1

Ant*_*eev 6

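As the documentation notes, predict_proba is only available for the log loss (loss="log_loss", spelled loss="log" before scikit-learn 1.1) and the modified Huber loss (loss="modified_huber").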

So you have to change the loss function.

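from sklearn.linear_model import SGDClassifier
import numpy as np

X = np.random.random_sample((1000, 3))
y = np.random.binomial(3, 0.5, 1000)  # labels 0..3, i.e. 4 classes
# modified_huber (or log_loss) makes predict_proba available
model = SGDClassifier(loss="modified_huber")
# classes is required on the first partial_fit call
model.partial_fit(X, y, classes=np.unique(y))
print(model.predict_proba([[0.5, 0.6, 0.7]]))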

Example output: [[0. 0. 1. 0.]]

Archived:	8 years, 1 month ago
Views:	2,141
Last activity:	8 years, 1 month ago