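I was wondering whether someone could take a quick look at the following code snippet and point out where I am going wrong in computing a sample's probability for each class of the model, along with the related coding error. I tried to reproduce by hand the results returned by the sklearn function lm.predict_proba(X); sadly the results differ, so I have made a mistake somewhere.
I think the bug is in part "d" of the code walkthrough below. It may be in the math, but I cannot see why.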
a) Create and train the logistic regression model (works fine)
from sklearn.linear_model import LogisticRegression

lm = LogisticRegression(random_state=413, multi_class='multinomial', solver='newton-cg')
lm.fit(X, train_labels)
b) Save the coefficients and bias (works fine)
W = lm.coef_
b = lm.intercept_
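For reference, here is a minimal sketch of the manual computation I am trying to reproduce, on the assumption that a multinomial model's predict_proba is the softmax of X·Wᵀ + b. The synthetic data from make_classification and the names X_demo, y_demo, clf, scores and manual are illustrative only and are not part of my actual code; multi_class is left at its default here, which I believe resolves to multinomial for a multi-class target with this solver.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in data: 3 classes, 8 features (not the image data from the question).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=5, n_classes=3, random_state=0
)

clf = LogisticRegression(solver="newton-cg", random_state=413).fit(X_demo, y_demo)
W = clf.coef_        # shape (n_classes, n_features)
b = clf.intercept_   # shape (n_classes,)

# Linear scores for every sample and class, shape (n_samples, n_classes).
scores = X_demo @ W.T + b

# Softmax across classes; subtracting the row maximum is only for numerical stability.
scores = scores - scores.max(axis=1, keepdims=True)
manual = np.exp(scores)
manual = manual / manual.sum(axis=1, keepdims=True)

# Compare the hand-computed probabilities with sklearn's.
matches = np.allclose(manual, clf.predict_proba(X_demo))
print("manual softmax matches predict_proba:", matches)
```

Note that the normalisation runs over the class axis (axis=1), so each row sums to 1; a one-vs-rest model would instead apply a sigmoid per class and then normalise, which gives different numbers.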
c) Use lm.predict_proba(X) (works fine)
import numpy as np
import pandas as pd

def reshape_single_element(x, num):
    # Flatten one 2-D image from the dataset into a single row vector.
    singleElement = x[num]
    nx, ny = singleElement.shape
    return singleElement.reshape((1, nx * ny))

select_image_number = 6
X_select_image_data = reshape_single_element(train_dataset, select_image_number)
Y_probabilities = lm.predict_proba(X_select_image_data)
Y_pandas_probabilities = pd.Series(Y_probabilities[0], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
print("estimate probabilities for each class: \n", Y_pandas_probabilities, "\n")
print("all probabilities by lm.predict_proba(..) sum up to ", np.sum(Y_probabilities), "\n")
The output is:
estimate probabilities for each …