Vik*_*ngh | tags: python, metrics, machine-learning, scikit-learn, cross-entropy
I have been reading about log_loss and cross-entropy, and there seem to be two ways of computing it.

First:
```python
import numpy as np
from sklearn.metrics import log_loss

def cross_entropy(predictions, targets):
    N = predictions.shape[0]
    ce = -np.sum(targets * np.log(predictions)) / N
    return ce

predictions = np.array([[0.25, 0.25, 0.25, 0.25],
                        [0.01, 0.01, 0.01, 0.97]])
targets = np.array([[1, 0, 0, 0],
                    [0, 0, 0, 1]])

x = cross_entropy(predictions, targets)
print(log_loss(targets, predictions), 'our_answer:', ans)
```
Output: `0.7083767843022996 our_answer: 0.71355817782`, which is nearly the same. So that is not the problem.
Source: http://wiki.fast.ai/index.php/Log_Loss

The implementation above corresponds to the middle part of the equation.
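For reference, the equation in question is, reconstructed here from the standard definition of log loss (the wiki's exact notation may differ): the general multiclass form, which for a binary problem (M = 2) reduces to the familiar two-term RHS:

```latex
\text{logloss}
  = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\,\log(p_{ij})
  \quad\overset{M=2}{=}\quad
  -\frac{1}{N}\sum_{i=1}^{N}\Bigl[\,y_i\log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)\Bigr]
```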
Second: this way of computing it corresponds to the RHS part of the equation:
```python
res = 0
for act_row, pred_row in zip(targets, np.array(predictions)):
    for class_act, class_pred in zip(act_row, pred_row):
        res += -class_act * np.log(class_pred) - (1 - class_act) * np.log(1 - class_pred)

print(res / len(targets))
```
Output: `1.1549753967602232`

Not the same. I tried the same implementation with numpy as well, and it didn't work either. What am I doing wrong?
PS: I am also curious why there is a `-(1-y) log(1-y_hat)` part at all, since `-y log(y_hat)` looks to me the same as `-sigma(p_i * log(q_i))`. Clearly, I am misunderstanding how `-y log(y_hat)` is computed.

Credit: the code is borrowed from: Cross entropy function (python)
I cannot reproduce the difference in the results you report in the first part (you also reference an `ans` variable, which you do not seem to define anywhere; I guess you mean `x`):
```python
import numpy as np
from sklearn.metrics import log_loss

def cross_entropy(predictions, targets):
    N = predictions.shape[0]
    ce = -np.sum(targets * np.log(predictions)) / N
    return ce

predictions = np.array([[0.25, 0.25, 0.25, 0.25],
                        [0.01, 0.01, 0.01, 0.97]])
targets = np.array([[1, 0, 0, 0],
                    [0, 0, 0, 1]])
```
Result:
```python
cross_entropy(predictions, targets)
# 0.7083767843022996

log_loss(targets, predictions)
# 0.7083767843022996

log_loss(targets, predictions) == cross_entropy(predictions, targets)
# True
```
Your `cross_entropy` function seems to work fine.

Regarding the second part:

> Clearly, I am misunderstanding how `-y log(y_hat)` is computed.

Indeed, reading the fast.ai wiki page you link to more carefully, you'll see that the RHS of the equation holds only for binary classification, where one of `y` and `1-y` is always zero. That is not the case here: you have a 4-class multinomial classification. So the correct computation is
```python
res = 0
for act_row, pred_row in zip(targets, np.array(predictions)):
    for class_act, class_pred in zip(act_row, pred_row):
        res += -class_act * np.log(class_pred)
```
i.e. dropping the subtracted term `(1-class_act) * np.log(1-class_pred)`.
Result:

```python
res / len(targets)
# 0.7083767843022996

res / len(targets) == log_loss(targets, predictions)
# True
```
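As a quick illustration (with made-up binary data, not taken from the question), the full RHS formula, keeping the `(1-class_act) * np.log(1-class_pred)` term, does agree with `log_loss` when the problem really is binary:

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical binary example: labels and predicted probabilities of class 1
y = np.array([1, 0, 1, 0])
p = np.array([0.9, 0.2, 0.7, 0.4])

# Full binary cross-entropy formula, with the (1 - y) * log(1 - p) term kept
bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print(np.isclose(bce, log_loss(y, p)))  # True: the RHS holds in the binary case
```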
On a more general level (regarding the mechanics of log loss and accuracy for binary classification), you may find this answer useful.