Tri*_*ick 4 floating-point numerical-methods
So I fixed an interesting bug in the code below, but I'm not sure my approach is the best one:
import math

p = 1
probabilities = [ ... ] # a (possibly) long list of numbers between 0 and 1
for wp in probabilities:
    if wp > 0:
        p *= wp
# Take the natural log; this crashes when 'probabilities' is long enough that p
# underflows to zero (math.log(0) raises ValueError)
result = math.log(p)
Since the result doesn't need to be exact, I solved this by simply keeping the smallest non-zero value and using it if p becomes 0.
import math

p = 1
probabilities = [ ... ] # a long list of numbers between 0 and 1
for wp in probabilities:
    if wp > 0:
        old_p = p
        p *= wp
        if p == 0:
            # we've gotten so small, it's just 0, so go back to the smallest
            # non-zero we had
            p = old_p
            break
# Take the natural log; p is now guaranteed to be non-zero, so this no longer
# crashes when 'probabilities' is long
result = math.log(p)
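This works, but it seems a bit hacky to me. I don't do much of this kind of numerical programming, and I'm not sure whether this is the kind of fix people use, or whether there's something better I could go for.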
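Since math.log(a * b) is equal to math.log(a) + math.log(b), why not take the sum of the logs of all the members of the probabilities array?
This avoids the problem of p becoming so small that it underflows.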
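As a rough sketch of that idea in plain Python (the helper name `log_product` and the sample data are made up for illustration; `math.fsum` is used only to reduce rounding error in the sum, and zeros are skipped the same way the question's loop skips them):

```python
import math

def log_product(probabilities):
    """Log of the product of the strictly positive entries, computed as a sum of logs."""
    return math.fsum(math.log(wp) for wp in probabilities if wp > 0)

# Hypothetical data: 2000 factors of 1e-3. The direct product underflows
# to 0.0, while the sum of logs stays an ordinary finite float.
probabilities = [1e-3] * 2000

direct = 1.0
for wp in probabilities:
    direct *= wp

print(direct)                      # the product has underflowed
print(log_product(probabilities))  # finite, roughly 2000 * log(1e-3)
```

Because each individual log is a moderate negative number, the running total never gets anywhere near the underflow threshold, no matter how long the list is.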
Edit: here is a numpy version, which is cleaner and much faster for large data sets:
import numpy
prob = numpy.array([0.1, 0.213, 0.001, 0.98 ... ])
result = numpy.log(prob).sum()
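If the array can contain zeros, `numpy.log` returns `-inf` for them (with a runtime warning) rather than skipping them as the question's loop does. A minimal sketch of one way to keep that behavior, assuming it is acceptable to drop the zero entries; the sample values here are illustrative only:

```python
import numpy as np

# Illustrative values; the 0.0 is included to show the filtering.
prob = np.array([0.1, 0.213, 0.0, 0.001, 0.98])

positive = prob[prob > 0]        # boolean mask keeps only strictly positive entries
result = np.log(positive).sum()  # sum of logs equals the log of the product

print(result)
```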