GaussianNB: ValueError: The sum of the priors should be 1

Question

Muk*_*pta 4 machine-learning gaussian python-2.7 scikit-learn

What am I trying to do?

I am trying to train a GaussianNB classifier on a dataset with 10 labels, but when I set the priors parameter of GaussianNB, I get this error:

File "/home/mg/anaconda2/lib/python2.7/site-packages/sklearn/naive_bayes.py", line 367, in _partial_fit raise ValueError('The sum of the priors should be 1.') ValueError: The sum of the priors should be 1.

Code: clf = GaussianNB(priors=[0.08, 0.14, 0.03, 0.16, 0.11, 0.16, 0.07, 0.14, 0.11, 0.0])

As you can see, the sum is clearly 1, yet I still get this error. Can you point out the mistake?

Answer 1

sas*_*cha 6

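This looks like a very bad design decision in sklearn, because floating-point numbers should normally not be compared for exact equality (see "What Every Computer Scientist Should Know About Floating-Point Arithmetic"). It surprises me, since sklearn is usually very high-quality code!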

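(I do not see any incorrect usage on your side, even though you pass a list. The docs call for an array rather than array-like, as in many other cases, but their code performs the array conversion anyway.)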

Their code:

if self.priors is not None:
    priors = np.asarray(self.priors)
    # Check that the provide prior match the number of classes
    if len(priors) != n_classes:
        raise ValueError('Number of priors must match number of'
                         ' classes.')
    # Check that the sum is 1
    if priors.sum() != 1.0:
        raise ValueError('The sum of the priors should be 1.')
    # Check that the prior are non-negative
    if (priors < 0).any():
        raise ValueError('Priors must be non-negative.')
    self.class_prior_ = priors
else:
    # Initialize the priors to zeros for each class
    self.class_prior_ = np.zeros(len(self.classes_),
                                 dtype=np.float64)

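So:

You pass a list, but their code creates a numpy array
Therefore np.sum() is used for the summation
In your case, numerical errors related to fp-math are possible
- your sum is technically != 1.0, but very close to it!
The fp comparison x == 1.0 is considered bad practice!
- numpy provides np.isclose() as the usual way to do this kind of comparison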
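
As a rough illustration of the points above, the same three checks can be written with a tolerant comparison. This is only a sketch: `validate_priors` is a hypothetical helper name, not part of sklearn, and it relies on the default tolerances of np.isclose().

```python
import numpy as np


def validate_priors(priors, n_classes):
    """Hypothetical helper (not part of sklearn): validate class priors."""
    # Convert list-like input to a float array, as the quoted code does.
    priors = np.asarray(priors, dtype=np.float64)
    if len(priors) != n_classes:
        raise ValueError('Number of priors must match number of classes.')
    # Tolerant comparison instead of the exact `priors.sum() != 1.0`.
    if not np.isclose(priors.sum(), 1.0):
        raise ValueError('The sum of the priors should be 1.')
    if (priors < 0).any():
        raise ValueError('Priors must be non-negative.')
    return priors


# The priors from the question pass the tolerant check.
checked = validate_priors(
    [0.08, 0.14, 0.03, 0.16, 0.11, 0.16, 0.07, 0.14, 0.11, 0.0],
    n_classes=10)
```

Priors that are genuinely off, such as [0.5, 0.6], would still raise the ValueError.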

Demo:

import numpy as np

priors = np.array([0.08, 0.14, 0.03, 0.16, 0.11, 0.16, 0.07, 0.14, 0.11, 0.0])
my_sum = np.sum(priors)
print('my_sum: ', my_sum)
print('naive: ', my_sum == 1.0)
print('safe: ', np.isclose(my_sum, 1.0))

Output:

('my_sum: ', 1.0000000000000002)
('naive: ', False)
('safe: ', True)

Edit:

Since I think this code is bad, I posted an issue here, which you can follow to see whether they agree.

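numpy.random.choice(), whose p argument also takes such a vector, actually uses an fp-safe approach too (a numerically more stable sum plus an epsilon check, but without using np.isclose()), as seen here.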
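
The "more stable sum plus epsilon check" idea can be sketched in pure Python with compensated (Kahan) summation. This is an illustration under assumptions, not numpy's actual code: the names `kahan_sum` and `sums_to_one` and the tolerance `atol=1e-8` are chosen here for the example.

```python
def kahan_sum(values):
    """Sum floats with Kahan compensation to reduce rounding error."""
    total = 0.0
    compensation = 0.0  # running estimate of the lost low-order bits
    for value in values:
        y = value - compensation
        t = total + y
        # (t - total) recovers what was actually added; subtracting y
        # leaves the rounding error of this step.
        compensation = (t - total) - y
        total = t
    return total


def sums_to_one(probabilities, atol=1e-8):
    """Epsilon check; atol is an assumed tolerance for this sketch."""
    return abs(kahan_sum(probabilities) - 1.0) <= atol


priors = [0.08, 0.14, 0.03, 0.16, 0.11, 0.16, 0.07, 0.14, 0.11, 0.0]
ok = sums_to_one(priors)
```

With a check like this, the priors from the question are accepted, while a vector such as [0.5, 0.6] is still rejected.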
