Fitting For Discrete Data: Negative Binomial, Poisson, Geometric Distribution

Question

Fitting For Discrete Data: Negative Binomial, Poisson, Geometric Distribution

Sal*_*lih 5 python statistics distribution scipy

In scipy there is no support for fitting discrete distributions using data. I know there are a lot of subject about this.

For example if i have an array like below:

x = [2,3,4,5,6,7,0,1,1,0,1,8,10,9,1,1,1,0,0]

I couldn't apply for this array:

from scipy.stats import nbinom
param = nbinom.fit(x)

Run Code Online (Sandbox Code Playgroud)

But i would like to ask you up to date, is there any way to fit for these three discrete distributions and then choose the best fit for the discrete dataset?

Answer 1

Mar*_*rat 8

您可以使用矩量法来拟合任何特定的分布。

基本思想：得到经验一阶矩、二阶矩等，然后从这些矩导出分布参数。

因此，在所有这些情况下，我们只需要两个时刻。让我们得到它们：

import pandas as pd
# for other distributions, you'll need to implement PMF
from scipy.stats import nbinom, poisson, geom

x = pd.Series(x)
mean = x.mean()
var = x.var()
likelihoods = {}  # we'll use it later

Run Code Online (Sandbox Code Playgroud)

注意：我使用 pandas 而不是 numpy。这是因为 numpyvar()和std()不应用贝塞尔校正，而 pandas 则应用。如果您有 100 多个样本，应该不会有太大差异，但对于较小的样本，这可能很重要。

现在，让我们获取这些分布的参数。负二项式有两个参数：p、r。让我们估计它们并计算数据集的可能性：

# From the wikipedia page, we have:
# mean = pr / (1-p)
# var = pr / (1-p)**2
# without wiki, you could use MGF to get moments; too long to explain here
# Solving for p and r, we get:

p = 1 - mean / var  # TODO: check for zero variance and limit p by [0, 1]
r = (1-p) * mean / p

Run Code Online (Sandbox Code Playgroud)

UPD：维基百科和 scipy 使用不同的 p 定义，一种将其视为成功概率，另一种将其视为失败概率。因此，为了与 scipy 概念保持一致，请使用：

p = mean / var
r = p * mean / (1-p)

Run Code Online (Sandbox Code Playgroud)

UPD 结束

更新2：

我建议改用@thilak 的代码日志可能性。它可以避免精度损失，这对于大样本尤其重要。

UPD2 结束

计算可能性：

likelihoods['nbinom'] = x.map(lambda val: nbinom.pmf(val, r, p)).prod()

Run Code Online (Sandbox Code Playgroud)

与Poisson相同，只有一个参数：

# from Wikipedia,
# mean = variance = lambda. Nothing to solve here
lambda_ = mean
likelihoods['poisson'] = x.map(lambda val: poisson.pmf(val, lambda_)).prod()

Run Code Online (Sandbox Code Playgroud)

几何分布相同：

# mean = 1 / p  # this form fits the scipy definition
p = 1 / mean

likelihoods['geometric'] = x.map(lambda val: geom.pmf(val, p)).prod()

Run Code Online (Sandbox Code Playgroud)

最后，让我们获得最佳拟合：

best_fit = max(likelihoods, key=lambda x: likelihoods[x])
print("Best fit:", best_fit)
print("Likelihood:", likelihoods[best_fit])

Run Code Online (Sandbox Code Playgroud)

如果您有任何疑问，请告诉我

归档时间：	6 年，2 月前
查看次数：	5119 次
最近记录：	2 年，4 月前