Sal*_*lih 5 python statistics distribution scipy
In scipy there is no support for fitting discrete distributions using data. I know there are a lot of subject about this.
For example if i have an array like below:
x = [2,3,4,5,6,7,0,1,1,0,1,8,10,9,1,1,1,0,0]
I couldn't apply for this array:
from scipy.stats import nbinom
param = nbinom.fit(x)
Run Code Online (Sandbox Code Playgroud)
But i would like to ask you up to date, is there any way to fit for these three discrete distributions and then choose the best fit for the discrete dataset?
您可以使用矩量法来拟合任何特定的分布。
基本思想:得到经验一阶矩、二阶矩等,然后从这些矩导出分布参数。
因此,在所有这些情况下,我们只需要两个时刻。让我们得到它们:
import pandas as pd
# for other distributions, you'll need to implement PMF
from scipy.stats import nbinom, poisson, geom
x = pd.Series(x)
mean = x.mean()
var = x.var()
likelihoods = {} # we'll use it later
Run Code Online (Sandbox Code Playgroud)
注意:我使用 pandas 而不是 numpy。这是因为 numpyvar()和std()不应用贝塞尔校正,而 pandas 则应用。如果您有 100 多个样本,应该不会有太大差异,但对于较小的样本,这可能很重要。
现在,让我们获取这些分布的参数。负二项式有两个参数:p、r。让我们估计它们并计算数据集的可能性:
# From the wikipedia page, we have:
# mean = pr / (1-p)
# var = pr / (1-p)**2
# without wiki, you could use MGF to get moments; too long to explain here
# Solving for p and r, we get:
p = 1 - mean / var # TODO: check for zero variance and limit p by [0, 1]
r = (1-p) * mean / p
Run Code Online (Sandbox Code Playgroud)
UPD:维基百科和 scipy 使用不同的 p 定义,一种将其视为成功概率,另一种将其视为失败概率。因此,为了与 scipy 概念保持一致,请使用:
p = mean / var
r = p * mean / (1-p)
Run Code Online (Sandbox Code Playgroud)
UPD 结束
更新2:
我建议改用@thilak 的代码日志可能性。它可以避免精度损失,这对于大样本尤其重要。
UPD2 结束
计算可能性:
likelihoods['nbinom'] = x.map(lambda val: nbinom.pmf(val, r, p)).prod()
Run Code Online (Sandbox Code Playgroud)
与Poisson相同,只有一个参数:
# from Wikipedia,
# mean = variance = lambda. Nothing to solve here
lambda_ = mean
likelihoods['poisson'] = x.map(lambda val: poisson.pmf(val, lambda_)).prod()
Run Code Online (Sandbox Code Playgroud)
几何分布相同:
# mean = 1 / p # this form fits the scipy definition
p = 1 / mean
likelihoods['geometric'] = x.map(lambda val: geom.pmf(val, p)).prod()
Run Code Online (Sandbox Code Playgroud)
最后,让我们获得最佳拟合:
best_fit = max(likelihoods, key=lambda x: likelihoods[x])
print("Best fit:", best_fit)
print("Likelihood:", likelihoods[best_fit])
Run Code Online (Sandbox Code Playgroud)
如果您有任何疑问,请告诉我