Python scipy.stats.pareto.fit: how does it work?

Question

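...The help and the online documentation say that the function scipy.stats.pareto.fit takes the dataset to fit and, optionally, b (the exponent), loc, and scale as arguments. The result is a triplet (exponent, loc, scale).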

Fitting data generated from the same distribution should recover the parameters used to generate it, e.g. (using the Python 3 console)

$  python
Python 3.3.0 (default, Dec 12 2012, 07:43:02) 
[GCC 4.7.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

(In the code lines below, the Python console prompt ">>>" is omitted.)

import scipy.stats
dataset = scipy.stats.pareto.rvs(1.5, size=10000)  # generate data
scipy.stats.pareto.fit(dataset)

However, this results in

(1.0, nan, 0.0)

(exponent 1, should be 1.5), and

dataset=scipy.stats.pareto.rvs(1.1,size=10000)  #generating data
scipy.stats.pareto.fit(dataset)

results in

(1.0, nan, 0.0)

(exponent 1, should be 1.1), and

dataset=scipy.stats.pareto.rvs(4,loc=2.0,scale=0.4,size=10000)    #generating data
scipy.stats.pareto.fit(dataset)

(the exponent should be 4, loc should be 2, scale should be 0.4) results in

(1.0, nan, 0.0)

and so on. Passing a different exponent when calling the fit function,

scipy.stats.pareto.fit(dataset,1.4)

always returns exactly that exponent:

(1.3999999999999999, nan, 0.0)

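The obvious question is: am I completely misunderstanding the purpose of this fit function, is it meant to be used differently, or is it simply broken?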

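A remark: before anyone mentions that dedicated functions such as those on Aaron Clauset's web page (http://tuvalu.santafe.edu/~aaronc/powerlaws/) are more reliable than the scipy.stats methods and should be used instead: that may be true, but they are also very, very, very time-consuming, and for a dataset of 10000 points they take many hours (possibly days, weeks, years) on a normal PC.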

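Edit: oh, the parameter of the fit function is not the exponent of the distribution but the exponent minus 1 (but this does not change the problem above).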

Answer 1

unu*_*tbu 5

It looks like you have to supply a guess for loc and scale:

In [78]: import scipy.stats as stats

In [79]: b, loc, scale = 1.5, 0, 1

In [80]: data = stats.pareto.rvs(b, size=10000)

In [81]: stats.pareto.fit(data, 1, loc=0, scale=1)
Out[81]: (1.5237427002368424, -2.8457847787917788e-05, 1.0000329980475393)

And the guess has to be quite accurate for the fit to succeed:

In [82]: stats.pareto.fit(data, 1, loc=0, scale=1.01)
Out[82]: (1.5254113096223709, -0.0015898489208676779, 1.0015943893384001)

In [83]: stats.pareto.fit(data, 1, loc=0, scale=1.05)
Out[83]: (1.5234726749064218, 0.00025804526532994751, 0.99974649559141171)

In [84]: stats.pareto.fit(data, 1, loc=0.05, scale=1.05)
Out[84]: (1.0, 0.050000000000000003, 1.05)

Hopefully the context of your problem will tell you what appropriate guesses for loc and scale should be. Most likely, loc=0 and scale=1.
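If the context really does pin loc down, one option (a sketch that is not part of the original answer) is to hold it fixed with the floc keyword of fit rather than only passing it as a starting guess. The seed and sample size below are illustrative assumptions, and the exact numbers depend on the SciPy version.

```python
from scipy import stats

# Illustrative data: Pareto with shape b=1.5, loc=0, scale=1 (seed chosen arbitrarily).
data = stats.pareto.rvs(1.5, size=10000, random_state=0)

# floc=0 keeps loc fixed at 0, so only b and scale are estimated.
b_hat, loc_hat, scale_hat = stats.pareto.fit(data, floc=0)
print(b_hat, loc_hat, scale_hat)
```

With loc fixed, the estimated shape should land close to 1.5 and the estimated scale close to the smallest observation.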

Answer 2

Tra*_*ant 5

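The fit method is a very general and simple method that minimizes the distribution's negative log-likelihood function (self.nnlf). For distributions like pareto, whose parameters can create undefined regions, the general method does not work.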

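In particular, the general nnlf method returns "inf" when a value of the random variable falls outside the distribution's valid range. The "fmin" optimizer does not work well with this objective function unless your guessed starting values are very close to the final fit.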
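The effect can be reproduced with a small sketch. It does not call nnlf itself; as an assumption made for illustration, it rebuilds the same quantity from the public logpdf method. The helper name neg_log_likelihood and the seed are hypothetical.

```python
import numpy as np
from scipy import stats

# Illustrative data: Pareto with b=1.5, loc=0, scale=1, so every sample is >= 1.
data = stats.pareto.rvs(1.5, size=10000, random_state=0)

def neg_log_likelihood(b, loc, scale):
    """Negative log-likelihood of `data`, summed over all samples."""
    return -np.sum(stats.pareto.logpdf(data, b, loc=loc, scale=scale))

# Support starts at loc + scale = 1.0, which covers all samples: finite value.
inside = neg_log_likelihood(1.5, 0.0, 1.0)

# Support starts at loc + scale = 1.10, above the smallest samples: infinite value.
outside = neg_log_likelihood(1.5, 0.05, 1.05)

print(inside, outside)
```

The second call uses the same starting values that failed in Answer 1 (loc=0.05, scale=1.05): the optimizer starts on a plateau of inf and has no slope to follow.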

In general, for distributions whose pdf has limited support, the .fit method needs to use a constrained optimizer.
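For the Pareto distribution with loc fixed at 0, the constrained problem happens to have a closed-form solution, so no optimizer is needed: the likelihood is maximized with the scale at the smallest observation, and the shape follows from the mean log ratio. The sketch below uses only the standard library; the function name pareto_mle and the seed are assumptions for illustration, not part of scipy.

```python
import math
import random

def pareto_mle(samples):
    """Closed-form maximum-likelihood estimate for a Pareto sample with loc = 0.

    Returns (b_hat, scale_hat).
    """
    n = len(samples)
    # The scale cannot exceed any observation, and the likelihood grows
    # with scale, so the constrained maximum sits at the minimum.
    scale_hat = min(samples)
    log_sum = sum(math.log(x / scale_hat) for x in samples)
    b_hat = n / log_sum
    return b_hat, scale_hat

# Illustrative data: shape 1.5, scale 1.0 (random.paretovariate yields values >= 1).
rng = random.Random(0)
samples = [rng.paretovariate(1.5) for _ in range(10000)]

b_hat, scale_hat = pareto_mle(samples)
print(b_hat, scale_hat)
```

Because the estimate of scale lies exactly on the boundary of the valid region, an unconstrained search that steps past it lands in the inf region described above.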
