Scipy rv_continuous incorrectly generating sample from distribution

Question

Pic*_*Man 2 python random statistics scipy

from scipy import stats
import numpy as np 

class your_distribution(stats.rv_continuous):
    def _pdf(self, x):
        p0 = 10.9949
        p1 = 0.394447
        p2 = 12818.4
        p3 = 2.38898

        return ((p1*p3)/(p3*p0+p2*p1))*((p0*np.exp(-1.0*p1*x))+(p2*np.exp(-1.0*p3*x)))

distribution = your_distribution(a=0.15, b=10.1)
sample = distribution.rvs(size=50000)

The code above generates 50000 samples from a normalized pdf in the range 0.15 to 10.1. However, a disproportionately large number of the samples are generated at the upper bound b=10.1. This does not make sense given what the plotted pdf looks like.

How would I fix this issue?

Answer 1

kaz*_*ase 6

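The PDF is correctly normalized over the full range of the distribution. However, setting a and b simply cuts off the PDF without any renormalization. With (a=0.15, b=10.1) the PDF no longer integrates to 1, and, through a quirk of scipy's implementation, the remaining density apparently gets added at the end of the range. This results in the large number of samples at the upper bound.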
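To see how much density goes missing, write the question's PDF with its constant prefactor named K (K, a and b are just labels for this derivation; p0..p3 are the question's constants) and integrate each exponential term separately:

```latex
\begin{align*}
f(x) &= K\left(p_0 e^{-p_1 x} + p_2 e^{-p_3 x}\right),
  \qquad K = \frac{p_1 p_3}{p_3 p_0 + p_2 p_1},\\
\int_0^\infty f(x)\,dx &= K\left(\frac{p_0}{p_1} + \frac{p_2}{p_3}\right)
  = \frac{p_1 p_3}{p_3 p_0 + p_2 p_1}\cdot\frac{p_3 p_0 + p_2 p_1}{p_1 p_3} = 1,\\
\int_a^b f(x)\,dx &= K\left[\frac{p_0}{p_1}\left(e^{-p_1 a} - e^{-p_1 b}\right)
  + \frac{p_2}{p_3}\left(e^{-p_3 a} - e^{-p_3 b}\right)\right]
  \approx 0.70 \quad \text{for } a = 0.15,\ b = 10.1.
\end{align*}
```

So roughly 30% of the probability mass lies outside [0.15, 10.1], nearly all of it on [0, 0.15], where the steep p2 term dominates.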

We can visualize what is happening by plotting the cumulative distribution function (CDF) for a=0 and a=0.15:

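import numpy as np
import matplotlib.pyplot as plt

# your_distribution is the class defined in the question
x = np.linspace(0, 15, 1000)

distribution = your_distribution(a=0.0, b=10.1)
plt.plot(x, distribution.cdf(x), label='a=0')

distribution = your_distribution(a=0.15, b=10.1)
plt.plot(x, distribution.cdf(x), label='a=0.15')

plt.legend()
plt.show()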
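If you prefer numbers to a plot, here is a minimal sketch of the same comparison. It repeats the question's class so it runs on its own, and the 1e-6 offset below b is an arbitrary choice. The CDF just below b stops well short of 1, while scipy reports exactly 1 at b, so the difference is the probability mass concentrated on the upper bound:

```python
from scipy import stats
import numpy as np


class your_distribution(stats.rv_continuous):
    # The question's PDF, unchanged: normalized on [0, inf), not on [a, b].
    def _pdf(self, x):
        p0 = 10.9949
        p1 = 0.394447
        p2 = 12818.4
        p3 = 2.38898
        return ((p1*p3)/(p3*p0+p2*p1))*((p0*np.exp(-1.0*p1*x))+(p2*np.exp(-1.0*p3*x)))


truncated = your_distribution(a=0.15, b=10.1)

eps = 1e-6                            # arbitrary small offset below b
below_b = truncated.cdf(10.1 - eps)   # numerical integral of the PDF over [0.15, b - eps]
at_b = truncated.cdf(10.1)            # scipy returns 1.0 for any x >= b
jump = at_b - below_b                 # probability mass concentrated at b

print(f"CDF just below b: {below_b:.4f}")
print(f"CDF at b:         {at_b:.4f}")
print(f"jump at b:        {jump:.4f}")
```

The jump comes out at about 0.30, which matches the share of samples that pile up at b. Presumably this is because rvs inverts the CDF by root finding, and for any uniform draw above roughly 0.70 the only place the CDF crosses that value is the discontinuity at b.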

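To get rid of the jump in the CDF and the spurious samples at the upper end of the range, we need to renormalize the PDF over the range a..b. I'm too lazy to work out the correct factor analytically, so let's have scipy do the hard work: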

from scipy import stats
from scipy.integrate import quad
import numpy as np

# I pulled the definition of the PDF out of the class so we can use it to
# compute the scale factor.
def pdf(x):
    p0 = 10.9949
    p1 = 0.394447
    p2 = 12818.4
    p3 = 2.38898

    return ((p1*p3)/(p3*p0+p2*p1))*((p0*np.exp(-1.0*p1*x))+(p2*np.exp(-1.0*p3*x)))    

class your_distribution(stats.rv_continuous):        
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # integrate area of the PDF in range a..b
        self.scale, _ = quad(pdf, self.a, self.b)

    def _pdf(self, x):
        return pdf(x) / self.scale  # scale PDF so that it integrates to 1 in range a..b 

distribution = your_distribution(a=0.15, b=10.1)
sample = distribution.rvs(size=1000)
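As a sanity check on the fix, a minimal sketch that repeats the answer's pdf and class so it runs standalone. The seed, sample size and the 10.0 threshold are arbitrary choices made for this check. It verifies that the scale factor is about 0.70, that the CDF no longer jumps at b, and that samples no longer pile up there:

```python
from scipy import stats
from scipy.integrate import quad
import numpy as np


def pdf(x):
    p0 = 10.9949
    p1 = 0.394447
    p2 = 12818.4
    p3 = 2.38898
    return ((p1*p3)/(p3*p0+p2*p1))*((p0*np.exp(-1.0*p1*x))+(p2*np.exp(-1.0*p3*x)))


class your_distribution(stats.rv_continuous):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # area of the unnormalized PDF on [a, b]
        self.scale, _ = quad(pdf, self.a, self.b)

    def _pdf(self, x):
        return pdf(x) / self.scale  # integrates to 1 on [a, b]


distribution = your_distribution(a=0.15, b=10.1)

# The CDF should now approach 1 continuously as x -> b.
eps = 1e-6
jump = distribution.cdf(10.1) - distribution.cdf(10.1 - eps)

# With the jump gone, almost no samples should land close to b.
sample = distribution.rvs(size=200, random_state=0)
frac_near_b = np.mean(sample > 10.0)

print(f"scale factor: {distribution.scale:.4f}")
print(f"jump at b:    {jump:.2e}")
print(f"samples > 10: {frac_near_b:.2%}")
```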

If you happen to know the analytical solution of the integral, you can use it instead of the call to quad.
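For this particular PDF the integral does have a closed form, because each exponential term integrates separately: the mass on [a, b] is K * (p0/p1 * (e^(-p1 a) - e^(-p1 b)) + p2/p3 * (e^(-p3 a) - e^(-p3 b))), with K = p1 p3 / (p3 p0 + p2 p1). Below is a minimal sketch of using it in place of quad. The helper name mass and the optional _cdf override are additions made for this sketch, not part of the original answer:

```python
from scipy import stats
import numpy as np

# Constants from the question.
P0, P1, P2, P3 = 10.9949, 0.394447, 12818.4, 2.38898
K = (P1*P3) / (P3*P0 + P2*P1)


def pdf(x):
    return K * (P0*np.exp(-P1*x) + P2*np.exp(-P3*x))


def mass(a, b):
    """Hypothetical helper: closed-form integral of pdf over [a, b]."""
    return K * (P0/P1 * (np.exp(-P1*a) - np.exp(-P1*b))
                + P2/P3 * (np.exp(-P3*a) - np.exp(-P3*b)))


class your_distribution(stats.rv_continuous):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # analytic normalization instead of quad(pdf, self.a, self.b)
        self.scale = mass(self.a, self.b)

    def _pdf(self, x):
        return pdf(x) / self.scale

    def _cdf(self, x):
        # Optional: the same closed form gives the CDF on [a, x] directly.
        return mass(self.a, x) / self.scale


distribution = your_distribution(a=0.15, b=10.1)
sample = distribution.rvs(size=500, random_state=0)
```

Overriding _cdf is optional. Without it, scipy falls back to numerically integrating _pdf; with it, the CDF (and the root finding that rvs relies on) no longer needs quad at all, which makes sampling noticeably cheaper.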
