快速任意分布随机抽样

Gab*_*iel 18 python random performance

random模块(http://docs.python.org/2/library/random.html)具有几个固定的函数来随机抽样.例如,random.gauss将使用给定的均值和西格玛值对具有正态分布的随机点进行采样.

我正在寻找一种方法来提取一些N用我自己的分布的给定区间之间随机样本尽可能快地python.这就是我的意思:

def my_dist(x):
    # Some distribution, assume c1,c2,c3 and c4 are known.
    f = c1*exp(-((x-c2)**c3)/c4)
    return f

# Draw N random samples from my distribution between given limits a,b.
N = 1000
N_rand_samples = ran_func_sample(my_dist, a, b, N)
Run Code Online (Sandbox Code Playgroud)

这里ran_func_sample是我后和a, b是从中可以得出样本的限制.有那种东西python吗?

Igo*_*bin 14

您需要使用逆变换采样方法来获取根据您想要的法则分布的随机值.使用此方法,您只需将反转函数 应用于区间[0,1]中具有标准均匀分布的随机数.

在找到反转函数后,您可以根据所需的分布以这种明显的方式分配1000个数字:

[inverted_function(random.random()) for x in range(1000)]
Run Code Online (Sandbox Code Playgroud)

有关逆变换采样的更多信息:

此外,StackOverflow与该主题相关的问题很好:

  • 人们可能希望从中抽取随机样本的许多函数在分析上是不可逆的。 (4认同)

JS0*_*00N 10

这是使用装饰器执行逆变换采样的一种相当好的方法。

import numpy as np
from scipy.interpolate import interp1d

def inverse_sample_decorator(dist):
    
    def wrapper(pnts, x_min=-100, x_max=100, n=1e5, **kwargs):
        
        x = np.linspace(x_min, x_max, int(n))
        cumulative = np.cumsum(dist(x, **kwargs))
        cumulative -= cumulative.min()
        f = interp1d(cumulative/cumulative.max(), x)
        return f(np.random.random(pnts))
    
    return wrapper
Run Code Online (Sandbox Code Playgroud)

在高斯分布上使用此装饰器,例如:

@inverse_sample_decorator
def gauss(x, amp=1.0, mean=0.0, std=0.2):
    return amp*np.exp(-(x-mean)**2/std**2/2.0)
Run Code Online (Sandbox Code Playgroud)

然后,您可以通过调用该函数从分布生成样本点。关键字参数x_minx_max是原始分布的限制,可以作为参数与gauss参数化分布的其他关键字参数一起传递。

samples = gauss(5000, mean=20, std=0.8, x_min=19, x_max=21)
Run Code Online (Sandbox Code Playgroud)

或者,这可以作为一个函数来完成,该函数将分布作为参数(如您原来的问题中所示),

def inverse_sample_function(dist, pnts, x_min=-100, x_max=100, n=1e5, 
                            **kwargs):
        
    x = np.linspace(x_min, x_max, int(n))
    cumulative = np.cumsum(dist(x, **kwargs))
    cumulative -= cumulative.min()
    f = interp1d(cumulative/cumulative.max(), x)
        
    return f(np.random.random(pnts))
Run Code Online (Sandbox Code Playgroud)


Eel*_*orn 9

该代码实现了nd离散概率分布的采样.通过在对象上设置标志,它也可以用作分段常数概率分布,然后可以用来近似任意pdf.好吧,任意pdf,支持紧凑; 如果您有效地想要采样非常长的尾部,则需要对pdf进行非均匀描述.但即使对于通风点扩散功能(我最初创建它)这样的事情,这仍然是有效的.内部的值排序对于获得准确性至关重要; 尾巴中的许多小值应该有很大的贡献,但是它们会以fp精度被淹没而不需要排序.

class Distribution(object):
    """
    draws samples from a one dimensional probability distribution,
    by means of inversion of a discrete inverstion of a cumulative density function

    the pdf can be sorted first to prevent numerical error in the cumulative sum
    this is set as default; for big density functions with high contrast,
    it is absolutely necessary, and for small density functions,
    the overhead is minimal

    a call to this distibution object returns indices into density array
    """
    def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):
        self.shape          = pdf.shape
        self.pdf            = pdf.ravel()
        self.sort           = sort
        self.interpolation  = interpolation
        self.transform      = transform

        #a pdf can not be negative
        assert(np.all(pdf>=0))

        #sort the pdf by magnitude
        if self.sort:
            self.sortindex = np.argsort(self.pdf, axis=None)
            self.pdf = self.pdf[self.sortindex]
        #construct the cumulative distribution function
        self.cdf = np.cumsum(self.pdf)
    @property
    def ndim(self):
        return len(self.shape)
    @property
    def sum(self):
        """cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized"""
        return self.cdf[-1]
    def __call__(self, N):
        """draw """
        #pick numbers which are uniformly random over the cumulative distribution function
        choice = np.random.uniform(high = self.sum, size = N)
        #find the indices corresponding to this point on the CDF
        index = np.searchsorted(self.cdf, choice)
        #if necessary, map the indices back to their original ordering
        if self.sort:
            index = self.sortindex[index]
        #map back to multi-dimensional indexing
        index = np.unravel_index(index, self.shape)
        index = np.vstack(index)
        #is this a discrete or piecewise continuous distribution?
        if self.interpolation:
            index = index + np.random.uniform(size=index.shape)
        return self.transform(index)


if __name__=='__main__':
    shape = 3,3
    pdf = np.ones(shape)
    pdf[1]=0
    dist = Distribution(pdf, transform=lambda i:i-1.5)
    print dist(10)
    import matplotlib.pyplot as pp
    pp.scatter(*dist(1000))
    pp.show()
Run Code Online (Sandbox Code Playgroud)

而作为一个更现实世界的相关例子:

x = np.linspace(-100, 100, 512)
p = np.exp(-x**2)
pdf = p[:,None]*p[None,:]     #2d gaussian
dist = Distribution(pdf, transform=lambda i:i-256)
print dist(1000000).mean(axis=1)    #should be in the 1/sqrt(1e6) range
import matplotlib.pyplot as pp
pp.scatter(*dist(1000))
pp.show()
Run Code Online (Sandbox Code Playgroud)


Abd*_*tir 7

我遇到了类似的情况,但我想从多元分布中采样,因此,我实现了 Metropolis-Hastings 的基本版本(这是一种 MCMC 方法)。

def metropolis_hastings(target_density, size=500000):
    burnin_size = 10000
    size += burnin_size
    x0 = np.array([[0, 0]])
    xt = x0
    samples = []
    for i in range(size):
        xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])
        accept_prob = (target_density(xt_candidate))/(target_density(xt))
        if np.random.uniform(0, 1) < accept_prob:
            xt = xt_candidate
        samples.append(xt)
    samples = np.array(samples[burnin_size:])
    samples = np.reshape(samples, [samples.shape[0], 2])
    return samples
Run Code Online (Sandbox Code Playgroud)

target_density该函数需要一个接受数据点并计算其概率的函数。

有关详细信息,请查看我的这个详细答案。