python中的快速向量化多项式

alg*_*gol 6 python optimization numpy vectorization multinomial

我目前正在使用 NumPy 执行以下任务:我有一个很大的值网格,我需要在每个点提取多项式样本。多项式的概率向量因网格站点而异,因此 NumPy 多项式函数对我来说不太适用,因为它从相同的分布中进行所有绘制。迭代每个站点似乎效率极低,我想知道是否有一种方法可以在 NumPy 中有效地做到这一点。如果我使用 Theano (参见这个答案),这样的事情似乎是可能的(而且很快),但这需要大量的重写,我希望避免这种情况。多项式采样能否在基本 NumPy 中有效矢量化?

编辑:很容易修改 Warren 的代码以允许不同站点的不同计数,因为我发现我需要:所有需要做的就是传入完整数组count并删除第一个,如下所示:

import numpy as np


def multinomial_rvs(count, p):
    """
    Sample from the multinomial distribution with multiple p vectors.

    * count must be an (n-1)-dimensional numpy array.
    * p must an n-dimensional numpy array, n >= 1.  The last axis of p
      holds the sequence of probabilities for a multinomial distribution.

    The return value has the same shape as p.
    """
    out = np.zeros(p.shape, dtype=int)
    ps = p.cumsum(axis=-1)
    # Conditional probabilities
    with np.errstate(divide='ignore', invalid='ignore'):
        condp = p / ps
    condp[np.isnan(condp)] = 0.0
    for i in range(p.shape[-1]-1, 0, -1):
        binsample = np.random.binomial(count, condp[..., i])
        out[..., i] = binsample
        count -= binsample
    out[..., 0] = count
    return out
Run Code Online (Sandbox Code Playgroud)

War*_*ser 2

这是您可以做到的一种方法。它没有完全矢量化,但 Python 循环遍历了这些p值。如果向量的长度p不太大,这对您来说可能足够快。

多项式分布是通过重复调用 来实现的np.random.binomial,它实现了其参数的广播。

import numpy as np


def multinomial_rvs(n, p):
    """
    Sample from the multinomial distribution with multiple p vectors.

    * n must be a scalar.
    * p must an n-dimensional numpy array, n >= 1.  The last axis of p
      holds the sequence of probabilities for a multinomial distribution.

    The return value has the same shape as p.
    """
    count = np.full(p.shape[:-1], n)
    out = np.zeros(p.shape, dtype=int)
    ps = p.cumsum(axis=-1)
    # Conditional probabilities
    with np.errstate(divide='ignore', invalid='ignore'):
        condp = p / ps
    condp[np.isnan(condp)] = 0.0
    for i in range(p.shape[-1]-1, 0, -1):
        binsample = np.random.binomial(count, condp[..., i])
        out[..., i] = binsample
        count -= binsample
    out[..., 0] = count
    return out
Run Code Online (Sandbox Code Playgroud)

下面是一个示例,其中“网格”的形状为 (2, 3),多项式分布是四维的(即每个p向量的长度为 4)。

In [182]: p = np.array([[[0.25, 0.25, 0.25, 0.25], 
     ...:                [0.01, 0.02, 0.03, 0.94], 
     ...:                [0.75, 0.15, 0.05, 0.05]], 
     ...:               [[0.01, 0.99, 0.00, 0.00], 
     ...:                [1.00, 0.00, 0.00, 0.00], 
     ...:                [0.00, 0.25, 0.25, 0.50]]])                                                                                                 

In [183]: sample = multinomial_rvs(1000, p)                                     

In [184]: sample                                                                
Out[184]: 
array([[[ 249,  260,  233,  258],
        [   3,   21,   33,  943],
        [ 766,  131,   55,   48]],

       [[   5,  995,    0,    0],
        [1000,    0,    0,    0],
        [   0,  273,  243,  484]]])

In [185]: sample.sum(axis=-1)                                                   
Out[185]: 
array([[1000, 1000, 1000],
       [1000, 1000, 1000]])
Run Code Online (Sandbox Code Playgroud)

在评论中,您说“p向量的形式为:p = [p_s, (1-p_s)/4, (1-p_s)/4, (1-p_s)/4, (1-p_s)/ 4],p_s 因站点而异。” 给定一个包含值的数组,以下是如何使用上述函数p_s

首先为示例创建一些数据:

In [73]: p_s = np.random.beta(4, 2, size=(2, 3))                                                                                                        

In [74]: p_s                                                                                                                                            
Out[74]: 
array([[0.61662208, 0.6072323 , 0.62208711],
       [0.86848938, 0.58959038, 0.47565799]])
Run Code Online (Sandbox Code Playgroud)

p根据以下公式创建包含多项式概率的数组p = [p_s, (1-p_s)/4, (1-p_s)/4, (1-p_s)/4, (1-p_s)/4]

In [75]: p = np.expand_dims(p_s, -1) * np.array([1, -0.25, -0.25, -0.25, -0.25]) + np.array([0, 0.25, 0.25, 0.25, 0.25])                                

In [76]: p                                                                                                                                              
Out[76]: 
array([[[0.61662208, 0.09584448, 0.09584448, 0.09584448, 0.09584448],
        [0.6072323 , 0.09819192, 0.09819192, 0.09819192, 0.09819192],
        [0.62208711, 0.09447822, 0.09447822, 0.09447822, 0.09447822]],

       [[0.86848938, 0.03287765, 0.03287765, 0.03287765, 0.03287765],
        [0.58959038, 0.1026024 , 0.1026024 , 0.1026024 , 0.1026024 ],
        [0.47565799, 0.1310855 , 0.1310855 , 0.1310855 , 0.1310855 ]]])
Run Code Online (Sandbox Code Playgroud)

现在执行与之前相同的操作来生成示例(将值 1000 更改为适合您的问题的值):

In [77]: sample = multinomial_rvs(1000, p)                                                                                                              

In [78]: sample                                                                                                                                         
Out[78]: 
array([[[618,  92, 112,  88,  90],
        [597, 104, 103, 101,  95],
        [605, 100,  95,  98, 102]],

       [[863,  32,  43,  27,  35],
        [602, 107, 108,  94,  89],
        [489, 130, 129, 129, 123]]])

In [79]: sample.sum(axis=-1)                                                                                                                            
Out[79]: 
array([[1000, 1000, 1000],
       [1000, 1000, 1000]])
Run Code Online (Sandbox Code Playgroud)