使用Numpy有效地对复杂矩阵产品求和

Tgs*_*591 8 python numpy

我有一个矩阵,X我正在计算中间矩阵乘积的加权和.这是一个可重复性最小的例子:

import numpy as np

random_state = np.random.RandomState(1)
n = 5
p = 10

X = random_state.rand(p, n) # 10x5
X_sum = np.zeros((n, n)) # 5x5

# The length of weights are not related to X's dims,
# but will always be smaller
y = 3
weights = random_state.rand(y)

for k in range(y):
    X_sum += np.dot(X.T[:, k + 1:],
                    X[:p - (k + 1), :]) * weights[k]
Run Code Online (Sandbox Code Playgroud)

这很好,并产生我期望的结果.然而,随着重复计算矩阵产品的尺寸ny增长(数百个),这变得非常昂贵......

然而,产品的计算方式有一个明显的模式:

第一次迭代

第二次迭代

您可以看到迭代进度,起始列切片Xt向右移动,而结束行X向上移动.这是第N次迭代的样子:

第N次迭代

这实际上意味着相同值的子集被重复相乘(参见编辑2),在我看来可能有机会利用...(即,如果我在一次通过中手动计算产品).

但是我希望不必手动做任何事情,并且可能有一种很好的方法来使用Numpy更优雅地实现整个循环.

编辑1

一组现实的数字:

n = 400
p = 2000
y = 750
Run Code Online (Sandbox Code Playgroud)

编辑2

要发表评论:

你能解释一下重复倍增的值吗?

考虑以下数组:

n = p = 5
X = np.arange(25).reshape(p, n)
Run Code Online (Sandbox Code Playgroud)

因为k=0,第一个产品将介于A和之间B:

k = 0
A = X.T[:, k + 1:]
B = X[:p - (k + 1), :]
>>> A
array([[ 5, 10, 15, 20],
       [ 6, 11, 16, 21],
       [ 7, 12, 17, 22],
       [ 8, 13, 18, 23],
       [ 9, 14, 19, 24]])
>>> B
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
Run Code Online (Sandbox Code Playgroud)

当时k=1:

k = 1
>>> A
array([[10, 15, 20],
       [11, 16, 21],
       [12, 17, 22],
       [13, 18, 23],
       [14, 19, 24]])
>>> B
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
Run Code Online (Sandbox Code Playgroud)

因此,如果有意义的话,每个后续矩阵产品都是先前产品的子集.

Cha*_*tar 4

太长了;我会选择 @Parfaittest_gen_sum基于np和的各种值的基准测试来使用y。为了连续性,保留旧答案

评估n, p,如何y影响算法的选择

此分析是使用 @Parfait 的函数作为确定是否确实存在一个最佳解决方案或是否存在基于np和 的值的一系列解决方案的方法来完成的y

import numpy as np
import pytest # This code also requires the pytest-benchmark plugin

def test_for_sum(n, p, y):
    random_state = np.random.RandomState(1)
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)

    for k in range(y):
        X_sum += np.dot(X.T[:, k + 1:],
                    X[:p - (k + 1), :]) * weights[k]

    return X_sum


def test_list_sum(n, p, y):
    random_state = np.random.RandomState(1)

    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)

    matrix_list = [np.dot(X.T[:, k + 1:],
                      X[:p - (k + 1), :]) * weights[k] for k in range(y)]

    X_sum = np.sum(matrix_list, axis=0)

    return X_sum


def test_reduce_sum(n, p, y):
    random_state = np.random.RandomState(1)

    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)

    matrix_list = [(X.T[:, k + 1:] @
                X[:p - (k + 1), :]) * weights[k] for k in range(y)]

    X_sum = reduce(lambda x,y: x + y, matrix_list)

    return X_sum


def test_concat_sum(n, p, y):
    random_state = np.random.RandomState(1)

    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)

    x_mat = np.concatenate([np.matmul(X.T[:, k + 1:],
                                  X[:p - (k + 1), :]) for k in range(y)])

    wgt_mat = np.concatenate([np.full((n,1), weights[k]) for k in range(y)])

    mul_res = x_mat * wgt_mat        
    X_sum = mul_res.reshape(-1, n, n).sum(axis=0)

    return X_sum


def test_matmul_sum(n, p, y):
    random_state = np.random.RandomState(1)
    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)
    # Use list comprehension and np.matmul 
    matrices_list = [np.matmul(X.T[:, k + 1:],
                           X[:p - (k + 1), :]) * weights[k] for k in range(y)]

    # Sum matrices in list of matrices to get the final result   
    X_sum = np.sum(matrices_list, axis=0)

    return X_sum


def test_gen_sum(n, p, y):
    random_state = np.random.RandomState(1)

    X = random_state.rand(p, n)
    X_sum = np.zeros((n, n))

    # The length of weights are not related to X's dims,
    # but will always be smaller
    weights = random_state.rand(y)

    matrix_gen = (np.dot(X.T[:, k + 1:],
                     X[:p - (k + 1), :]) * weights[k] for k in range(y))

    X_sum = sum(matrix_gen)

    return X_sum

parameters = [
    pytest.param(400, 800, 3)
    ,pytest.param(400, 2000, 3)
    ,pytest.param(400, 800, 750)
    ,pytest.param(400, 2000, 750)
]

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_for_sum(benchmark, n, p, y):
    benchmark(test_for_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_list_sum(benchmark, n, p, y):
     benchmark(test_list_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_reduce_sum(benchmark, n, p, y):
    benchmark(test_reduce_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_concat_sum(benchmark, n, p, y):
    benchmark(test_concat_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_matmul_sum(benchmark, n, p, y):
    benchmark(test_matmul_sum, n=n, p=p, y=y)

@pytest.mark.parametrize('n,p,y', parameters)
def test_test_gen_sum(benchmark, n, p, y):
    benchmark(test_gen_sum, n=n, p=p, y=y)
Run Code Online (Sandbox Code Playgroud)
  • n=400, p=800, y=3(100 次迭代)

    • 优胜者: test_gen_sum 在此输入图像描述
  • n=400, p=2000, y=3(100 次迭代)

    • 优胜者: test_gen_sum 在此输入图像描述
  • n=400, p=800, y=750(10 次迭代)

    • 优胜者: test_gen_sum 在此输入图像描述
  • n=400, p=2000, y=750(10 次迭代)

    • 优胜者: test_gen_sum 在此输入图像描述

旧答案

较小的y

我肯定会使用np.matmul它来代替它,np.dot这将为您带来最大的性能提升,事实上 的文档np.dot将指导您np.matmul使用 2D 数组乘法来代替np.dot.

我测试了有np.dotnp.matmul没有列表理解的情况,pytest-benchmark结果如下:

y=3

顺便说一句,pytest-benchmark非常灵活,我强烈推荐在这样的时候使用它来验证方法是否真正有效。

仅使用列表理解对结果的影响几乎可以忽略不计np.matmul,并且对事物方案的负面影响np.dot(尽管这是更好的形式),但是两种变化的结合产生了最好的结果。我警告说,使用列表理解往往会提高标准。开发人员。的运行时,因此您可能会看到运行时性能的范围比仅使用时更大np.matmul

这是代码:

import numpy as np

def test_np_matmul_list_comprehension():
    random_state = np.random.RandomState(1)
    n = p = 1000
    X = np.arange(n * n).reshape(p, n)

    # The length of weights are not related to X's dims,
    # but will always be smaller
    y = 3
    weights = [1, 1, 1]
    # Use list comprehension and np.matmul 
    matrices_list = [np.matmul(X.T[:, k + 1:],
                             X[:p - (k + 1), :]) * weights[k] for k in range(y)]

    # Sum matrices in list of matrices to get the final result   
    X_sum = np.sum(matrices_list, axis=0)
Run Code Online (Sandbox Code Playgroud)

更大的y

对于较大的值,y最好不要使用列表理解。在这两种情况下,np.dot平均/中位运行时间往往都较大。np.matmul以下是( , , )pytest-benchmark的结果:n=500p=5000y=750

在此输入图像描述

这可能有点矫枉过正,但我​​宁愿犯太有帮助的错误:)。