Why is NumPy's matrix multiplication so much faster than gsl_blas_sgemm from GSL? For example:
import numpy as np
import time
N = 1000
M = np.zeros(shape=(N, N), dtype=np.float64)  # double precision (64-bit)
for i in range(N):
    for j in range(N):
        M[i, j] = 0.23 + 100*i + j
tic = time.time()
np.matmul(M, M)
toc = time.time()
print(toc - tic)
This gives values between 0.017 and 0.019 seconds, whereas in C++:
#include <chrono>
#include <iostream>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_blas.h>
using namespace std::chrono;
int main(void) {
    int N = 1000;
    gsl_matrix_float* M = gsl_matrix_float_alloc(N, N);
    for (int i = …