
Why is naive C++ matrix multiplication 100 times slower than BLAS?

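I am looking at large matrix multiplication and ran the following experiment to establish a baseline:

  1. Randomly generate two 4096x4096 matrices X and Y from a standard normal distribution (mean 0, standard deviation 1).
  2. Compute Z = X*Y.
  3. Sum the elements of Z (to make sure they are accessed) and output the result.

Here is the naive C++ implementation: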

#include <iostream>
#include <algorithm>
#include <random>   // random_device, mt19937, normal_distribution

using namespace std;

int main()
{
    constexpr size_t dim = 4096;

    float* x = new float[dim*dim];
    float* y = new float[dim*dim];
    float* z = new float[dim*dim];

    random_device rd;
    mt19937 gen(rd());
    normal_distribution<float> dist(0, 1);

    for (size_t i = 0; i < dim*dim; i++)
    {
        x[i] = dist(gen);
        y[i] = dist(gen);
    }

    for (size_t row = 0; row < dim; row++)
        for (size_t col = 0; col < …
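The listing above is cut off partway through the multiplication loop, so the rest of the asker's code is not known. As a rough sketch only, a naive row/column/inner-product loop over row-major storage could look like the following. The helper names `naive_matmul`, `random_matrix`, and `sum_elements` are hypothetical and not taken from the question, and `std::vector` is used here in place of the raw `new float[]` arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not from the question): multiplies two dim x dim
// matrices stored in row-major order with the plain triple loop.
std::vector<float> naive_matmul(const std::vector<float>& x,
                                const std::vector<float>& y,
                                std::size_t dim)
{
    assert(x.size() == dim * dim && y.size() == dim * dim);
    std::vector<float> z(dim * dim, 0.0f);
    for (std::size_t row = 0; row < dim; row++)
        for (std::size_t col = 0; col < dim; col++)
        {
            float acc = 0.0f;
            // The inner loop walks y column-wise, i.e. with a stride of dim floats.
            for (std::size_t k = 0; k < dim; k++)
                acc += x[row * dim + k] * y[k * dim + col];
            z[row * dim + col] = acc;
        }
    return z;
}

// Hypothetical helper: fills a dim x dim matrix with N(0, 1) samples (step 1).
std::vector<float> random_matrix(std::size_t dim, std::mt19937& gen)
{
    std::normal_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> m(dim * dim);
    for (float& v : m)
        v = dist(gen);
    return m;
}

// Hypothetical helper: sums all elements so every entry of Z is read (step 3).
double sum_elements(const std::vector<float>& m)
{
    double total = 0.0;
    for (float v : m)
        total += v;
    return total;
}
```

Under these assumptions, the experiment would amount to calling `random_matrix` twice with `dim = 4096`, passing the results to `naive_matmul`, and printing `sum_elements` of the product. A smaller `dim` is a quick way to check the loop for correctness before timing it.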

c++ linux matlab matrix-multiplication c++11

12 votes
2 answers
3347 views
