
How to optimize matrix multiplication (matmul) code to run fast on a single processor core

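I'm studying parallel programming concepts and trying to optimize a matrix multiplication example on a single core. The fastest implementation I have come up with so far is: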

/* This routine performs a dgemm operation
 *  C := C + A * B
 * where A, B, and C are n-by-n matrices stored in column-major format.
 * On exit, A and B maintain their input values. */
void square_dgemm (int n, double* A, double* B, double* C)
{
  /* For each row i of A */
  for (int i = 0; i < n; ++i)
    /* For each column j of B */
    for (int j = …
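A common direction for single-core matmul tuning is to reorder the loops so the innermost loop has unit stride, and to tile the computation so each working set stays in cache. The following is a minimal sketch under stated assumptions, not the question author's code. It assumes the same column-major n-by-n layout and C := C + A * B semantics described in the comment above. The names square_dgemm_blocked, do_block, IDX and the BLOCK value of 64 are hypothetical, and the block size would need tuning per machine.

```c
#include <stddef.h>

/* Hypothetical tile size: assumes a few BLOCK x BLOCK tiles of doubles
   fit in cache. Tune per machine. */
#ifndef BLOCK
#define BLOCK 64
#endif

/* Column-major indexing: element (row, col) of an n-by-n matrix M. */
#define IDX(M, n, row, col) ((M)[(size_t)(col) * (size_t)(n) + (size_t)(row)])

/* One tile update: C(i0.., j0..) += A(i0.., k0..) * B(k0.., j0..).
   Loop order j-k-i makes the innermost loop walk down a column of
   A and C, which is unit-stride in column-major storage. */
static void do_block(int n, int i0, int j0, int k0,
                     int mb, int nb, int kb,
                     const double *A, const double *B, double *C)
{
    for (int j = j0; j < j0 + nb; ++j) {
        for (int k = k0; k < k0 + kb; ++k) {
            const double b = IDX(B, n, k, j); /* B(k,j) reused across i */
            for (int i = i0; i < i0 + mb; ++i)
                IDX(C, n, i, j) += IDX(A, n, i, k) * b;
        }
    }
}

/* Hypothetical blocked variant of square_dgemm: C := C + A * B.
   Edge tiles are clipped so n need not be a multiple of BLOCK. */
void square_dgemm_blocked(int n, const double *A, const double *B, double *C)
{
    for (int j0 = 0; j0 < n; j0 += BLOCK)
        for (int k0 = 0; k0 < n; k0 += BLOCK)
            for (int i0 = 0; i0 < n; i0 += BLOCK) {
                int mb = (n - i0 < BLOCK) ? n - i0 : BLOCK;
                int nb = (n - j0 < BLOCK) ? n - j0 : BLOCK;
                int kb = (n - k0 < BLOCK) ? n - k0 : BLOCK;
                do_block(n, i0, j0, k0, mb, nb, kb, A, B, C);
            }
}
```

Each C(i,j) still accumulates its k terms in ascending order, as a naive triple loop would. Further speedups usually come from register blocking, SIMD, and copying tiles into contiguous buffers, which this sketch deliberately omits.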


c c++ parallel-processing optimization matrix-multiplication

Asked by Cha*_*blu on 2017-10-04 · score 6 · 2 answers · 2175 views