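I am studying parallel programming concepts and trying to optimize a matrix multiplication example on a single core. The fastest implementation I have come up with so far is: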
/* This routine performs a dgemm operation
 *   C := C + A * B
 * where A, B, and C are n-by-n matrices stored in column-major format.
 * On exit, A and B maintain their input values. */
void square_dgemm (int n, double* A, double* B, double* C)
{
  /* For each row i of A */
  for (int i = 0; i < n; ++i)
    /* For each column j of B */
    for (int j = …

Tags: c, c++, parallel-processing, optimization, matrix-multiplication
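Since the code above is cut off, here is a minimal sketch of what a complete column-major kernel for C := C + A * B could look like. It is not the poster's implementation: the name `dgemm_jki` and the j-k-i loop order are assumptions chosen for illustration. With column-major storage, element (i, j) sits at index `i + j * n`, so making `i` the innermost loop walks down columns of A and C with unit stride, which is usually friendlier to the cache than striding across rows.

```c
#include <stddef.h>

/* Hypothetical reference kernel (illustrative, not the poster's code):
 * computes C := C + A * B for n-by-n matrices in column-major order,
 * where element (i, j) is stored at index i + j * n. */
void dgemm_jki(int n, const double *A, const double *B, double *C)
{
    for (int j = 0; j < n; ++j) {            /* column j of B and C */
        for (int k = 0; k < n; ++k) {        /* column k of A, row k of B */
            /* B(k, j) is constant across the inner loop, so load it once. */
            const double b_kj = B[k + (size_t)j * n];
            /* Unit-stride walk down column k of A and column j of C. */
            for (int i = 0; i < n; ++i)
                C[i + (size_t)j * n] += A[i + (size_t)k * n] * b_kj;
        }
    }
}
```

A quick sanity check is to run it on a 2-by-2 case whose result can be computed by hand. Whether this ordering is actually faster than the poster's version depends on the compiler, flags, and matrix size, so it should be measured rather than assumed.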