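I am trying to speed up matrix multiplication on a multicore architecture. To do this, I tried using threads and SIMD together, but my results are not good. I measure the speedup against a sequential matrix multiplication: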
void sequentialMatMul(void* params)
{
    cout << "SequentialMatMul started.";
    int i, j, k;
    for (i = 0; i < N; i++)
    {
        for (k = 0; k < N; k++)
        {
            for (j = 0; j < N; j++)
            {
                X[i][j] += A[i][k] * B[k][j];
            }
        }
    }
    cout << "\nSequentialMatMul finished.";
}
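For context, here is roughly how the speedup against the sequential baseline could be measured. This is only a sketch: `timeMs` and `speedup` are hypothetical helper names that do not appear in my code, and I am assuming wall-clock time from `std::chrono::steady_clock` is the metric of interest.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper (not from the original post): runs a callable once
// and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double timeMs(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Speedup = baseline time / candidate time; a value above 1 means the
// candidate ran faster than the baseline.
inline double speedup(double baselineMs, double candidateMs)
{
    assert(candidateMs > 0.0);  // avoid dividing by zero
    return baselineMs / candidateMs;
}
```

It would be used along the lines of `double seq = timeMs([] { sequentialMatMul(nullptr); });`, with the threaded version timed the same way. Note that the `cout` calls inside the timed function are included in the measurement, which may matter for small `N`.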
I tried adding threads and SIMD to the matrix multiplication as follows:
void threadedSIMDMatMul(void* params)
{
    bounds *args = (bounds*)params;
    int lowerBound = args->lowerBound;
    int upperBound = args->upperBound;
    int idx = args->idx;
    int i, j, k;
    for (i …
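The threaded listing above is cut off, so the following is not that code. It is a minimal, self-contained sketch of the general idea (each thread handles a block of rows, and the inner `j` loop is vectorized), under several assumptions: `float` elements, flat row-major `std::vector` storage, a fixed hypothetical size `kN`, `std::thread` for threading, and SSE intrinsics when the compiler defines `__SSE__` (with a scalar fallback otherwise). The names `matMulRows` and `threadedSimdMatMulSketch` are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics, used only when available
#endif

// Assumed size and storage; the original N, A, B, X declarations are not shown.
constexpr std::size_t kN = 128;
using Matrix = std::vector<float>;  // row-major, kN * kN elements

// Accumulates rows [rowBegin, rowEnd) of X += A * B using the i-k-j loop order.
void matMulRows(const Matrix& A, const Matrix& B, Matrix& X,
                std::size_t rowBegin, std::size_t rowEnd)
{
    for (std::size_t i = rowBegin; i < rowEnd; ++i) {
        for (std::size_t k = 0; k < kN; ++k) {
            const float a = A[i * kN + k];
            std::size_t j = 0;
#if defined(__SSE__)
            // Broadcast A[i][k] and process four columns of B and X at a time.
            const __m128 va = _mm_set1_ps(a);
            for (; j + 4 <= kN; j += 4) {
                const __m128 vb = _mm_loadu_ps(&B[k * kN + j]);
                __m128 vx = _mm_loadu_ps(&X[i * kN + j]);
                vx = _mm_add_ps(vx, _mm_mul_ps(va, vb));
                _mm_storeu_ps(&X[i * kN + j], vx);
            }
#endif
            // Scalar tail (or the whole row when SSE is unavailable).
            for (; j < kN; ++j) {
                X[i * kN + j] += a * B[k * kN + j];
            }
        }
    }
}

// Splits the rows of X into contiguous blocks, one block per thread.
// Threads write disjoint rows, so no locking is needed.
void threadedSimdMatMulSketch(const Matrix& A, const Matrix& B, Matrix& X,
                              unsigned numThreads)
{
    assert(A.size() == kN * kN && B.size() == kN * kN && X.size() == kN * kN);
    if (numThreads == 0) numThreads = 1;
    const std::size_t rowsPerThread = (kN + numThreads - 1) / numThreads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin = t * rowsPerThread;
        const std::size_t end = std::min(kN, begin + rowsPerThread);
        if (begin >= end) break;  // more threads than rows
        workers.emplace_back(matMulRows, std::cref(A), std::cref(B),
                             std::ref(X), begin, end);
    }
    for (auto& w : workers) w.join();
}
```

Partitioning by rows of `X` keeps each thread's writes on separate rows, and the i-k-j order means the vectorized loop walks `B` and `X` contiguously. Whether this actually beats the sequential version depends on `N`, the thread count, and compiler flags, none of which are given here.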