我正在尝试使用simd内部函数在C中编程矩阵乘法。我非常确定自己的实现,但是执行时,我会从所得矩阵系数的第5位开始出现一些数字错误。
REAL_T只是具有typedef的浮点数
/* This is my matmul Version with simd, using floating simple precision*/
void matmul(int n, REAL_T *A, REAL_T *B, REAL_T *C){
int i,j,k;
__m256 vA, vB, vC, vRes;
for (i=0; i<n; i++){
for (j=0; j<n; j++){
for (k=0; k<n; k= k+8){
vA = _mm256_load_ps(&A[i*n+k]);
vB = _mm256_loadu_ps(&B[k*n+j]);
vC = _mm256_mul_ps(vA, vB);
vC = _mm256_hadd_ps(vC, vC);
vC = _mm256_hadd_ps(vC, vC);
/*To get the resulting coefficient, after doing 2 hadds,
I have to get the first and the last element …Run Code Online (Sandbox Code Playgroud)