I want to optimize this simple loop:
```c
unsigned int i;
while (j-- != 0) {              // j is an unsigned int with a start value of about N = 36,000,000
    float sub = 0;
    i = 1;
    unsigned int c = j + s[1];
    while (c < N) {
        sub += d[i][j] * x[c];  // d[][] and x[] are arrays of float
        i++;
        c = j + s[i];           // s[] is an array of unsigned int with 6 entries
    }
    x[j] -= sub;                // only one memory write per j
}
```
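Written as a formula, the loop appears to perform the following update for j = N−1 down to 0. This is a sketch that assumes the offsets are positive and increasing (s_1 < s_2 < … < s_5), so the inner loop stops at the first i with j + s_i ≥ N; the question does not state this.

$$
x_j \leftarrow x_j - \sum_{\substack{1 \le i \le 5 \\ j + s_i < N}} d_{i,j}\, x_{j+s_i}
$$

Since j runs downward and every s_i > 0, each x_{j+s_i} on the right-hand side is a value that was already overwritten earlier in the same sweep.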
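On a 4 GHz AMD Bulldozer this loop takes about 1 second to run. I have considered SIMD and OpenMP (which I normally use to get more speed), but this loop is recursive: each x[j] reads entries x[c] further along the array (c = j + s[i]), and for positive offsets those entries were already updated in earlier iterations.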
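To make the "recursive" part concrete, here is a minimal sketch, assuming the offsets s[1..5] are positive and increasing (the question does not say). The function names `sweep_sequential` and `sweep_from_snapshot`, the row-pointer layout for `d`, and the `i < K` guard are hypothetical additions for illustration. The first function is the loop from the question; the second is what you get if every iteration reads only the original `x`, which is effectively what independent OpenMP iterations or SIMD lanes would see if the dependency were ignored.

```c
#define K 6 /* s[] has 6 entries, as stated in the question */

/* The loop from the question: j runs from n-1 down to 0.
   d[i] is a (hypothetical) row pointer so that d[i][j] keeps the original indexing.
   Assumes s[1..K-1] are increasing, so the first offset with j + s[i] >= n ends the row. */
static void sweep_sequential(unsigned n, float x[], const float *const d[], const unsigned s[])
{
    unsigned j = n;
    while (j-- != 0) {
        float sub = 0.0f;
        for (unsigned i = 1; i < K; i++) {  /* i < K: added guard so i never runs past s[] */
            unsigned c = j + s[i];
            if (c >= n)
                break;
            sub += d[i][j] * x[c];          /* x[c] with c > j was already updated this sweep */
        }
        x[j] -= sub;                        /* one write per j */
    }
}

/* Naive "parallel" variant: every x[j] reads the ORIGINAL values x_old[c],
   so the iteration order no longer matters (this is the loop an OpenMP
   "parallel for" would be free to split). x_old must be a copy of x made
   before the call. It computes a different result from sweep_sequential. */
static void sweep_from_snapshot(unsigned n, float x[], const float x_old[],
                                const float *const d[], const unsigned s[])
{
    for (unsigned j = 0; j < n; j++) {
        float sub = 0.0f;
        for (unsigned i = 1; i < K; i++) {
            unsigned c = j + s[i];
            if (c >= n)
                break;
            sub += d[i][j] * x_old[c];
        }
        x[j] -= sub;
    }
}
```

On a tiny example (n = 4, every x and every d entry equal to 1, s = {0, 1, 2, 3, 4, 5}), `sweep_sequential` produces {0, 0, 0, 1} while `sweep_from_snapshot` produces {-2, -1, 0, 1}. The two disagree because the original loop consumes values it has just written, which is why a plain split across threads or vector lanes changes the answer.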
Any suggestions?