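I wrote SSE code to sum up byte values (VS2005).
Since it is simple, it works nicely (and fast). It only crashes for some array sizes, and only in release mode, never in debug. Maybe someone sees the "obvious" error? Any help is appreciated.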
__int64 Sum (const unsigned char* pData, const unsigned int& nLength)
{
__int64 nSum (0);
__m128i* pp = (__m128i*)pData;
ATLASSERT( ( (DWORD_PTR)pp & 15 ) == 0 ); // pointer must be 16-byte aligned (required for aligned SSE loads)
__m128i zero = _mm_setzero_si128(),
a, b, c, d, tmp;
unsigned int i (0);
for ( ; i < nLength; i+=64) // 4x loop unroll (4 x 16 bytes per iteration)
{
a = _mm_sad_epu8( *(pp++), zero);
b = _mm_sad_epu8( *(pp++), zero); // It …
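
One thing that may be worth checking, given that only some array sizes crash: the loop above advances 64 bytes per iteration and only tests `i < nLength`, so if `nLength` is not a multiple of 64, the last iteration could read past the end of the buffer. The following is a minimal sketch of a bounds-safe variant, not the code from the post: `SumBytes` is a hypothetical name, it processes one 16-byte block per iteration instead of four, and it uses unaligned loads on the assumption that alignment cannot be guaranteed.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>

// Hypothetical helper (not the original Sum): adds up all bytes in
// [pData, pData + nLength) without reading outside that range.
long long SumBytes(const unsigned char* pData, std::size_t nLength)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = _mm_setzero_si128();  // two 64-bit partial sums
    std::size_t i = 0;

    // Vector loop: runs only while a full 16-byte block remains.
    for (; i + 16 <= nLength; i += 16)
    {
        // Unaligned load, so pData does not need to be 16-byte aligned.
        __m128i v = _mm_loadu_si128(
            reinterpret_cast<const __m128i*>(pData + i));
        // SAD against zero gives the sum of each 8-byte half,
        // one result per 64-bit lane.
        acc = _mm_add_epi64(acc, _mm_sad_epu8(v, zero));
    }

    // Combine the two 64-bit lanes into a scalar.
    long long lanes[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), acc);
    long long nSum = lanes[0] + lanes[1];

    // Scalar tail: the remaining 0..15 bytes.
    for (; i < nLength; ++i)
        nSum += pData[i];

    return nSum;
}
```

The same idea carries over to the 4x unrolled version: let the unrolled loop run only while `i + 64 <= nLength`, then handle the remainder with a smaller loop.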