SSE2 instruction to load integers in reverse order

And*_*ndy 8 x86 sse simd sse2

Is there any SSE2 instruction that loads a 128-bit vector register from an int buffer in reverse order?

Pau*_*l R 10

Reversing the 32-bit int elements after a normal load is quite easy:

__m128i v = _mm_load_si128((const __m128i *)buff);   // MOVDQA  - buff must be 16-byte aligned
v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));   // PSHUFD  - mask = 00 01 10 11 = 0x1b
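As a minimal, self-contained sketch of the same idea, the helper below wraps the load, shuffle and store in one function. The name `reverse4_epi32` and the choice of unaligned load/store (so that the buffers need no particular alignment) are assumptions made for illustration, not part of the original answer.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: copies 4 ints from src to dst in reverse order.
   Unaligned accesses are used so neither buffer needs 16-byte alignment. */
static void reverse4_epi32(const int *src, int *dst)
{
    __m128i v = _mm_loadu_si128((const __m128i *)src);   /* MOVDQU */
    v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));   /* PSHUFD, imm = 0x1b */
    _mm_storeu_si128((__m128i *)dst, v);                 /* MOVDQU */
}
```

Calling it with `src = {1, 2, 3, 4}` should leave `{4, 3, 2, 1}` in `dst`. If the buffer is known to be 16-byte aligned, `_mm_load_si128` can be used instead, as in the snippet above.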

You can do the same for 16-bit short elements, but it takes more instructions:

__m128i v = _mm_load_si128((const __m128i *)buff);   // MOVDQA  - buff must be 16-byte aligned
v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));   // PSHUFD  - reverse the four 32-bit pairs, mask = 00 01 10 11 = 0x1b
v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1)); // PSHUFLW - swap shorts within each low pair, mask = 10 11 00 01 = 0xb1
v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1)); // PSHUFHW - swap shorts within each high pair, mask = 10 11 00 01 = 0xb1
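The three-shuffle sequence can be sketched as a standalone function. PSHUFD first reverses the order of the four 32-bit pairs, and PSHUFLW/PSHUFHW then swap the two shorts inside each pair. The helper name `reverse8_epi16` and the use of unaligned load/store are illustrative assumptions.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: copies 8 shorts from src to dst in reverse order. */
static void reverse8_epi16(const short *src, short *dst)
{
    __m128i v = _mm_loadu_si128((const __m128i *)src);    /* e0 e1 e2 e3 e4 e5 e6 e7 */
    v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));    /* e6 e7 e4 e5 e2 e3 e0 e1 */
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));  /* e7 e6 e5 e4 e2 e3 e0 e1 */
    v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));  /* e7 e6 e5 e4 e3 e2 e1 e0 */
    _mm_storeu_si128((__m128i *)dst, v);
}
```

The comments list the elements from lowest to highest address after each step, so the full reversal is only complete once both halves have been fixed up.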

Note that if SSSE3 is available, you can do this with fewer instructions using _mm_shuffle_epi8 (PSHUFB):

const __m128i vm = _mm_setr_epi8(14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1);
                                     // initialise vector mask for use with PSHUFB:
                                     // each entry is the source byte index for that output byte
                                     // NB: do this once, outside any processing loop
...
__m128i v = _mm_load_si128((const __m128i *)buff);    // MOVDQA - buff must be 16-byte aligned
v = _mm_shuffle_epi8(v, vm);                          // PSHUFB
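To illustrate the point about building the mask once outside the loop, here is a hedged sketch that reverses a whole buffer of shorts with PSHUFB. The function name `reverse_shorts_ssse3`, the `SSSE3_TARGET` macro, and the requirements that `n` be a multiple of 8 and that `src` and `dst` not overlap are all assumptions of this example. The `target("ssse3")` attribute is a GCC/Clang extension used here so the code builds without `-mssse3`; the CPU must still support SSSE3 at run time.

```c
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <tmmintrin.h>  /* SSSE3 intrinsics (_mm_shuffle_epi8) */

/* Assumption: GCC/Clang function-level target attribute, only needed
   when the translation unit is not already compiled with SSSE3 enabled. */
#if defined(__GNUC__) && !defined(__SSSE3__)
#define SSSE3_TARGET __attribute__((target("ssse3")))
#else
#define SSSE3_TARGET
#endif

/* Hypothetical helper: writes the n shorts of src into dst in reverse order.
   Assumes n is a multiple of 8 and that src and dst do not overlap. */
SSSE3_TARGET
static void reverse_shorts_ssse3(const short *src, short *dst, size_t n)
{
    /* Mask built once, outside the loop: output byte i takes source byte vm[i]. */
    const __m128i vm = _mm_setr_epi8(14, 15, 12, 13, 10, 11, 8, 9,
                                     6, 7, 4, 5, 2, 3, 0, 1);
    for (size_t i = 0; i < n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        v = _mm_shuffle_epi8(v, vm);                           /* PSHUFB */
        /* The block read from the front goes to the mirrored position at the back. */
        _mm_storeu_si128((__m128i *)(dst + n - 8 - i), v);
    }
}
```

Each iteration reverses one group of 8 shorts in a register and stores it at the mirrored offset, so the buffer as a whole comes out reversed.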

  • Unless SSSE3 is unavailable, I would use PSHUFB to reverse a vector of shorts. (2 upvotes)