SSE code to set a float variable to 0.0f or 1.0f based on a comparison

Question

use*_*436 8 c optimization performance sse simd

I have two arrays, char* c and float* f, and I need to perform this operation:

// Compute float mask
float* f;
char* c;
char c_thresh;
int n;

for ( int i = 0; i < n; ++i )
{
    if ( c[i] < c_thresh ) f[i] = 0.0f;
    else                   f[i] = 1.0f;
}

I am looking for a fast way to do this: no conditionals and, if possible, using SSE (4.2 or AVX).

If using float instead of char would lead to faster code, I can change my code to use floats only:

// Compute float mask
float* f;
float* c;
float c_thresh;
int n;

for ( int i = 0; i < n; ++i )
{
    if ( c[i] < c_thresh ) f[i] = 0.0f;
    else                   f[i] = 1.0f;
}

Thanks

Answer 1

har*_*old 5

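It's fairly simple: do the comparison, convert the bytes to dwords, and AND the result with 1.0f. (Untested, and not meant to be copy-and-paste code anyway; it is meant to show how you would do it.)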

movd xmm0, [c]          ; read 4 bytes from c
pcmpgtb xmm0, threshold ; compare (note: comparison is >, not >=, so adjust threshold)
pmovsxbd xmm0, xmm0     ; sign-extend bytes to dwords (0xFF -> 0xFFFFFFFF, so the AND keeps all bits of 1.0f)
pand xmm0, one          ; AND all four elements with 1.0f
movdqa [f], xmm0        ; save result

It should be easy to convert to intrinsics.
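
As a rough illustration of that conversion, here is a minimal sketch for four elements. It is not the answerer's code: the helper name mask4_sse41 is made up, the input is assumed to be signed 8-bit, and the target attribute is a GCC/Clang-specific assumption (with other compilers, enable SSE4.1 on the command line instead). It also deviates from the assembly in one respect: it compares in the opposite direction (c < c_thresh) and uses ANDNOT, so the threshold does not have to be adjusted. _mm_cvtepi8_epi32 is the intrinsic for pmovsxbd; the sign extension is what turns a 0xFF byte mask into a full 0xFFFFFFFF dword mask.

```c
#include <smmintrin.h>  /* SSE4.1: _mm_cvtepi8_epi32 (pmovsxbd) */
#include <string.h>     /* memcpy */

/* Hypothetical helper: f[i] = (c[i] < c_thresh) ? 0.0f : 1.0f for i = 0..3.
   Assumes signed 8-bit input; f does not need to be aligned. */
#if defined(__GNUC__)
__attribute__((target("sse4.1")))
#endif
static void mask4_sse41(const signed char *c, signed char c_thresh, float *f)
{
    int packed;
    memcpy(&packed, c, sizeof packed);            /* read 4 bytes from c */
    __m128i bytes = _mm_cvtsi32_si128(packed);    /* movd */

    /* lt is 0xFF in every byte where c < c_thresh, 0x00 elsewhere */
    __m128i lt = _mm_cmpgt_epi8(_mm_set1_epi8(c_thresh), bytes);

    /* Sign-extend the low 4 bytes: 0xFF -> 0xFFFFFFFF, 0x00 -> 0x00000000 */
    __m128i lt32 = _mm_cvtepi8_epi32(lt);

    /* (~lt32) & bits(1.0f): 0.0f where c < c_thresh, 1.0f otherwise */
    __m128 out = _mm_andnot_ps(_mm_castsi128_ps(lt32), _mm_set1_ps(1.0f));
    _mm_storeu_ps(f, out);
}
```

Calling mask4_sse41(c, 10, f) on the bytes {-128, 9, 10, 127} should leave {0.0f, 0.0f, 1.0f, 1.0f} in f.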

Answer 2

ana*_*lyg 5

The following code uses SSE2 (I think).

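It performs 16 byte comparisons in a single instruction (_mm_cmpgt_epi8). It assumes that char is signed; if your char is unsigned, it needs some extra fiddling (flipping the most significant bit of each char).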
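
The extra fiddling for unsigned char could look roughly like the sketch below. The helper name cmpge_epu8_mask is hypothetical. XORing both the data and the threshold with 0x80 maps the unsigned range 0..255 onto the signed range -128..127 while preserving order, so the signed compare then gives the unsigned result.

```c
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical helper: returns a byte mask that is 0xFF where
   c[i] >= thresh (compared as unsigned bytes) and 0x00 elsewhere. */
static __m128i cmpge_epu8_mask(__m128i c, unsigned char thresh)
{
    const __m128i flip = _mm_set1_epi8(-128);     /* 0x80 in every byte */

    /* Flip the most significant bit of the data and of the threshold */
    __m128i c_s = _mm_xor_si128(c, flip);
    __m128i t_s = _mm_xor_si128(_mm_set1_epi8((char)thresh), flip);

    /* Signed compare on the flipped values: 0xFF where c < thresh */
    __m128i lt = _mm_cmpgt_epi8(t_s, c_s);

    /* Invert the mask: 0xFF where c >= thresh */
    return _mm_cmpeq_epi8(lt, _mm_setzero_si128());
}
```

The returned mask can be widened and ANDed with the 1.0f bit pattern in the same way as the signed compare result.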

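The only non-standard thing it does is use the magic number 3f80 to represent the floating-point constant 1.0. The magic number is really 0x3f800000, but the fact that its 16 least significant bits are zero makes it possible to do the bit fiddling more efficiently (using 16-bit masks instead of 32-bit masks).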
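
To see why the 16-bit mask is enough, the bit pattern can be checked in plain C. This is only a sketch: it assumes float is an IEEE-754 binary32 value, and the helper names float_bits and float_from_high16 are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the raw bits of a float
   (assumes IEEE-754 binary32). */
static uint32_t float_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Hypothetical helper: builds a float whose upper 16 bits are `hi`
   and whose lower 16 bits are zero. This mirrors what unpacking a
   zero register with the masked 16-bit values produces. */
static float float_from_high16(uint16_t hi)
{
    uint32_t u = (uint32_t)hi << 16;
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}
```

Under that assumption, float_bits(1.0f) is 0x3f800000, and float_from_high16(0xffff & 0x3f80) gives back 1.0f while float_from_high16(0x0000 & 0x3f80) gives 0.0f.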

// load (assuming the pointer is aligned)
__m128i input = *(const __m128i*)c;
// compare
__m128i cmp = _mm_cmpgt_epi8(input, _mm_set1_epi8(c_thresh - 1));
// convert to 16-bit
__m128i c0 = _mm_unpacklo_epi8(cmp, cmp);
__m128i c1 = _mm_unpackhi_epi8(cmp, cmp);
// convert ffff to 3f80
c0 = _mm_and_si128(c0, _mm_set1_epi16(0x3f80));
c1 = _mm_and_si128(c1, _mm_set1_epi16(0x3f80));
// convert to 32-bit and write (assuming the pointer is aligned)
__m128i* result = (__m128i*)f;
result[0] = _mm_unpacklo_epi16(_mm_setzero_si128(), c0);
result[1] = _mm_unpackhi_epi16(_mm_setzero_si128(), c0);
result[2] = _mm_unpacklo_epi16(_mm_setzero_si128(), c1);
result[3] = _mm_unpackhi_epi16(_mm_setzero_si128(), c1);

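
For completeness, the same idea can be wrapped into a loop over a whole array. The following is a sketch under several assumptions rather than the answerer's code: the function name float_mask_sse2 is made up, the input is signed 8-bit, unaligned loads and stores are used so the pointers need no particular alignment, and a scalar loop handles the last n % 16 elements. It also compares c < c_thresh and uses ANDNOT instead of comparing against c_thresh - 1, which avoids the wrap-around when c_thresh is the smallest representable value.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stddef.h>

/* Hypothetical wrapper: f[i] = (c[i] < c_thresh) ? 0.0f : 1.0f for i < n. */
static void float_mask_sse2(const signed char *c, signed char c_thresh,
                            float *f, size_t n)
{
    const __m128i thresh = _mm_set1_epi8(c_thresh);
    const __m128i one_hi = _mm_set1_epi16(0x3f80);  /* upper half of 1.0f */
    const __m128i zero   = _mm_setzero_si128();
    size_t i = 0;

    for (; i + 16 <= n; i += 16) {
        __m128i input = _mm_loadu_si128((const __m128i *)(c + i));

        /* lt is 0xFF in every byte where c < c_thresh */
        __m128i lt = _mm_cmpgt_epi8(thresh, input);

        /* Widen the mask to 16 bits: 0xFF -> 0xFFFF */
        __m128i l0 = _mm_unpacklo_epi8(lt, lt);
        __m128i l1 = _mm_unpackhi_epi8(lt, lt);

        /* Keep 0x3f80 only where the comparison was false */
        __m128i h0 = _mm_andnot_si128(l0, one_hi);
        __m128i h1 = _mm_andnot_si128(l1, one_hi);

        /* Widen to 32 bits with zeros in the low half: 0x3f80 -> 0x3f800000 */
        _mm_storeu_ps(f + i,      _mm_castsi128_ps(_mm_unpacklo_epi16(zero, h0)));
        _mm_storeu_ps(f + i + 4,  _mm_castsi128_ps(_mm_unpackhi_epi16(zero, h0)));
        _mm_storeu_ps(f + i + 8,  _mm_castsi128_ps(_mm_unpacklo_epi16(zero, h1)));
        _mm_storeu_ps(f + i + 12, _mm_castsi128_ps(_mm_unpackhi_epi16(zero, h1)));
    }

    /* Scalar tail for the remaining elements */
    for (; i < n; ++i)
        f[i] = (c[i] < c_thresh) ? 0.0f : 1.0f;
}
```

If the buffers are known to be 16-byte aligned, the unaligned loads and stores can be swapped for their aligned counterparts; whether that matters in practice would need to be measured.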