I am looking for an efficient AVX (AVX512) implementation of the following:
// Given
float u[8];
float v[8];
// Compute
float a[8];
float b[8];
// Such that
for ( int i = 0; i < 8; ++i )
{
    a[i] = fabs(u[i]) >= fabs(v[i]) ? u[i] : v[i];
    b[i] = fabs(u[i]) < fabs(v[i]) ? u[i] : v[i];
}
In other words, I need to select element-wise into a from u and v based on mask, and into b based on !mask, where mask = (fabs(u) >= fabs(v)) element-wise.
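For reference, one possible way to express this with plain AVX intrinsics is sketched below. It is not a tuned or benchmarked implementation. The function name `select_by_abs` is made up for illustration, and the scalar branch is only there so the snippet still builds when AVX is not enabled at compile time. The absolute value is taken by clearing the sign bit, the comparison produces a full-width lane mask, and two blends with swapped operands give a and b from the same mask.

```c
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Hypothetical helper (the name is an assumption, not from the question).
   Processes exactly 8 floats: a[i] gets the element with the larger
   magnitude, b[i] gets the other one. */
static void select_by_abs(const float *u, const float *v, float *a, float *b)
{
#ifdef __AVX__
    const __m256 sign = _mm256_set1_ps(-0.0f);     /* only the sign bit set */
    const __m256 vu = _mm256_loadu_ps(u);
    const __m256 vv = _mm256_loadu_ps(v);
    const __m256 au = _mm256_andnot_ps(sign, vu);  /* |u|: clear the sign bit */
    const __m256 av = _mm256_andnot_ps(sign, vv);  /* |v| */

    /* mask lane is all ones where |u| >= |v| (ordered: false if either is NaN) */
    const __m256 mask = _mm256_cmp_ps(au, av, _CMP_GE_OQ);

    /* blendv picks the second operand where the mask lane's top bit is set */
    _mm256_storeu_ps(a, _mm256_blendv_ps(vv, vu, mask));  /* mask  ? u : v */
    _mm256_storeu_ps(b, _mm256_blendv_ps(vu, vv, mask));  /* !mask ? u : v */
#else
    /* Scalar fallback with the same mask / !mask semantics. */
    for (int i = 0; i < 8; ++i) {
        const float x = u[i], y = v[i];
        const float ax = x < 0.0f ? -x : x;
        const float ay = y < 0.0f ? -y : y;
        const int ge = ax >= ay;
        a[i] = ge ? x : y;
        b[i] = ge ? y : x;
    }
#endif
}
```

Because b is taken from !mask here, a NaN input sends u to b, whereas the `<` comparison in the loop above would pick v; for all other inputs the two forms agree. With AVX-512 the same idea should carry over using a compare that returns a k-mask (for example `_mm512_cmp_ps_mask`) followed by `_mm512_mask_blend_ps`, though that variant is not shown or tested here.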