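I have added an x64 configuration to my C++ project to compile a 64-bit version of my application. Everything seems to work, but the compiler emits the following warning: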
`cl : Command line warning D9002 : ignoring unknown option '/arch:SSE2'`
Is SSE2 optimization really not available for 64-bit projects?
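For context: SSE2 is part of the baseline x64 instruction set, so on an x64 target the compiler may always use it and the `<emmintrin.h>` intrinsics work without any `/arch` switch. An MSVC version that reports D9002 here simply does not recognize `/arch:SSE2` for x64 targets. The following is a minimal sketch, assuming MSVC, GCC or Clang, of a compile-time check for guaranteed SSE2 support; the `HAS_SSE2` macro and the `AddPairs` function are hypothetical names used only for illustration.

```cpp
#include <emmintrin.h> // SSE2 intrinsics; usable on x64 without any /arch switch

// Hypothetical compile-time flag for guaranteed SSE2 support:
//  - MSVC x64: _M_X64 is defined, and SSE2 is part of the x64 baseline.
//  - MSVC x86: _M_IX86_FP is 2 when compiling with /arch:SSE2 or higher.
//  - GCC/Clang: __SSE2__ is defined whenever SSE2 code generation is enabled.
#if defined(_M_X64) || defined(__SSE2__) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#define HAS_SSE2 1
#else
#define HAS_SSE2 0
#endif

// Hypothetical helper: adds two pairs of doubles with one SSE2 instruction (addpd).
static inline __m128d AddPairs(__m128d a, __m128d b)
{
    return _mm_add_pd(a, b);
}
```

On a 64-bit build the check should always take the first branch, which is why the x64 configuration can simply drop the `/arch:SSE2` option.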
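I'm looking for an approximation of the exponential function that operates on SSE vectors, i.e. `__m128 exp(__m128 x)`.
I have an implementation that is fast but very inaccurate: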
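#include <emmintrin.h> // SSE2 intrinsics

// Fast exponential: x should be in [-87, 87]; lanes with x < -87 are set to 0.
static inline __m128 FastExpSse(__m128 x)
{
    __m128 a = _mm_set1_ps(12102203.2f); // (1 << 23) / ln(2)
    __m128i b = _mm_set1_epi32(127 * (1 << 23) - 486411); // exponent bias shifted into place, minus an error-balancing correction
    __m128 m87 = _mm_set1_ps(-87);
    __m128 mask = _mm_cmpge_ps(x, m87); // all bits set where x >= -87
    // round(a * x) + b is approximately the IEEE-754 bit pattern of 2^(x / ln 2) = e^x
    __m128i tmp = _mm_add_epi32(_mm_cvtps_epi32(_mm_mul_ps(a, x)), b);
    return _mm_and_ps(_mm_castsi128_ps(tmp), mask); // reinterpret as floats, zero the masked lanes
}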
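The constants in `FastExpSse` come from the bit-manipulation trick commonly attributed to Schraudolph, which also explains the low accuracy. A float with exponent field $E$ and mantissa field $m$ has bit pattern $E\cdot 2^{23} + m$ and value $2^{E-127}(1 + m/2^{23})$. Writing $e^x = 2^y$ with $y = x/\ln 2 = k + f$ ($k$ integer, $0 \le f < 1$):

$$
I = \operatorname{round}\!\left(\frac{2^{23}}{\ln 2}\,x\right) + 127\cdot 2^{23} - c
\;\approx\; (k + 127)\,2^{23} + f\,2^{23} - c, \qquad c = 486411
$$

$$
\text{float}(I)\big|_{c=0} = 2^{k}(1 + f) \quad\text{instead of}\quad 2^{k}\,2^{f}, \qquad
\frac{1+f}{2^{f}} \le \frac{1/\ln 2}{2^{\,1/\ln 2 - 1}} \approx 1.061
$$

So the mantissa linearly interpolates $2^y$ between powers of two, with a relative error of up to about 6% that is always non-negative when $c = 0$; subtracting $c$ shifts the result down to balance the error around zero. The exponent field must stay at least 1, i.e. $k + 127 \ge 1$, which gives $x \ge -126\ln 2 \approx -87.3$ and explains the $-87$ cutoff.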
Can anyone suggest a more accurate implementation that is just as fast (or faster)?
I would be happy if it were written in C style.
Thanks.
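One common way to trade some speed for much better accuracy is range reduction plus a polynomial: $e^x = 2^n e^r$ with $n = \operatorname{round}(x/\ln 2)$ and $|r| \le \ln 2/2$. The following is a minimal sketch of that technique, not a tuned library routine; the names `ExpSse` and `MaxRelErrorVsStdExp`, the degree-6 Taylor polynomial (minimax coefficients would reach similar accuracy with fewer terms) and the Cody-Waite split of ln 2 into `0.693359375f` and `-2.12194440e-4f` are choices made for this illustration. `ExpSse` itself uses only C-compatible constructs; the helper uses `<cmath>` to measure accuracy.

```cpp
#include <cassert>
#include <cmath>       // std::exp, std::fabs (used by the accuracy-check helper)
#include <emmintrin.h> // SSE2 intrinsics

// Sketch: exp(x) = 2^n * exp(r), n = round(x / ln2), r = x - n*ln2.
// Inputs are clamped to [-87, 88] so 2^n stays a normal float; unlike
// FastExpSse, x < -87 yields about exp(-87) instead of 0.
static inline __m128 ExpSse(__m128 x)
{
    x = _mm_min_ps(_mm_max_ps(x, _mm_set1_ps(-87.0f)), _mm_set1_ps(88.0f));

    // n = round(x * log2(e)); _mm_cvtps_epi32 rounds to nearest in the default MXCSR mode.
    __m128i n = _mm_cvtps_epi32(_mm_mul_ps(x, _mm_set1_ps(1.44269504f)));
    __m128 nf = _mm_cvtepi32_ps(n);

    // Cody-Waite reduction: ln2 = 0.693359375 (few bits, so n*hi is exact) + -2.12194440e-4.
    __m128 r = _mm_sub_ps(x, _mm_mul_ps(nf, _mm_set1_ps(0.693359375f)));
    r = _mm_sub_ps(r, _mm_mul_ps(nf, _mm_set1_ps(-2.12194440e-4f)));

    // Horner evaluation of 1 + r + r^2/2! + r^3/3! + r^4/4! + r^5/5! + r^6/6!.
    __m128 p = _mm_set1_ps(1.0f / 720.0f);
    p = _mm_add_ps(_mm_mul_ps(p, r), _mm_set1_ps(1.0f / 120.0f));
    p = _mm_add_ps(_mm_mul_ps(p, r), _mm_set1_ps(1.0f / 24.0f));
    p = _mm_add_ps(_mm_mul_ps(p, r), _mm_set1_ps(1.0f / 6.0f));
    p = _mm_add_ps(_mm_mul_ps(p, r), _mm_set1_ps(0.5f));
    p = _mm_add_ps(_mm_mul_ps(p, r), _mm_set1_ps(1.0f));
    p = _mm_add_ps(_mm_mul_ps(p, r), _mm_set1_ps(1.0f));

    // 2^n built directly in the exponent field: (n + 127) << 23.
    __m128i bits = _mm_slli_epi32(_mm_add_epi32(n, _mm_set1_epi32(127)), 23);
    return _mm_mul_ps(p, _mm_castsi128_ps(bits));
}

// Hypothetical accuracy check: largest relative error of ExpSse versus std::exp.
// count must be a multiple of 4.
static float MaxRelErrorVsStdExp(const float* xs, int count)
{
    assert(count % 4 == 0);
    float worst = 0.0f;
    for (int i = 0; i < count; i += 4) {
        float out[4];
        _mm_storeu_ps(out, ExpSse(_mm_loadu_ps(xs + i)));
        for (int k = 0; k < 4; ++k) {
            float ref = std::exp(xs[i + k]);
            float err = std::fabs(out[k] - ref) / ref;
            if (err > worst) worst = err;
        }
    }
    return worst;
}
```

This costs a few more multiplies and adds than `FastExpSse` and is still branch-free; running `MaxRelErrorVsStdExp` over inputs that matter to you is the safest way to judge whether the accuracy and speed trade-off fits.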
I'm writing code that essentially uses SSE2 to optimize this loop, going from this:
double *pA = a;
double *pB = b[voiceIndex];
double *pC = c[voiceIndex];
for (int sampleIndex = 0; sampleIndex < blockSize; sampleIndex++) {
pC[sampleIndex] = exp((mMin + std::clamp(pA[sampleIndex] + pB[sampleIndex], 0.0, 1.0) * mRange) * ln2per12);
}
to this:
double *pA = a;
double *pB = b[voiceIndex];
double *pC = c[voiceIndex];
// SSE2
__m128d bound_lower = _mm_set1_pd(0.0);
__m128d bound_upper = _mm_set1_pd(1.0);
__m128d rangeLn2per12 = _mm_set1_pd(mRange * ln2per12);
__m128d minLn2per12 = _mm_set1_pd(mMin * ln2per12);
__m128d loaded_a = _mm_load_pd(pA);
__m128d loaded_b = _mm_load_pd(pB); …
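The snippet above is cut off, so the following is not the asker's code but a hypothetical sketch of how such a loop could be completed. The function name `ComputeBlockSse2` and its signature are assumptions; the parameter names mirror the question's variables. The precomputed constants follow from $(\mathit{mMin} + t\cdot\mathit{mRange})\cdot k = \mathit{mMin}\cdot k + t\cdot(\mathit{mRange}\cdot k)$ with $k$ = `ln2per12`. SSE2 has no exponential instruction, so the sketch vectorizes only the add, clamp and affine steps and calls `std::exp` per lane; keeping the whole loop in registers would need a vectorized double-precision exp.

```cpp
#include <algorithm>   // std::min, std::max
#include <cassert>
#include <cmath>       // std::exp
#include <emmintrin.h> // SSE2 intrinsics

// Hypothetical helper: pC[i] = exp((mMin + clamp(pA[i] + pB[i], 0, 1) * mRange) * ln2per12).
static void ComputeBlockSse2(const double* pA, const double* pB, double* pC,
                             int blockSize, double mMin, double mRange, double ln2per12)
{
    assert(blockSize >= 0);
    const __m128d boundLower = _mm_set1_pd(0.0);
    const __m128d boundUpper = _mm_set1_pd(1.0);
    const __m128d rangeLn2per12 = _mm_set1_pd(mRange * ln2per12);
    const __m128d minLn2per12 = _mm_set1_pd(mMin * ln2per12);

    int i = 0;
    for (; i + 2 <= blockSize; i += 2) {
        // Unaligned loads, so the buffers need not be 16-byte aligned (_mm_load_pd requires that).
        __m128d sum = _mm_add_pd(_mm_loadu_pd(pA + i), _mm_loadu_pd(pB + i));
        __m128d t = _mm_min_pd(_mm_max_pd(sum, boundLower), boundUpper); // clamp to [0, 1]
        __m128d arg = _mm_add_pd(minLn2per12, _mm_mul_pd(t, rangeLn2per12));

        // No SSE2 exp: store both lanes and evaluate std::exp on each.
        double lanes[2];
        _mm_storeu_pd(lanes, arg);
        pC[i] = std::exp(lanes[0]);
        pC[i + 1] = std::exp(lanes[1]);
    }
    // Scalar tail for an odd blockSize, using the same folded constants.
    for (; i < blockSize; ++i) {
        double t = std::min(std::max(pA[i] + pB[i], 0.0), 1.0);
        pC[i] = std::exp(mMin * ln2per12 + t * (mRange * ln2per12));
    }
}
```

A call for one voice would look like `ComputeBlockSse2(a, b[voiceIndex], c[voiceIndex], blockSize, mMin, mRange, ln2per12);`, assuming the arrays hold `double` as in the scalar version.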