Related questions (0)

Optimize for fast multiplication but slow addition: FMA and doubledouble

When I first got a Haswell processor, I tried using FMA to compute the Mandelbrot set. The main algorithm is this:

intn n = 0;
for(int32_t i=0; i<maxiter; i++) {
    floatn x2 = square(x), y2 = square(y); //square(x) = x*x
    floatn r2 = x2 + y2;
    booln mask = r2<cut; //booln is in the float domain, not the integer domain
    if(!horizontal_or(mask)) break; //same test as _mm256_testz_pd(mask, mask)
    n -= mask; //a true lane is all-one bits, i.e. -1 as an integer, so this adds 1
    floatn t = x*y; mul2(t); //mul2(t): t*=2
    x = x2 - y2 + cx;
    y = t + cy;
}


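This determines whether each of n pixels is in the Mandelbrot set. With double-precision floating point it therefore processes 4 pixels at a time (floatn = __m256d, intn = __m256i). Each iteration needs 4 SIMD floating-point multiplications and 4 SIMD floating-point additions.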
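For reference, a scalar C sketch of the same escape-time loop, one pixel at a time, makes the operation count easy to check: x*x, y*y, x*y and the doubling are the 4 multiplications, and x2 + y2, x2 - y2, + cx and t + cy are the 4 additions. The function name `mandel_iter`, the start value z = 0 and the use of `cut = 4.0` are assumptions for illustration; the question does not show how `x`, `y` and `cut` are initialised.

```c
/* Scalar sketch of the vectorised loop above (hypothetical helper, not the
   asker's code). Starts from z = 0 and counts the iterations for which
   |z|^2 < cut, which is what n -= mask does for each SIMD lane. */
static int mandel_iter(double cx, double cy, int maxiter, double cut)
{
    double x = 0.0, y = 0.0;
    int n = 0;
    for (int i = 0; i < maxiter; i++) {
        double x2 = x * x, y2 = y * y; /* 2 multiplications */
        double r2 = x2 + y2;           /* 1 addition */
        if (!(r2 < cut)) break;        /* pixel has escaped: stop counting */
        n++;                           /* scalar analogue of n -= mask */
        double t = x * y;              /* 1 multiplication */
        t *= 2.0;                      /* mul2(t): 1 multiplication */
        x = x2 - y2 + cx;              /* 2 additions */
        y = t + cy;                    /* 1 addition */
    }
    return n;
}
```

In the SIMD version a single lane cannot `break` on its own, so escaped lanes keep iterating while the mask stops them from being counted; the scalar `break` is a simplification of that per-lane masking.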

Then I modified it to use FMA like this:

intn n = 0; …


Tags: floating-point, x86, assembly, mandelbrot, fma

Z b*_*son, 2015-06-02

Score: 9 | Answers: 1 | Views: 944

Min and max with signed zeros

I am concerned about the following cases:

min(-0.0,0.0)
max(-0.0,0.0)
minmag(-x,x) 
maxmag(-x,x)


According to the Wikipedia article on IEEE 754-2008, regarding min and max:

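Min and max operations are defined but leave some leeway for the case where the inputs are equal in value but differ in representation. In particular: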

min(+0,-0) or min(-0,+0) must produce something with a value of zero but may always return the first argument.
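If an order-independent result is wanted, one option is to break the ±0 tie explicitly with `signbit`, which the quoted rule permits but does not require. The sketch below treats -0.0 as smaller than +0.0 and builds `minmag`/`maxmag` on top of that using the IEEE 754-2008 minNumMag/maxNumMag rule: compare magnitudes and fall back to min/max when they are equal, so `minmag(-x,x)` gives `-|x|` and `maxmag(-x,x)` gives `+|x|`. The function names are hypothetical and NaN inputs are deliberately ignored.

```c
#include <math.h>

/* Hypothetical helpers (not from the question or any library): min/max that
   order -0.0 below +0.0, so the result does not depend on argument order.
   NaN handling is ignored for brevity. */
static double min_signed_zero(double a, double b)
{
    if (a == b)                      /* also true for -0.0 == +0.0 */
        return signbit(a) ? a : b;   /* prefer the negative zero */
    return a < b ? a : b;
}

static double max_signed_zero(double a, double b)
{
    if (a == b)
        return signbit(a) ? b : a;   /* prefer the positive zero */
    return a > b ? a : b;
}

/* minNumMag/maxNumMag-style helpers: compare magnitudes first, and fall
   back to the sign-aware min/max when the magnitudes are equal. */
static double min_mag(double a, double b)
{
    double fa = fabs(a), fb = fabs(b);
    if (fa < fb) return a;
    if (fb < fa) return b;
    return min_signed_zero(a, b);
}

static double max_mag(double a, double b)
{
    double fa = fabs(a), fb = fabs(b);
    if (fa > fb) return a;
    if (fb > fa) return b;
    return max_signed_zero(a, b);
}
```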

I ran some tests comparing fmin, fmax, and the min and max macros defined below

/* GNU C statement-expression macros (GCC/Clang): each argument is evaluated once. */
#define max(a,b) \
   ({ __typeof__ (a) _a = (a); \
       __typeof__ (b) _b = (b); \
     _a > _b ? _a : _b; })
#define min(a,b) \
   ({ __typeof__ (a) _a = (a); \
       __typeof__ (b) _b = (b); \
     _a < _b ? _a : _b; })


as well as _mm_min_ps and _mm_max_ps, which compile to the SSE minps and maxps instructions.
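Because `-0.0 < 0.0` and `-0.0 > 0.0` are both false under IEEE 754 comparison, the ternary macros return their second argument whenever they are given zeros of opposite sign. A minimal check, assuming GCC or Clang (the macros rely on GNU statement expressions); the wrapper names `macro_min` and `macro_max` are hypothetical. As far as I know this matches Intel's documented behaviour for `minps`/`maxps`, which return the second (source) operand when both inputs are zero, but the SSE instructions themselves are not exercised here.

```c
#include <math.h>

/* The min/max macros from the question, one line each (GCC/Clang only). */
#define max(a,b) ({ __typeof__ (a) _a = (a); __typeof__ (b) _b = (b); _a > _b ? _a : _b; })
#define min(a,b) ({ __typeof__ (a) _a = (a); __typeof__ (b) _b = (b); _a < _b ? _a : _b; })

/* Hypothetical wrappers so the macros can be called with runtime values.
   For a -0.0/+0.0 pair both comparisons are false, so the macro always
   falls through to its second argument. */
static double macro_min(double a, double b) { return min(a, b); }
static double macro_max(double a, double b) { return max(a, b); }
```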

Here are the results (the code I used for testing is posted below):