
intn n = 0;
for (int32_t i = 0; i < maxiter; i++) {
    floatn x2 = square(x), y2 = square(y); // square(x) = x*x
    floatn r2 = x2 + y2;
    booln mask = r2 < cut;                 // booln lives in the float domain, not the integer domain
    if (!horizontal_or(mask)) break;       // i.e. _mm256_testz_pd(mask, mask)
    n -= mask;                             // true lanes are all ones (-1 as an integer), so this counts per lane
    floatn t = x*y; mul2(t);               // mul2(t): t *= 2
    x = x2 - y2 + cx;
    y = t + cy;
}


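This determines whether n pixels are in the Mandelbrot set. For double-precision floating point it therefore runs over 4 pixels at once (floatn = __m256d, intn = __m256i). Each iteration takes 4 SIMD floating-point multiplications and 4 SIMD floating-point additions.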
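For reference, here is a scalar sketch of what a single SIMD lane of the loop computes. It assumes z starts at 0 (the original does not show how x and y are initialized), and the function name mandel_count is hypothetical. The comments mark the 4 multiplications and 4 additions per iteration.

```c
#include <stdint.h>

/* Hypothetical scalar reference for one pixel (one SIMD lane).
   Assumes z = x + iy starts at 0; returns how many iterations had |z|^2 < cut. */
int32_t mandel_count(double cx, double cy, int32_t maxiter, double cut) {
    double x = 0.0, y = 0.0;
    int32_t n = 0;
    for (int32_t i = 0; i < maxiter; i++) {
        double x2 = x * x, y2 = y * y;   /* 2 multiplications */
        double r2 = x2 + y2;             /* 1 addition */
        if (!(r2 < cut)) break;          /* this lane's mask is false: stop */
        n += 1;                          /* vector form: n -= mask */
        double t = x * y; t *= 2.0;      /* 2 multiplications (mul2) */
        x = x2 - y2 + cx;                /* 2 additions */
        y = t + cy;                      /* 1 addition */
    }
    return n;
}
```

In the vector version every lane runs until all lanes have escaped, and the mask keeps finished lanes from counting further.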

Then I modified it to use FMA like this:

intn n = 0; …


floating-point x86 assembly mandelbrot fma

Z b*_*son

2015-06-02

Score: 9 · Answers: 1 · Views: 944

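C memory allocator and strict aliasing

Even after reading quite a bit about the strict aliasing rules, I am still confused. As far as I can tell, it is impossible to implement a reasonable memory allocator that follows these rules, because malloc could never reuse freed memory: on each allocation the memory may be used to store a different type.

Obviously this can't be right. What am I missing? How do you implement an allocator (or memory pool) that follows strict aliasing?

Thanks.

Edit: Let me clarify my question with a dead-simple example: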

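#include <stdbool.h>
#include <stdlib.h>

// s == 0 frees the pool
void *my_custom_allocator(size_t s) {
    static void *pool = NULL;      // static initializers must be constants, so allocate lazily
    static bool in_use = false;
    if (s == 0) {                  // handle "free" first, otherwise the in_use check would block it
        in_use = false;
        return NULL;
    }
    if (pool == NULL) pool = malloc(1000);
    if (pool == NULL || in_use || s > 1000) return NULL;
    in_use = true;
    return pool;
}

int main(void) {
    int *i = my_custom_allocator(sizeof(int));
    // use int
    my_custom_allocator(0);
    float *f = my_custom_allocator(sizeof(float)); // not allowed...
    return 0;
}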
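A minimal sketch of the rule that makes this kind of reuse legal, assuming the pool's backing storage comes from malloc as in the example. C11 6.5p6 says allocated storage has no declared type, so a store through a non-character lvalue sets its effective type for later reads. The same bytes can therefore hold an int and later a float, as long as each object is written before it is read through the new type. The helper name reuse_block is hypothetical, and this guarantee does not cover a pool carved out of a declared array such as `static char buf[1000]`, whose declared type stays char.

```c
#include <stdlib.h>

/* Hypothetical helper: reuse one malloc'd block first as an int, then as a float.
   Returns the int that was read back, or -1 if malloc fails. */
int reuse_block(float *out_f) {
    size_t size = sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float);
    void *block = malloc(size);      /* no declared type: effective type follows stores */
    if (block == NULL) return -1;

    int *i = block;
    *i = 42;                         /* effective type of the bytes is now int */
    int seen = *i;                   /* read through int: allowed */

    /* The int is "freed" back to the pool and the same bytes are handed out again. */
    float *f = block;
    *f = 1.5f;                       /* this store makes the effective type float */
    *out_f = *f;                     /* read through float: allowed; reading *i now would not be */

    free(block);
    return seen;
}
```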


c memory-management strict-aliasing language-lawyer

Seb*_*nde

2011-12-13

Score: 8 · Answers: 1 · Views: 2472

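What is the purpose of ldexp?

I'd like to know what people use the ldexp() function for in real-world applications.

Here is the description:

Returns the result of multiplying x (the significand) by 2 raised to the power exp (the exponent).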
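Two common uses, sketched in C under the assumption of IEEE-754 binary64 doubles; the helper names u32_to_unit and rebuild are hypothetical. The first turns a 32-bit random integer into a uniform value in [0, 1) by scaling it by 2^-32. The second splits a number with frexp and rebuilds it with ldexp. In both cases ldexp(x, exp) computes x · 2^exp exactly unless the result overflows or underflows, without going through pow().

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical: map a 32-bit random integer to a double in [0, 1).
   r * 2^-32 is exact because r fits in the 53-bit significand. */
double u32_to_unit(uint32_t r) {
    return ldexp((double)r, -32);
}

/* Hypothetical: split x into m * 2^e with m in [0.5, 1), then rebuild it.
   Only the exponent changes, so the round trip is exact. */
double rebuild(double x) {
    int e;
    double m = frexp(x, &e);   /* e.g. 12.0 -> m = 0.75, e = 4 */
    return ldexp(m, e);
}
```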

c++

Blu*_*rin

lucky-day

Score: 7 · Answers: 1 · Views: 726

Why don't GCC and Clang optimize multiplication by 2^n with a float-to-int PADDD of the exponent, even with -ffast-math?

Consider this function:

float mulHalf(float x) {
    return x * 0.5f;
}


The following function produces the same result for normal inputs and outputs.

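#include <immintrin.h>

float mulHalf_opt(float x) {
    // -(1 << 23) is 0xFF800000: adding it subtracts 1 from the exponent field
    __m128i e = _mm_set1_epi32(-(1 << 23));
    // {AT&T|Intel} operand order, so this assembles with or without -masm=intel
    __asm__ ("paddd\t{%1, %0|%0, %1}" : "+x"(x) : "xm"(e));
    return x;
}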


Here is the assembly output with -O3 -ffast-math.

mulHalf:
        mulss   xmm0, DWORD PTR .LC0[rip]
        ret

mulHalf_opt:
        paddd   xmm0, XMMWORD PTR .LC1[rip]
        ret


-ffast-math enables -ffinite-math-only, which "assumes that arguments and results are not NaNs or +-Infs" [1].

So, if it produces faster code within the tolerance of -ffast-math, the compiled output of mulHalf might be better off using paddd when -ffast-math is on.
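The same exponent trick can be written portably in C. This is a sketch with a hypothetical name, mul_half_bits, assuming IEEE-754 binary32 floats. It subtracts 1 from the biased exponent field, which is what the PADDD of -(1 << 23) does, and it uses memcpy for the type pun so it stays within the strict aliasing rules. Unlike x * 0.5f it is wrong outside the normal range: 0.0f becomes -infinity (0x00000000 - 0x00800000 wraps to 0xFF800000), and FLT_MIN becomes +0.0f instead of a subnormal. -ffinite-math-only lets the compiler assume no NaNs or infinities, but zeros and subnormals would still need handling.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical portable version of the PADDD trick: exponent field minus 1.
   Matches x * 0.5f only when both x and x * 0.5f are normal numbers. */
float mul_half_bits(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);      /* type pun without violating strict aliasing */
    u -= UINT32_C(1) << 23;        /* unsigned wraparound; same bits as adding -(1 << 23) */
    memcpy(&x, &u, sizeof x);
    return x;
}
```

For example, 3.0f (0x40400000) becomes 0x3FC00000, which is 1.5f.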

I got the following table from the Intel Intrinsics Guide.

(MULSS)
Architecture    Latency Throughput (CPI)
Skylake         4       0.5 …


c floating-point x86 assembly compiler-optimization

xiv*_*r77

2022-05-28

Score: 5 · Answers: 1 · Views: 253

Tag statistics

c ×3

assembly ×2

c++ ×2

compiler-optimization ×2

floating-point ×2

optimization ×2

performance ×2

x86 ×2

fma ×1

language-agnostic ×1

language-lawyer ×1

mandelbrot ×1

memory-management ×1

strict-aliasing ×1

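Final performance optimization strategies

Why don't compilers optimize floating-point *2 into an exponent increment?

Optimize for fast multiplication but slow addition: FMA and doubledouble

C memory allocator and strict aliasing

What is the purpose of ldexp?

Why don't GCC and Clang optimize multiplication by 2^n with a float-to-int PADDD of the exponent, even with -ffast-math?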
