
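#include <immintrin.h>

unsigned long long char_count_AVX2(char * vector, int size, char c){
    unsigned long long sum = 0;
    int i, j;
    const int con = 3;   // number of 8-bit "digits" per byte lane (base-256 counter)
    __m256i ans[con];
    for(i=0; i<con; i++)
        ans[i] = _mm256_setzero_si256();

    __m256i Zer = _mm256_setzero_si256();
    __m256i C = _mm256_set1_epi8(c);
    __m256i Assos = _mm256_set1_epi8(0x01);
    __m256i FF = _mm256_set1_epi8(0xFF);
    __m256i shield = _mm256_set1_epi8(0xFF);
    __m256i temp;
    int couter = 0;
    // note: reads up to 31 bytes past the end when size is not a multiple of 32
    for(i=0; i<size; i+=32){
        couter++;
        // lanes whose lowest digit is non-zero before this update
        shield = _mm256_xor_si256(_mm256_cmpeq_epi8(ans[0], Zer), FF);
        // 0xFF where the byte equals c; unaligned load, since vector need not be 32-byte aligned
        temp = _mm256_cmpeq_epi8(C, _mm256_loadu_si256((const __m256i*)(vector+i)));
        // ~mask + 1 maps 0xFF/0x00 to 0x01/0x00
        temp = _mm256_xor_si256(temp, FF);
        temp = _mm256_add_epi8(temp, Assos);
        ans[0] = _mm256_add_epi8(temp, ans[0]);
        // carry propagation: a digit that received a carry and is now zero has wrapped
        for(j=1; j<con; j++){
            temp = _mm256_cmpeq_epi8(ans[j-1], Zer);
            shield = _mm256_and_si256(shield, temp);
            temp = _mm256_xor_si256(shield, FF);
            temp = _mm256_add_epi8(temp, Assos);
            ans[j] = _mm256_add_epi8(temp, ans[j]);
        }
    }
    for(j=con-1; j>=0; …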


c simd avx avx2

Ada*_*468

2023-06-11

Score: 8 · Answers: 1 · Views: 3814
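For the character-counting question above, a common alternative to chaining several 8-bit counters is to subtract the `_mm256_cmpeq_epi8` mask from an 8-bit accumulator (each match is 0xFF, i.e. -1, so subtracting adds 1) and flush that accumulator into 64-bit lanes with `_mm256_sad_epu8` at most every 255 iterations, before any byte lane can wrap. This is a minimal sketch, assuming GCC or Clang on x86-64 and a CPU with AVX2 (the `target("avx2")` attribute lets it build without `-mavx2`); the name `count_char_avx2` and the `size_t` length are illustrative choices, not the asker's API.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: count occurrences of c in buf[0..n) using AVX2.
   Assumes GCC/Clang and an AVX2-capable CPU at run time. */
__attribute__((target("avx2")))
uint64_t count_char_avx2(const char *buf, size_t n, char c)
{
    const __m256i needle = _mm256_set1_epi8(c);
    const __m256i zero = _mm256_setzero_si256();
    __m256i total = zero;              /* four 64-bit partial sums */
    size_t i = 0;

    while (i + 32 <= n) {
        /* Each byte lane gains at most 1 per iteration, so capping a batch
           at 255 iterations keeps the 8-bit counters from overflowing. */
        __m256i acc = zero;
        size_t batch_end = i + (size_t)255 * 32;
        if (batch_end > n)
            batch_end = n;
        for (; i + 32 <= batch_end; i += 32) {
            __m256i v = _mm256_loadu_si256((const __m256i *)(buf + i));
            /* cmpeq gives 0xFF (-1) per matching byte; subtracting adds 1 */
            acc = _mm256_sub_epi8(acc, _mm256_cmpeq_epi8(v, needle));
        }
        /* psadbw against zero sums each group of 8 bytes into a 64-bit lane */
        total = _mm256_add_epi64(total, _mm256_sad_epu8(acc, zero));
    }

    /* horizontal sum of the four 64-bit lanes */
    uint64_t lanes[4];
    _mm256_storeu_si256((__m256i *)lanes, total);
    uint64_t count = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* scalar tail for the last n % 32 bytes */
    for (; i < n; i++)
        count += (buf[i] == c);
    return count;
}
```

Because the flush runs only once per 255 × 32 = 8160 bytes, the hot loop is just a load, a compare and a subtract per 32 bytes, and the scalar tail avoids reading past the end of the buffer.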

Idiomatic way to evaluate performance?

I'm measuring the networking + rendering workload of my project.

The program continuously runs a main loop:

while (true) {
    doSomething();
    drawSomething();
    doSomething2();
    sendSomething();
}


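The main loop runs more than 60 times per second.

I want to see a performance breakdown: how much time each procedure takes.

My concern is that printing the elapsed time at every entry and exit of each procedure would cause a huge performance overhead.

I'm curious what the idiomatic way to measure performance is.

Is printing logs good enough?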

benchmarking microbenchmark

shp*_*ark

lucky-day

Score: 1 · Answers: 1 · Views: 1322
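For the profiling question above, a low-overhead pattern is to read a monotonic clock around each stage, accumulate the elapsed time in memory, and print one averaged summary every N frames instead of logging every entry and exit. This sketch assumes a POSIX system (`clock_gettime` with `CLOCK_MONOTONIC`); the `TIMED` macro, the `STAGE_*` enum and `maybe_report` are hypothetical names, and the four stage names only mirror the placeholder functions in the question.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Hypothetical stage identifiers mirroring the loop in the question. */
enum { STAGE_DO, STAGE_DRAW, STAGE_DO2, STAGE_SEND, STAGE_COUNT };
static const char *stage_names[STAGE_COUNT] = {
    "doSomething", "drawSomething", "doSomething2", "sendSomething"
};

static long long stage_total_ns[STAGE_COUNT]; /* accumulated time per stage */
static long frames;                            /* frames since last report */

/* Current monotonic time in nanoseconds. */
static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Wrap one call: two clock reads and an addition, no I/O in the hot path. */
#define TIMED(stage, call)                              \
    do {                                                \
        long long t0_ = now_ns();                       \
        call;                                           \
        stage_total_ns[(stage)] += now_ns() - t0_;      \
    } while (0)

/* Call once per frame; every `every` frames, print per-stage averages
   and reset the accumulators. */
static void maybe_report(long every)
{
    if (++frames < every)
        return;
    for (int s = 0; s < STAGE_COUNT; s++) {
        printf("%-14s %8.3f ms/frame\n", stage_names[s],
               (double)stage_total_ns[s] / frames / 1e6);
        stage_total_ns[s] = 0;
    }
    frames = 0;
}
```

In the main loop each call would be wrapped, e.g. `TIMED(STAGE_DRAW, drawSomething());`, followed by `maybe_report(600);` to get a summary roughly every ten seconds at 60 frames per second. Two clock reads per stage are typically far cheaper than formatting and writing a log line; for a breakdown without modifying the code, a sampling profiler such as Linux `perf` is the usual alternative.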

Tag statistics

algorithm ×1

assembly ×1

avx ×1

avx2 ×1

benchmarking ×1

binary ×1

bit-manipulation ×1

c ×1

floating-point ×1

hammingweight ×1

iec10967 ×1

microbenchmark ×1

optimization ×1

simd ×1

sse ×1

x86 ×1

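How to count the number of set bits in a 32-bit integer?

Fastest way to do a horizontal float vector sum on x86

How to count character occurrences using SIMD

Idiomatic way to evaluate performance?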
