小编Aqu*_*ine的帖子

Fast popcount on Intel Xeon Phi

I'm implementing an ultra fast popcount on Intel Xeon® Phi®, as it's a performance hotspot of various bioinformatics software.

I've implemented five pieces of code,

#if defined(__MIC__)
#include <zmmintrin.h>
__attribute__((align(64))) static const uint32_t POPCOUNT_4bit[16] = {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};
__attribute__((align(64))) static const uint32_t MASK_4bit[16] = {0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF, 0xF};
inline uint64_t vpu_popcount1(uint64_t* buf, size_t n)  { …
Run Code Online (Sandbox Code Playgroud)

c vectorization hammingweight intel-mic xeon-phi

9
推荐指数
1
解决办法
1749
查看次数

标签 统计

c ×1

hammingweight ×1

intel-mic ×1

vectorization ×1

xeon-phi ×1