c++ arm image-processing simd neon
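I develop image-processing algorithms with GCC, targeting ARMv7 (Raspberry Pi 2B).
In particular, I use a simple algorithm that replaces one index value in a mask with another: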
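```cpp
#include <stddef.h>
#include <stdint.h>

void ChangeIndex(uint8_t * mask, size_t size, uint8_t oldIndex, uint8_t newIndex)
{
    // Replace every occurrence of oldIndex in the mask with newIndex.
    for(size_t i = 0; i < size; ++i)
    {
        if(mask[i] == oldIndex)
            mask[i] = newIndex;
    }
}
```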
Unfortunately, it performs poorly on the target platform.
Is there any way to optimize it?
Answer by Erm*_*mIg (score 13):
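The ARMv7 CPU in the Raspberry Pi 2B (a Cortex-A7) supports the SIMD instruction set called NEON. With it, a single instruction can compare and update 16 bytes at once, so the code can run much faster: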
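```cpp
#include <arm_neon.h> // NEON intrinsics; GCC on ARMv7 needs NEON enabled, e.g. -mfpu=neon
#include <stddef.h>
#include <stdint.h>

void ChangeIndex(uint8_t * mask, size_t size, uint8_t oldIndex, uint8_t newIndex)
{
    // Largest multiple of 16 not exceeding size (vector width, not memory alignment).
    size_t alignedSize = size/16*16, i = 0;
    uint8x16_t _oldIndex = vdupq_n_u8(oldIndex); // broadcast oldIndex to all 16 lanes
    uint8x16_t _newIndex = vdupq_n_u8(newIndex); // broadcast newIndex to all 16 lanes
    for(; i < alignedSize; i += 16)
    {
        uint8x16_t oldMask = vld1q_u8(mask + i); // load 16 bytes (one 128-bit vector)
        uint8x16_t condition = vceqq_u8(oldMask, _oldIndex); // per lane: 0xFF if equal, 0x00 otherwise
        uint8x16_t newMask = vbslq_u8(condition, _newIndex, oldMask); // bitwise select: newIndex where equal, else the original byte
        vst1q_u8(mask + i, newMask); // store the 16 bytes back
    }
    // Scalar tail for the remaining size % 16 bytes.
    for(; i < size; ++i)
    {
        if(mask[i] == oldIndex)
            mask[i] = newIndex;
    }
}
```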
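NEON intrinsics only compile for ARM, so to make their per-lane behaviour concrete, here is a portable C++ sketch that models `vdupq_n_u8`, `vceqq_u8` and `vbslq_u8` with plain byte arrays while keeping the answer's structure (a 16-byte body plus a scalar tail). The type `U8x16` and the functions `Splat`, `CompareEqual`, `BitSelect`, `ChangeIndexModel` and `MatchesScalar` are hypothetical names for this illustration, not part of `arm_neon.h`; the sketch is for checking the logic on any machine, not for speed.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <vector>

// Hypothetical portable stand-in for NEON's uint8x16_t: sixteen 8-bit lanes.
struct U8x16 { uint8_t lane[16]; };

// Broadcast one byte to all lanes (models vdupq_n_u8).
static U8x16 Splat(uint8_t v)
{
    U8x16 r;
    memset(r.lane, v, sizeof r.lane);
    return r;
}

// Models vceqq_u8: a lane becomes 0xFF when the inputs are equal, 0x00 otherwise.
static U8x16 CompareEqual(const U8x16& a, const U8x16& b)
{
    U8x16 r;
    for (int k = 0; k < 16; ++k)
        r.lane[k] = (a.lane[k] == b.lane[k]) ? 0xFF : 0x00;
    return r;
}

// Models vbslq_u8: each result bit comes from ifSet where sel is 1, else from ifClear.
static U8x16 BitSelect(const U8x16& sel, const U8x16& ifSet, const U8x16& ifClear)
{
    U8x16 r;
    for (int k = 0; k < 16; ++k)
        r.lane[k] = static_cast<uint8_t>((sel.lane[k] & ifSet.lane[k]) |
                                         (~sel.lane[k] & ifClear.lane[k]));
    return r;
}

// Same structure as the NEON version: 16-byte body, then a scalar tail.
void ChangeIndexModel(uint8_t* mask, size_t size, uint8_t oldIndex, uint8_t newIndex)
{
    assert(mask != nullptr || size == 0);
    size_t bodySize = size / 16 * 16, i = 0;
    const U8x16 vOld = Splat(oldIndex), vNew = Splat(newIndex);
    for (; i < bodySize; i += 16)
    {
        U8x16 block;
        memcpy(block.lane, mask + i, 16);         // like vld1q_u8
        U8x16 cond = CompareEqual(block, vOld);   // like vceqq_u8
        U8x16 out = BitSelect(cond, vNew, block); // like vbslq_u8
        memcpy(mask + i, out.lane, 16);           // like vst1q_u8
    }
    for (; i < size; ++i)
        if (mask[i] == oldIndex)
            mask[i] = newIndex;
}

// Runs the question's scalar loop and the model on identical data for every
// size in [0, maxSize], so the vector body and every tail length are exercised.
bool MatchesScalar(size_t maxSize)
{
    for (size_t size = 0; size <= maxSize; ++size)
    {
        std::vector<uint8_t> expected(size), actual(size);
        for (size_t k = 0; k < size; ++k)
            expected[k] = actual[k] = static_cast<uint8_t>(k % 5);
        for (size_t k = 0; k < size; ++k) // scalar reference from the question
            if (expected[k] == 3)
                expected[k] = 200;
        ChangeIndexModel(actual.data(), size, 3, 200);
        if (expected != actual)
            return false;
    }
    return true;
}
```

Calling `MatchesScalar(100)` compares the model with the question's loop for every size from 0 to 100 and should return `true`. The same harness idea could be reused on the Pi by swapping the real NEON `ChangeIndex` in for `ChangeIndexModel`.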
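A different technique that may be worth trying before hand-written intrinsics is to let the compiler vectorize the loop. The question's version stores only when the byte matches, and a conditional store like that typically makes GCC's auto-vectorizer give up; a branch-free rewrite that always stores and uses the comparison only to pick the value is a more likely candidate. This is a sketch under the assumption that your GCC version vectorizes it; `ChangeIndexBranchless` is a hypothetical name, and the result should be confirmed, e.g. by compiling with `-O3 -mcpu=cortex-a7 -mfpu=neon-vfpv4 -fopt-info-vec` and checking the vectorization report or the disassembly.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

// Branch-free rewrite of the question's loop: every byte is stored, and the
// comparison only chooses which value to write. Removing the conditional
// store makes the loop a candidate for auto-vectorization (an assumption to
// verify per compiler version and flags, not a guarantee).
void ChangeIndexBranchless(uint8_t* mask, size_t size, uint8_t oldIndex, uint8_t newIndex)
{
    assert(mask != nullptr || size == 0);
    for (size_t i = 0; i < size; ++i)
    {
        const uint8_t value = mask[i];
        mask[i] = (value == oldIndex) ? newIndex : value;
    }
}
```

Like the NEON answer, this writes every byte, including unchanged ones; for a mask buffer owned by the caller that is usually harmless, and it keeps the loop body free of branches.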