Posts by jmh*_*jmh

Optimizing Cortex-A8 color conversion using NEON

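I am currently working on a color conversion routine that converts from YUY2 to NV12. I have a function that is quite fast, but not as fast as I would expect, mainly because of cache misses.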

void convert_hd(uint8_t *orig, uint8_t *result) {
uint32_t width          = 1280;
uint32_t height         = 720;
uint8_t *lineOdd        = orig;
uint8_t *lineEven       = orig + width*2;
uint8_t *resultYOdd     = result;
uint8_t *resultYEven    = result + width;
uint8_t *resultUV       = result + height*width;
uint32_t totalLoop      = height/2;

while (totalLoop-- > 0) {
  uint32_t lineLoop = 1280/32; // Bytes length: width*2, read by iter 16Bytes

  while(lineLoop-- > 0) {
    __asm__ __volatile__(
        "pld [%[lineOdd]]   \n\t"
        "vld4.8   {d0, d1, d2, d3}, [%[lineOdd],:128]!   \n\t" // d0:Y …
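Since the listing above is truncated, a plain C reference for the same YUY2 to NV12 conversion can be useful as a baseline for checking the NEON output and for timing comparisons. The following is only a minimal sketch: the function name `yuy2_to_nv12_ref` and its `width`/`height` parameters are hypothetical, and it assumes that the chroma of each pair of rows is averaged with rounding. The original routine's chroma handling is not visible in the truncated listing and may simply take the chroma of one row.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference (hypothetical helper, not the original author's code).
 * Input:  YUY2, packed as Y0 U Y1 V, 2 bytes per pixel.
 * Output: NV12, a full-resolution Y plane followed by an interleaved
 *         UV plane at half vertical and half horizontal resolution.
 * Assumption: width and height are even; chroma of the two source rows
 * is averaged with rounding. */
void yuy2_to_nv12_ref(const uint8_t *src, uint8_t *dst,
                      uint32_t width, uint32_t height)
{
    const size_t src_stride = (size_t)width * 2;
    uint8_t *dst_y  = dst;
    uint8_t *dst_uv = dst + (size_t)width * height;

    for (uint32_t row = 0; row < height; row += 2) {
        const uint8_t *top = src + (size_t)row * src_stride;
        const uint8_t *bot = top + src_stride;
        uint8_t *y_top = dst_y + (size_t)row * width;
        uint8_t *y_bot = y_top + width;
        uint8_t *uv    = dst_uv + (size_t)(row / 2) * width;

        for (uint32_t x = 0; x < width; x += 2) {
            const uint8_t *t = top + (size_t)x * 2;  /* Y0 U Y1 V, top row    */
            const uint8_t *b = bot + (size_t)x * 2;  /* Y0 U Y1 V, bottom row */

            y_top[x]     = t[0];
            y_top[x + 1] = t[2];
            y_bot[x]     = b[0];
            y_bot[x + 1] = b[2];

            /* One U/V pair per 2x2 block of pixels. */
            uv[x]     = (uint8_t)((t[1] + b[1] + 1) >> 1);
            uv[x + 1] = (uint8_t)((t[3] + b[3] + 1) >> 1);
        }
    }
}
```

For the 1280x720 case in the question, this would be called as `yuy2_to_nv12_ref(orig, result, 1280, 720)`, with `result` holding at least `1280 * 720 * 3 / 2` bytes.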

assembly arm neon cortex-a8 cpu-cache

5 votes
1 answer
395 views
