Reducing the instruction count in ARM

Question


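I have a program I'm working on for school. Its purpose is to add two matrices and store the result in a third matrix. Currently, the instruction count when it is run with the driver (i.e. the .o file) is 1,003,034,420, but it needs to be under 1 billion. I don't know how to get there: I have gone over every instruction I use, and all of them seem to be required for the program to work correctly.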

Note that at this point I can't use loop unrolling to reduce the instruction count, because that comes later.

Here is the program:

/* This function has 5 parameters, and the declaration in the
   C-language would look like:

   void matadd (int **C, int **A, int **B, int height, int width)

   C, A, B, and height will be passed in r0-r3, respectively, and
   width will be passed on the stack. */

.arch armv7-a
.text
.align  2
.global matadd
.syntax unified
.arm
matadd:
   push  {r4, r5, r6, r7, r8, r9, r10, r11, lr}
   ldr   r4, [sp, #36]                 @ load width into r4
   mov   r5, #0                        @ r5 is current row index
row_loop: 
   mov   r6, #0                        @ r6 is the col, reset it for each new row
   cmp   r5, r3                        @ compare row with height
   beq   end_loops                     @ we have finished all of the rows
   ldr   r11, [r0, r5, lsl #2]         @ r11 is the current row array of C
   ldr   r7, [r1, r5, lsl #2]          @ r7 is the current row array of A
   ldr   r8, [r2, r5, lsl #2]          @ r8 is the current row array of B
                                       @ lsl #2 scales the index by 4 because
                                       @ pointers and ints are 4 bytes each;
                                       @ the shift does not modify r5 itself
col_loop:   
   cmp   r6, r4                        @ compare col with width
   beq   end_col                       @ all columns of this row are done
   ldr   r9, [r7, r6, lsl #2]          @ r9 is cur_row[col] of A
   ldr   r10, [r8, r6, lsl #2]         @ r10 is cur_row[col] of B
   add   r9, r9, r10                   @ r9 is A[row][col] + B[row][col]
   str   r9, [r11, r6, lsl #2]         @ store result of addition in C[row][col]
   add   r6, r6, #1                    @ increment col
   b     col_loop                      @ get next entry
end_col:
   add   r5, r5, #1                    @ increment row
   b     row_loop                      @ get next row
end_loops:   
   pop   {r4, r5, r6, r7, r8, r9, r10, r11, pc}

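For readers more comfortable with C, here is a sketch of what the assembly above does, written with gotos so that each label corresponds to a label in the .s file. It is only an illustration of the control flow (assuming the C declaration given in the header comment), not the course's reference code.

```c
/* C rendering of the assembly matadd, keeping its top-tested loops.
 * Register correspondence: row = r5, col = r6, width = r4,
 * c/a/b = r11/r7/r8 (the current row arrays of C, A and B). */
void matadd(int **C, int **A, int **B, int height, int width)
{
    int row = 0, col;
    int *c, *a, *b;
row_loop:
    col = 0;                         /* reset col for each new row */
    if (row == height) goto end_loops;
    c = C[row]; a = A[row]; b = B[row];
col_loop:
    if (col == width) goto end_col;  /* cmp + beq at the top */
    c[col] = a[col] + b[col];
    col++;
    goto col_loop;                   /* the unconditional "b col_loop" */
end_col:
    row++;
    goto row_loop;                   /* the unconditional "b row_loop" */
end_loops:
    return;
}
```

Each `goto` at the bottom of a loop corresponds to an unconditional `b` that executes once per iteration.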

I figured there must be some instruction that combines cmp and b, but I can't seem to find one. Any pointers on how to reduce the instruction count?

Answer 1

Ray*_*hen 5

You want to remove the unconditional branch from the inner loop.

loop_start:
    cmp x, y
    beq loop_exit

    blah blah blah

    b loop_start
loop_exit:


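Note that every time through the loop, there is an unconditional branch (b loop_start). Avoid that branch by inlining the branch target: copy the instructions starting at loop_start, up to and including the next conditional branch.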
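
To make the cost of that unconditional branch concrete, here is a small C model that walks the top-tested loop shape above and counts the instructions it executes. `top_tested_count` and its `body_len` parameter (the number of instructions in "blah blah blah") are hypothetical names introduced for this illustration.

```c
/* Counts the instructions executed by the top-tested loop shape
 *   loop_start: cmp; beq loop_exit; <body>; b loop_start
 * for n iterations (n >= 0 assumed) and a body of body_len
 * instructions. */
long long top_tested_count(long long n, long long body_len)
{
    long long count = 0, i = 0;
    for (;;) {
        count += 2;          /* cmp x, y ; beq loop_exit */
        if (i == n) break;   /* beq taken: leave the loop */
        count += body_len;   /* blah blah blah */
        i++;
        count += 1;          /* b loop_start, taken on every iteration */
    }
    return count;
}
```

For the question's col_loop the body is five instructions (ldr, ldr, add, str, add), so a row of W columns costs 8W + 2 instructions inside that loop.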

loop_start:
    cmp x, y
    beq loop_exit

loop_middle:
    blah blah blah

    ; was "b loop_start" but we just copy the instructions
    ; starting at "loop_start" up to the conditional branch

    cmp x, y
    beq loop_exit

    ; and then jump to the instruction after the inlined portion
    b loop_middle
loop_exit:


At this point, the beq is just a branch over a branch, so the pair can be replaced by a single branch with the inverted condition.

loop_start:
    cmp x, y
    beq loop_exit

loop_middle:
    blah blah blah

    cmp x, y

    ; "beq loop_exit" followed by "b loop_middle" is equivalent to this
    bne loop_middle

loop_exit:

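The same kind of C model, applied to the rotated shape, shows where the saving comes from. The guard cmp/beq runs once, and each iteration then costs the body plus cmp and bne, one instruction fewer than the body + cmp + beq + b of the top-tested form. `rotated_count` and `body_len` are hypothetical names for this illustration.

```c
/* Counts the instructions executed by the rotated loop shape
 *   cmp; beq loop_exit
 *   loop_middle: <body>; cmp; bne loop_middle
 * for n iterations (n >= 0 assumed) and a body of body_len
 * instructions. */
long long rotated_count(long long n, long long body_len)
{
    long long count = 0, i = 0;
    count += 2;              /* guard: cmp x, y ; beq loop_exit */
    if (i == n) return count;
    do {
        count += body_len;   /* blah blah blah */
        i++;
        count += 2;          /* cmp x, y ; bne loop_middle */
    } while (i != n);
    return count;
}
```

With a five-instruction body, n iterations now cost 7n + 2 instead of 8n + 2.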

There are two opportunities for this optimization in your code: the col_loop and the row_loop.
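
As a sketch of what applying it to both loops buys, the C functions below (hypothetical helpers) count the instructions matadd itself executes for a height H and width W. The first function models the layout in the question. The second assumes one possible rotated layout: each loop gets a single cmp/beq guard in front, and its closing b is replaced by cmp plus bne. The driver's own instructions are not modeled.

```c
/* Instructions executed inside matadd only; H = height, W = width. */

/* Layout from the question: both loops test at the top and end in "b". */
long long count_original(long long H, long long W)
{
    long long prologue = 3;          /* push, ldr width, mov r5 */
    long long per_row  = 6           /* mov r6, cmp, beq, 3 x ldr row ptr */
                       + 8 * W       /* cmp, beq, ldr, ldr, add, str, add, b */
                       + 2           /* cmp, beq taken: leave col_loop */
                       + 2;          /* add r5, b row_loop */
    long long epilogue = 3           /* mov r6, cmp, beq taken */
                       + 1;          /* pop */
    return prologue + H * per_row + epilogue;
}

/* Assumed rotated layout: a cmp/beq guard before each loop, and the
 * closing "b" of each loop replaced by cmp + bne. */
long long count_rotated(long long H, long long W)
{
    long long prologue = 3 + 2;      /* push, ldr, mov r5; height guard */
    long long per_row  = 4           /* mov r6, 3 x ldr row ptr */
                       + 2           /* width guard: cmp, beq */
                       + 7 * W       /* ldr, ldr, add, str, add, cmp, bne */
                       + 3;          /* add r5, cmp, bne row_loop */
    long long epilogue = 1;          /* pop */
    return prologue + H * per_row + epilogue;
}
```

Under these assumptions the original costs 7 + H(10 + 8W) and the rotated layout 6 + H(9 + 7W), a saving of 1 + H + HW. Because the per-element cost drops from 8 to 7, the saving approaches one-eighth of matadd's work as H·W grows; how far that moves the reported total depends on the driver's share, which the question does not state.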

(Don't forget to cite this page when you submit your solution, to avoid accusations of academic dishonesty.)

Archived:	7 years, 9 months ago
Views:	202 times
Last activity:	7 years, 9 months ago