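I have this program that I am working on for school; its purpose is to add two matrices and store the result in a third matrix. Currently, when it is run with the driver (a .o file), the instruction count is 1,003,034,420, but it needs to be under one billion. However, I have no idea how to get there, because I have gone over every instruction I use and all of them seem to be required for the program to work.
Note that I cannot use loop unrolling to reduce the instruction count at this point, as that comes later.
Here is the program: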
/* This function has 5 parameters, and the declaration in the
C-language would look like:
void matadd (int **C, int **A, int **B, int height, int width)
C, A, B, and height will be passed in r0-r3, respectively, and
width will be passed on the stack. */
.arch armv7-a
.text
.align 2
.global matadd
.syntax unified
.arm
matadd:
push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
ldr r4, [sp, #36] @ load width into r4
mov r5, #0 @ r5 is current row index
row_loop:
mov r6, #0 @ r6 is the col, reset it for each new row
cmp r5, r3 @ compare row with height
beq end_loops @ we have finished all of the rows
ldr r11, [r0, r5, lsl #2] @ r11 is the current row array of C
ldr r7, [r1, r5, lsl #2] @ r7 is the current row array of A
ldr r8, [r2, r5, lsl #2] @ r8 is the current row array of B
@ the left shifts are so that we skip
@ 4 bytes since these are ints
@ these do not change registers
col_loop:
cmp r6, r4 @ compare col with width
beq end_col @ we have finished every col of this row
ldr r9, [r7, r6, lsl #2] @ r9 is cur_row[col] of A
ldr r10, [r8, r6, lsl #2] @ r10 is cur_row[col] of B
add r9, r9, r10 @ r9 is A[row][col] + B[row][col]
str r9, [r11, r6, lsl #2] @ store result of addition in C[row][col]
add r6, r6, #1 @ increment col
b col_loop @ get next entry
end_col:
add r5, r5, #1 @ increment row
b row_loop @ get next row
end_loops:
pop {r4, r5, r6, r7, r8, r9, r10, r11, pc}
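I figured there must be some instruction that combines cmp and b, but I can't seem to find one. Any pointers on how to reduce the instruction count?
You want to remove the unconditional branch from the inner loop.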
loop_start:
cmp x, y
beq loop_exit
blah blah blah
b loop_start
loop_exit:
Notice that every pass through the loop executes an unconditional branch (b loop_start). Avoid that branch by inlining the branch target up to the next conditional branch.
loop_start:
cmp x, y
beq loop_exit
loop_middle:
blah blah blah
; was "b loop_start" but we just copy the instructions
; starting at "loop_start" up to the conditional branch
cmp x, y
beq loop_exit
; and then jump to the instruction after the inlined portion
b loop_middle
loop_exit:
At this point, the beq is just a branch over a branch, so the pair can be replaced with a single branch on the reversed condition (bne).
loop_start:
cmp x, y
beq loop_exit
loop_middle:
blah blah blah
cmp x, y
; "beq loop_exit" followed by "b loop_middle" is equivalent to this
bne loop_middle
loop_exit:
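To get a feel for what the rewrite saves, the two loop shapes above can be mimicked in C with goto and a counter that stands in for executed instructions. This is only a rough sketch under stated assumptions: the names count_original and count_rotated are made up for illustration, and the whole loop body ("blah blah blah") is counted as a single instruction.

```c
#include <assert.h>

/* Original shape: cmp/beq at the top, unconditional b at the bottom.
   Returns how many "instructions" run for n iterations (assumption:
   the loop body counts as one instruction). */
long count_original(long n)
{
    assert(n >= 0);
    long executed = 0, i = 0;
loop_start:
    executed += 2;                  /* cmp + beq */
    if (i == n) goto loop_exit;
    i++;
    executed += 1;                  /* loop body */
    executed += 1;                  /* b loop_start */
    goto loop_start;
loop_exit:
    return executed;
}

/* Rotated shape: the cmp/beq guard runs once, then every iteration
   ends with cmp/bne instead of jumping back to the top. */
long count_rotated(long n)
{
    assert(n >= 0);
    long executed = 0, i = 0;
    executed += 2;                  /* cmp + beq, executed only once */
    if (i == n) goto loop_exit;
loop_middle:
    i++;
    executed += 1;                  /* loop body */
    executed += 2;                  /* cmp + bne */
    if (i != n) goto loop_middle;
loop_exit:
    return executed;
}
```

Under these counting assumptions the original shape costs 4n + 2 and the rotated shape 3n + 2, so the rewrite saves one instruction per iteration; for example, count_original(1000) is 4002 and count_rotated(1000) is 3002.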
There are two opportunities for this optimization in your code.
(Don't forget to cite this web page when you submit your solution, to avoid accusations of academic dishonesty.)
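Expressed in C, the two candidates are the row loop and the column loop: each top-tested loop becomes a guard that runs once followed by a bottom-tested do/while. The following is a hedged sketch rather than the assignment's solution. It uses the matadd signature quoted in the question, and it additionally assumes that height and width are non-negative and hoists the width check out of the row loop, which the original assembly does not do.

```c
#include <assert.h>

/* Sketch of matadd with both loops rotated (tested at the bottom).
   Assumption: height and width are non-negative. */
void matadd(int **C, int **A, int **B, int height, int width)
{
    assert(height >= 0 && width >= 0);

    /* Guards that used to run on every iteration now run once. */
    if (height == 0 || width == 0)
        return;

    int row = 0;
    do {
        int *c = C[row];            /* current row of C */
        int *a = A[row];            /* current row of A */
        int *b = B[row];            /* current row of B */
        int col = 0;
        do {
            c[col] = a[col] + b[col];
            col++;
        } while (col != width);     /* plays the role of cmp + bne */
        row++;
    } while (row != height);        /* plays the role of cmp + bne */
}
```

The inner loop matters most for the instruction count because it runs height × width times, while the outer loop only runs height times.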