相关疑难解决方法(0)

INC指令与ADD 1:重要吗？

来自Ira Baxter回答,为什么INC和DEC指令不会影响进位标志(CF)？

大多数情况下,我远离INC而DEC现在,因为他们做的部分条件代码更新,这样就可以在管道中引起滑稽的摊位,和ADD/ SUB没有.因此,无关紧要(大多数地方),我使用ADD/ SUB避免失速.我使用INC/ DEC仅在保持代码较小的情况下,例如,适合高速缓存行,其中一个或两个指令的大小产生足够的差异.这可能是毫无意义的纳米[字面意思!] - 优化,但我在编码习惯上相当老派.

我想问一下为什么它会导致管道中的停顿,而添加不会？毕竟,无论是ADD和INC更新标志寄存器.唯一的区别是INC不更新CF.但为什么重要呢？

performance x86 assembly increment micro-optimization

Gil*_*esz

2018 04-17

26
推荐指数

2
解决办法

4234
查看次数

函数调用循环比空循环快

我将一些程序集与一些c链接起来测试函数调用的成本,使用以下程序集和c源代码(分别使用fasm和gcc)

部件:

format ELF

public no_call as "_no_call"
public normal_call as "_normal_call"

section '.text' executable

iter equ 100000000

no_call:
    mov ecx, iter
@@:
    push ecx
    pop ecx
    dec ecx
    cmp ecx, 0
    jne @b
    ret

normal_function:
    ret

normal_call:
    mov ecx, iter
@@:
    push ecx
    call normal_function
    pop ecx
    dec ecx
    cmp ecx, 0
    jne @b
    ret

Run Code Online (Sandbox Code Playgroud)

c来源:

#include <stdio.h>
#include <time.h>

extern int no_call();
extern int normal_call();

int main()
{
    clock_t ct1, ct2;

    ct1 = clock();
    no_call();
    ct2 = clock(); …

Run Code Online (Sandbox Code Playgroud)

c performance x86 assembly fasm

rtp*_*pax

2018 03-09

15
推荐指数

1
解决办法

969
查看次数

添加冗余分配可在编译时加速代码而无需优化

我发现了一个有趣的现象:

#include<stdio.h>
#include<time.h>

int main() {
    int p, q;
    clock_t s,e;
    s=clock();
    for(int i = 1; i < 1000; i++){
        for(int j = 1; j < 1000; j++){
            for(int k = 1; k < 1000; k++){
                p = i + j * k;
                q = p;  //Removing this line can increase running time.
            }
        }
    }
    e = clock();
    double t = (double)(e - s) / CLOCKS_PER_SEC;
    printf("%lf\n", t);
    return 0;
}

Run Code Online (Sandbox Code Playgroud)

我在i5-5257U Mac OS上使用GCC 7.3.0来编译代码 …

performance x86 assembly

hel*_*qiu

2018 03-10

3
推荐指数

1
解决办法

627
查看次数

从寄存器移动到频繁访问的变量时性能意外降低

我正在使用以下示例了解缓存的工作原理：

#include <stdio.h>\n#include <stdint.h>\n#include <stdlib.h>\n\ntypedef uint32_t data_t;\nconst int U = 10000000;   // size of the array. 10 million vals ~= 40MB\nconst int N = 100000000;  // number of searches to perform\n\nint main() {\n  data_t* data = (data_t*) malloc(U * sizeof(data_t));\n  if (data == NULL) {\n    free(data);\n    printf("Error: not enough memory\\n");\n    exit(-1);\n  }\n\n  // fill up the array with sequential (sorted) values.\n  int i;\n  for (i = 0; i < U; i++) {\n    data[i] = i;\n  }\n\n  printf("Allocated array of …

Run Code Online (Sandbox Code Playgroud)

c assembly caching x86-64 perf

Ste*_*Mai

2023 07-27

3
推荐指数

1
解决办法

138
查看次数