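Related questions (0)

C++ code for testing the Collatz conjecture faster than hand-written assembly - why?

I wrote these two solutions for Project Euler Q14, in assembly and in C++. They implement the same brute-force approach for testing the Collatz conjecture. The assembly solution was assembled with: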

nasm -felf64 p14.asm && gcc p14.o -o p14

The C++ was compiled with:

g++ p14.cpp -o p14

Assembly, p14.asm:

section .data
    fmt db "%d", 10, 0

global main
extern printf

section .text

main:
    mov rcx, 1000000
    xor rdi, rdi        ; max i
    xor rsi, rsi        ; i

l1:
    dec rcx
    xor r10, r10        ; count
    mov rax, rcx

l2:
    test rax, 1
    jpe even

    mov rbx, 3
    mul rbx
    inc rax
    jmp c1

even:
    mov rbx, 2 …
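
For comparison, the brute-force search that both programs are described as implementing can be sketched in C++ roughly as follows. This is an illustrative reconstruction, not the asker's p14.cpp (which is not shown in this excerpt); the names `collatz_steps` and `longest_chain_start` are hypothetical.

```cpp
#include <cstdint>

// Number of Collatz steps needed to reach 1 from n (requires n >= 1).
std::uint64_t collatz_steps(std::uint64_t n) {
    std::uint64_t steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        ++steps;
    }
    return steps;
}

// Starting value in [1, limit) that produces the longest chain.
std::uint64_t longest_chain_start(std::uint64_t limit) {
    std::uint64_t best_start = 1;
    std::uint64_t best_steps = 0;
    for (std::uint64_t i = 1; i < limit; ++i) {
        const std::uint64_t steps = collatz_steps(i);
        if (steps > best_steps) {
            best_steps = steps;
            best_start = i;
        }
    }
    return best_start;
}
```

Because `n` is unsigned and the divisor is a constant, a compiler is free to turn `n / 2` into a right shift and `n % 2` into a bit test, which is one reason compiled code can beat a straightforward hand-written loop.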

c++ optimization performance x86 assembly

jef*_*son

2018-08-05

803 votes

8 answers

140k views

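Why does GCC use multiplication by a strange number in implementing integer division?

I have been reading about the div and mul assembly operations, and I decided to see them in action by writing a simple program in C:

division.c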

#include <stdlib.h>
#include <stdio.h>

int main()
{
    size_t i = 9;
    size_t j = i / 5;
    printf("%zu\n",j);
    return 0;
}

And then generating assembly language code with:

gcc -S division.c -O0 -masm=intel

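But looking at the generated division.s file, it doesn't contain any div operations! Instead, it does some kind of black magic with bit shifts and magic numbers. Here is the code snippet that computes i/5: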

mov     rax, QWORD PTR [rbp-16]   ; Move i (=9) to RAX
movabs  rdx, -3689348814741910323 ; Move some magic number to RDX (?)
mul     rdx                       ; Multiply 9 by magic number
mov     rax, rdx                  ; Take only the upper 64 bits of the …
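
The arithmetic behind the magic number can be sketched in C++. This is a minimal illustration of the identity, assuming GCC or Clang (it relies on the non-standard `unsigned __int128` type); it does not reproduce the elided remainder of the listing, and the name `div5` is hypothetical. The constant -3689348814741910323 has the same 64-bit pattern as 0xCCCCCCCCCCCCCCCD, which is ceil(2^66 / 5).

```cpp
#include <cstdint>

// Unsigned 64-bit division by 5 without a div instruction:
// multiply by ceil(2^66 / 5), keep the upper 64 bits of the 128-bit
// product, then shift right by 2 more bits (66 bits in total).
std::uint64_t div5(std::uint64_t x) {
    const unsigned __int128 product =
        static_cast<unsigned __int128>(x) * 0xCCCCCCCCCCCCCCCDull;
    const std::uint64_t high = static_cast<std::uint64_t>(product >> 64);
    return high >> 2;
}
```

For example, `div5(9)` yields 1, matching `9 / 5`. Multiplying by a scaled reciprocal is usually much cheaper than a hardware divide, which is why compilers apply it to division by constants.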

c assembly gcc x86-64 integer-division

qiu*_*bit

2016-12-18

206 votes

4 answers

10k views

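Measuring execution time of a function in C++

I want to find out how much time a certain function in my C++ program takes to execute on Linux. Afterwards, I want to make a speed comparison. I saw several timing functions but ended up with this one from Boost.Chrono: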

process_user_cpu_clock, captures user-CPU time spent by the current process

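Now, I am not clear: if I use the above function, will I get only the time the CPU spent on that function?

Secondly, I could not find any example of using the above function. Can anyone help me with how to use it?

P.S.: Right now I am using std::chrono::system_clock::now() to get the time in seconds, but this gives me different results each time because the CPU load differs.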
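
The difference between wall-clock time and CPU time can be sketched with the standard library alone. This is a minimal sketch, not the Boost.Chrono API mentioned above: it assumes a POSIX system such as Linux, where `std::clock()` reports processor time consumed by the process, and the names `Timing` and `time_call` are hypothetical.

```cpp
#include <chrono>
#include <ctime>
#include <utility>

struct Timing {
    double wall_seconds;  // elapsed real time, from a monotonic clock
    double cpu_seconds;   // processor time consumed by this process
};

// Runs fn() once and reports both wall-clock and CPU time.
template <class Fn>
Timing time_call(Fn&& fn) {
    const std::clock_t cpu_begin = std::clock();
    const auto wall_begin = std::chrono::steady_clock::now();

    std::forward<Fn>(fn)();

    const auto wall_end = std::chrono::steady_clock::now();
    const std::clock_t cpu_end = std::clock();

    return Timing{
        std::chrono::duration<double>(wall_end - wall_begin).count(),
        static_cast<double>(cpu_end - cpu_begin) / CLOCKS_PER_SEC};
}
```

`steady_clock` is used instead of `system_clock` because it never jumps when the system time is adjusted. CPU time is less sensitive to other processes competing for the machine, but neither number is exact for a single short call, so repeating the call and comparing several runs is still advisable.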

c++ optimization profiling

Xar*_*ara

2015-12-17

110 votes

6 answers

150k views

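What exactly is the "as-if" rule?

As the title says,

What exactly is the "as-if" rule?

A typical answer one gets is:

The rule that allows any and all code transformations that do not change the observable behavior of the program

From time to time we keep seeing behaviors from certain implementations that are attributed to this rule, many times wrongly. So what exactly is this rule? The standard does not clearly mention this rule as a section or paragraph, so what exactly falls under its purview? To me it seems like a grey area that the standard does not define in detail. Can someone elaborate on the details, citing references from the standard?

Note: Tagging this as both C and C++, because it is relevant to both languages.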
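
A small illustration of the kind of transformation the rule is usually invoked for, as a sketch only (the function names are hypothetical, and whether a given compiler actually performs this rewrite is not guaranteed): the two functions below return the same value for every input, so an implementation may compile the first as if it were the second, because no program output can reveal the difference.

```cpp
#include <cstdint>

// As written in the source: an explicit loop.
std::uint64_t sum_loop(std::uint32_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// What an optimizer is allowed to emit instead: a closed form.
// With a 32-bit n the 64-bit product cannot overflow.
std::uint64_t sum_closed_form(std::uint32_t n) {
    const std::uint64_t m = n;
    return m * (m + 1) / 2;
}
```

What the compiler may not change is the observable behavior itself, such as the order and values of volatile accesses and the data written to files or interactive devices.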

c c++ optimization c++-faq as-if

Alo*_*ave

2019-05-28

83 votes

2 answers

7359 views

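How to remove "noise" from GCC/clang assembly output?

I want to inspect the assembly output of applying boost::variant in my code, in order to see which intermediate calls are optimized away.

When I compile the following example (with GCC 5.3, using g++ -O3 -std=c++14 -S), it seems as if the compiler optimizes everything away and directly returns 100: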

(...)
main:
.LFB9320:
    .cfi_startproc
    movl    $100, %eax
    ret
    .cfi_endproc
(...)

#include <boost/variant.hpp>

struct Foo
{
    int get() { return 100; }
};

struct Bar
{
    int get() { return 999; }
};

using Variant = boost::variant<Foo, Bar>;


int run(Variant v)
{
    return boost::apply_visitor([](auto& x){return x.get();}, v);
}
int main()
{
    Foo f;
    return run(f);
}

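However, the full assembly output contains much more than the excerpt above, and to me it looks like that extra code is never called. Is there a way to tell GCC/clang to remove all the "noise" and output only what is actually called when the program runs?

Full assembly output: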

    .file   "main1.cpp"
    .section    .rodata.str1.8,"aMS",@progbits,1
    .align 8
.LC0:
    .string "/opt/boost/include/boost/variant/detail/forced_return.hpp"
    .section    .rodata.str1.1,"aMS",@progbits,1
.LC1: …

c++ assembly gcc clang

m.s*_*.s.

lucky-day

56 votes

3 answers

10k views

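Why are loops always compiled into "do...while" style (tail jump)?

When trying to understand assembly (with compiler optimization on), I see this behavior:

A very basic loop like this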

outside_loop;
while (condition) {
     statements;
}

is often compiled into (pseudocode):

    ; outside_loop
    jmp loop_condition    ; unconditional
loop_start:
    loop_statements
loop_condition:
    condition_check
    jmp_if_true loop_start
    ; outside_loop

However, if optimization is not turned on, it compiles to the code one would normally expect:

loop_condition:
    condition_check
    jmp_if_false loop_end
    loop_statements
    jmp loop_condition  ; unconditional
loop_end:

As I understand it, the compiled code more closely resembles this:

goto condition;
do {
    statements;
    condition:
}
while (condition_check);

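I can't see a huge performance boost or readability boost, so why is this so often the case? Is there a name for this loop style, for example "trailing condition check"?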
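
This transformation is commonly called loop inversion (or loop rotation). A minimal C++ sketch of the two shapes, with hypothetical function names: both compute the same result, but the inverted form executes a single conditional branch per iteration instead of a conditional branch plus an unconditional jump. The pseudocode above reaches the bottom test with an initial `jmp`; the guard-plus-do-while form below is another common variant of the same idea.

```cpp
#include <cstddef>

// Source form: the condition is tested at the top of every iteration.
long sum_while(const int* data, std::size_t n) {
    long total = 0;
    std::size_t i = 0;
    while (i < n) {
        total += data[i];
        ++i;
    }
    return total;
}

// Inverted form: one guard up front, then a do-while whose branch at the
// bottom both tests the condition and jumps back to the loop body.
long sum_inverted(const int* data, std::size_t n) {
    long total = 0;
    std::size_t i = 0;
    if (i < n) {
        do {
            total += data[i];
            ++i;
        } while (i < n);
    }
    return total;
}
```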

optimization performance assembly loops micro-optimization

iBu*_*Bug

2018-04-25

26 votes

1 answer

1675 views

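How not to optimize away - mechanics of a folly function

I was searching for a programming technique that would ensure variables used for benchmarking (without observable side effects) won't be optimized away by the compiler.

This gives some info, but I ended up using folly and the following function: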

/**
 * Call doNotOptimizeAway(var) against variables that you use for
 * benchmarking but otherwise are useless. The compiler tends to do a
 * good job at eliminating unused variables, and this function fools
 * it into thinking var is in fact needed.
 */
#ifdef _MSC_VER

#pragma optimize("", off)

template <class T>
void doNotOptimizeAway(T&& datum) {
  datum = datum;
}

#pragma optimize("", on)

#else
template <class T>
void doNotOptimizeAway(T&& datum) {
  asm volatile("" : "+r" (datum)); …
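
How the non-MSVC branch is typically used can be sketched as follows. This is a simplified stand-in, not folly's actual implementation (the quoted listing is truncated): it assumes GCC or Clang extended asm and a value that fits in a register, and the names `keep_alive` and `busy_sum` are hypothetical.

```cpp
#include <cstdint>

// GCC/Clang only: an empty asm statement that claims to read and modify
// `value` in a register, so the compiler must actually compute the value
// and cannot assume anything about it afterwards.
template <class T>
inline void keep_alive(T& value) {
    asm volatile("" : "+r"(value));
}

// Hypothetical benchmark body. keep_alive forces `total` to be
// materialized on every iteration, which also stops the compiler from
// collapsing the loop into a closed-form expression.
std::uint64_t busy_sum(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        total += i;
        keep_alive(total);
    }
    return total;
}
```

The asm statement emits no instructions; it only constrains the optimizer, so the measured loop still does the work being benchmarked.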

c++ benchmarking assembly c++11 c++14

Lor*_*ins

2017-05-23

19 votes

1 answer

1853 views

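How to benchmark the performance of C++ code?

I am starting to study algorithms and data structures seriously, and I am interested in learning how to compare the performance of the different ways I can implement A&DTs.

For simple tests, I can take the time before and after something runs, run it 10^5 times, and average the running times. I can parametrize the input by size, or sample random inputs, and get a list of running times versus input size. I can output that as a CSV file and feed it into pandas.

I am not sure whether there are any caveats. I am also not sure how to measure space complexity.

I am learning to program in C++. Are there user-friendly tools to achieve what I am trying to do?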
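
The workflow described above (time a routine over a range of input sizes and emit CSV) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: `std::sort` stands in for the algorithm under test, the best of several runs is reported rather than the average, and the function name `sort_timings_csv` is hypothetical. It expects `repeats` to be at least 1.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// Times std::sort on random inputs of each size and returns CSV text with
// one "size,seconds" row per size (best of `repeats` runs).
std::string sort_timings_csv(const std::vector<std::size_t>& sizes, int repeats) {
    std::mt19937 rng(12345);  // fixed seed so runs are reproducible
    std::uniform_int_distribution<int> dist(0, 1000000);
    volatile int sink = 0;    // consumes a result so the sort is not discarded
    std::ostringstream csv;
    csv << "size,seconds\n";
    for (std::size_t size : sizes) {
        std::vector<int> input(size);
        for (int& v : input) v = dist(rng);
        double best = -1.0;
        for (int r = 0; r < repeats; ++r) {
            std::vector<int> work = input;  // copy made outside the timed region
            const auto begin = std::chrono::steady_clock::now();
            std::sort(work.begin(), work.end());
            const auto end = std::chrono::steady_clock::now();
            if (!work.empty()) sink = work[work.size() / 2];
            const double s = std::chrono::duration<double>(end - begin).count();
            if (best < 0.0 || s < best) best = s;
        }
        csv << size << ',' << best << '\n';
    }
    return csv.str();
}
```

Typical caveats are that the code must be built with optimization enabled, that results the program never uses may be optimized away, and that very short runs are dominated by timer resolution.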

c++ algorithm benchmarking data-structures

alp*_*pha

2018-03-01

9 votes

2 answers

5007 views

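C loop optimization help for final assignment

So for my final assignment in my Computer Systems class, we need to optimize these for loops to be faster than the original. On our Linux server, the basic grade requires under 7 seconds and the full grade under 5 seconds. The code I have here takes about 5.6 seconds. I think I may need to use pointers in some way to make it faster, but I'm not really sure. Could anyone offer any tips or options? Thank you so much!

QUICKEDIT: The file must remain 50 lines or fewer, and I am ignoring the comment lines the instructor included.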

#include <stdio.h>
#include <stdlib.h>

// You are only allowed to make changes to this code as specified by the comments in it.

// The code you submit must have these two values.
#define N_TIMES     600000
#define ARRAY_SIZE   10000

int main(void)
{
    double  *array = calloc(ARRAY_SIZE, sizeof(double));
    double  sum = 0;
    int     i;

    // You can add variables between this comment ...
    register double sum1 = 0, sum2 = 0, sum3 = 0, sum4 = 0, sum5 = 0, …
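
The declaration above suggests the asker is keeping several partial sums. The general technique can be sketched as follows, in C++ here although the body reads the same in C; the function name and the choice of four accumulators are assumptions for illustration, not the asker's code.

```cpp
#include <cstddef>

// Sums n doubles using four independent accumulators, so consecutive
// floating-point additions do not all depend on the previous one.
double sum_unrolled4(const double* a, std::size_t n) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i) {
        s0 += a[i];  // leftover elements when n is not a multiple of 4
    }
    return (s0 + s1) + (s2 + s3);
}
```

Splitting the sum changes the order of the floating-point additions, so the result can differ in the last bits from a strictly sequential sum.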

c optimization loops compiler-optimization debug-mode

Bla*_*147

2019-08-02

8 votes

2 answers

5650 views

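How would you benchmark the performance of a function?

This is perhaps a more advanced question. Suppose you have two functions that return a value,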

int F(int input1, int input2)
{
    int output;
    //some algorithm that assigns value to output//
    return output;
}

int D(int input1, int input2)
{
    int output;
    //another algorithm that assigns value to output//
    return output;
}

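with the stipulation that F(a,b) == D(a,b) (both return the same value for the same inputs).

If you wanted to benchmark their performance, how would you do it? More precisely, how would you isolate the time it takes to execute F(a,b) or D(a,b), so that it does not reflect the time taken by the other, secondary operations in the benchmark setup?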
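
One way to keep setup cost out of the measurement is to generate the inputs beforehand and time only a tight loop of calls. The sketch below is one possible approach, not a definitive harness: the bodies of `F` and `D` are hypothetical stand-ins (multiplication versus repeated addition, valid for small non-negative `b`), and `seconds_per_call` is a hypothetical helper that expects a non-empty input list.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for F and D: two ways of computing a * b.
int F(int a, int b) { return a * b; }

int D(int a, int b) {
    int result = 0;
    for (int i = 0; i < b; ++i) result += a;  // assumes b >= 0
    return result;
}

// Average seconds per call of fn over inputs prepared in advance.
template <class Fn>
double seconds_per_call(Fn fn, const std::vector<std::pair<int, int>>& inputs,
                        int rounds) {
    volatile int sink = 0;  // storing each result keeps the calls alive
    const auto begin = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r) {
        for (const auto& in : inputs) sink = fn(in.first, in.second);
    }
    const auto end = std::chrono::steady_clock::now();
    const double total = std::chrono::duration<double>(end - begin).count();
    return total / (static_cast<double>(rounds) * static_cast<double>(inputs.size()));
}
```

Running the same helper with a trivial function such as `[](int a, int) { return a; }` gives an estimate of the loop and call overhead, which can then be subtracted from both measurements.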

c++ benchmarking function

Adl*_*l A

lucky-day

5 votes

1 answer

2328 views

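How to multiply a register by 37 using only 2 consecutive leal instructions in x86?

Say %edi contains x and I want to end up with 37*x using only 2 consecutive leal instructions. How would I go about this?

For example, to get 45x you would do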

leal (%edi, %edi, 8), %edi   
leal (%edi, %edi, 4), %eax (to be returned)

I cannot for the life of me figure out what numbers to put in place of the 8 and 4 so that the result (%eax) will be 37x.
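
Each lea computes base + index * scale with scale limited to 1, 2, 4, or 8, and the base and index may be different registers. One decomposition that fits is 37 = 1 + 4 * 9, which can be checked with a small C++ sketch (the function name is hypothetical; the instructions in the comments show the corresponding AT&T-syntax forms):

```cpp
#include <cstdint>

// Mirrors two lea instructions: the second one uses the original x as the
// base and the intermediate 9x as the scaled index.
std::uint32_t times37(std::uint32_t x) {
    const std::uint32_t t = x + x * 8;  // leal (%edi,%edi,8), %eax  -> 9x
    return x + t * 4;                   // leal (%edi,%eax,4), %eax  -> x + 36x
}
```

Changing only the scale factors of the 45x example cannot produce 37, because that pattern always yields a product of two numbers from {2, 3, 5, 9}; keeping x in %edi as the base of the second instruction is what makes 37 reachable.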

x86 assembly x86-64 multiplication strength-reduction

New*_*e18

2017-10-06

5 votes

1 answer

1302 views

Adding a redundant assignment speeds up code when compiled without optimization

I found an interesting phenomenon:

#include<stdio.h>
#include<time.h>

int main() {
    int p, q;
    clock_t s,e;
    s=clock();
    for(int i = 1; i < 1000; i++){
        for(int j = 1; j < 1000; j++){
            for(int k = 1; k < 1000; k++){
                p = i + j * k;
                q = p;  //Removing this line can increase running time.
            }
        }
    }
    e = clock();
    double t = (double)(e - s) / CLOCKS_PER_SEC;
    printf("%lf\n", t);
    return 0;
}

I used GCC 7.3.0 on an i5-5257U running Mac OS to compile the code …

performance x86 assembly

hel*_*qiu

2018-03-10

3 votes

1 answer

627 views

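Idiomatic way of performance evaluation?

I am evaluating a network + rendering workload for my project.

The program continuously runs a main loop: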

while (true) {
   doSomething()
   drawSomething()
   doSomething2()
   sendSomething()
}

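The main loop runs more than 60 times per second.

I want to see the performance breakdown: how much time each procedure takes.

My concern is that if I print the time interval at every entry and exit of each procedure,

it would incur a huge performance overhead.

I am curious what the idiomatic way of measuring performance is.

Is printing log messages good enough?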
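
One common way to avoid printing inside the hot loop is to accumulate per-phase totals in memory and report averages only occasionally. The following is a minimal sketch of that idea, not an established library API; `Phase`, `PhaseTotals`, `ScopedPhaseTimer`, and `average_ms` are hypothetical names.

```cpp
#include <array>
#include <chrono>
#include <cstddef>

enum Phase : std::size_t { kUpdate, kDraw, kSend, kPhaseCount };

// Accumulated time per phase; nothing is printed inside the main loop.
struct PhaseTotals {
    std::array<std::chrono::steady_clock::duration, kPhaseCount> total{};
    std::size_t frames = 0;
};

// RAII timer: adds the lifetime of the object to one phase's total.
class ScopedPhaseTimer {
public:
    ScopedPhaseTimer(PhaseTotals& totals, Phase phase)
        : totals_(totals), phase_(phase),
          begin_(std::chrono::steady_clock::now()) {}
    ~ScopedPhaseTimer() {
        totals_.total[phase_] += std::chrono::steady_clock::now() - begin_;
    }
    ScopedPhaseTimer(const ScopedPhaseTimer&) = delete;
    ScopedPhaseTimer& operator=(const ScopedPhaseTimer&) = delete;

private:
    PhaseTotals& totals_;
    Phase phase_;
    std::chrono::steady_clock::time_point begin_;
};

// Average milliseconds per frame spent in `phase` (0 if no frames yet).
double average_ms(const PhaseTotals& totals, Phase phase) {
    if (totals.frames == 0) return 0.0;
    const double ms =
        std::chrono::duration<double, std::milli>(totals.total[phase]).count();
    return ms / static_cast<double>(totals.frames);
}
```

In the loop above, each call would be wrapped in its own scope, for example `{ ScopedPhaseTimer t(totals, kDraw); drawSomething(); }`, with `++totals.frames` at the end of the iteration and a single log line of averages every few hundred frames. Two clock reads per phase are cheap compared with formatted output on every entry and exit.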

benchmarking microbenchmark

shp*_*ark

lucky-day

1 vote

1 answer

1322 views

Tag statistics

assembly ×7

c++ ×7

optimization ×5

benchmarking ×4

c ×3

performance ×3

x86 ×3

gcc ×2

loops ×2

x86-64 ×2

algorithm ×1

as-if ×1

c++-faq ×1

c++11 ×1

c++14 ×1

clang ×1

compiler-optimization ×1

data-structures ×1

debug-mode ×1

function ×1

integer-division ×1

micro-optimization ×1

microbenchmark ×1

multiplication ×1

profiling ×1

strength-reduction ×1
