Quickly count the number of instructions executed in a C program

Question

Jea*_*ean 7 c linux profile

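Is there a simple way to quickly count the instructions executed while a C program runs (x86 instructions: which ones, and how many of each)?

I am using gcc version 4.7.1 (GCC) on an x86_64 GNU/Linux machine.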

Answer 1

Cir*_*四事件 5

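Linux perf_event_open system call with config = PERF_COUNT_HW_INSTRUCTIONS

This Linux system call appears to be a cross-architecture wrapper for performance events, covering both hardware performance counters from the CPU and software events from the kernel.

Here is an example adapted from the man perf_event_open page: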

perf_event_open.c

#define _GNU_SOURCE
#include <asm/unistd.h>
#include <linux/perf_event.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#include <inttypes.h>
#include <sys/types.h>

static long
perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
                int cpu, int group_fd, unsigned long flags)
{
    int ret;

    ret = syscall(__NR_perf_event_open, hw_event, pid, cpu,
                    group_fd, flags);
    return ret;
}

int
main(int argc, char **argv)
{
    struct perf_event_attr pe;
    long long count;
    int fd;

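    uint64_t n;
    if (argc > 1) {
        n = strtoull(argv[1], NULL, 0);
    } else {
        n = 10000;
    }
    /* The loop below decrements before testing, so n == 0 would wrap around. */
    if (n == 0) {
        n = 1;
    }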

    memset(&pe, 0, sizeof(struct perf_event_attr));
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(struct perf_event_attr);
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;
    pe.disabled = 1;
    pe.exclude_kernel = 1;
    // Don't count hypervisor events.
    pe.exclude_hv = 1;

    fd = perf_event_open(&pe, 0, -1, -1, 0);
    if (fd == -1) {
        fprintf(stderr, "Error opening leader %llx\n", pe.config);
        exit(EXIT_FAILURE);
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

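    /* Loop n times: two instructions (sub, jne) per iteration.
     * volatile keeps the compiler from deleting the block when optimizing. */
    __asm__ __volatile__ (
        "1:;\n"
        "sub $1, %[n];\n"
        "jne 1b;\n"
        : [n] "+r" (n)
        :
        : "cc"
    );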

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        fprintf(stderr, "Error reading counter\n");
        exit(EXIT_FAILURE);
    }

    printf("Used %lld instructions\n", count);

    close(fd);
}

Compile and run:

g++ -ggdb3 -O0 -std=c++11 -Wall -Wextra -pedantic -o perf_event_open.out perf_event_open.c
./perf_event_open.out

Output:

Used 20016 instructions

So the result is very close to the expected value of 20000: 10k iterations times the two instructions per iteration in the __asm__ block (sub, jne).

If I change the argument, even to a low value such as 100:

./perf_event_open.out 100

it gives:

Used 216 instructions

The same constant of 16 extra instructions remains, so the accuracy looks quite high. Those 16 are presumably just the ioctl setup instructions after our small loop.
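If the fixed overhead matters, one option is to measure it and subtract it. The following is a minimal sketch of that idea, not part of the man page example: the helper names (open_instruction_counter, measure_raw, count_instructions, spin, empty_work) are hypothetical, and it assumes the overhead of measuring an empty function matches the overhead of measuring the real one.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <asm/unistd.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a disabled, user-space-only instruction
 * counter for the calling thread. Returns a file descriptor or -1. */
static int open_instruction_counter(void)
{
    struct perf_event_attr pe;

    memset(&pe, 0, sizeof(pe));
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(pe);
    pe.config = PERF_COUNT_HW_INSTRUCTIONS;
    pe.disabled = 1;
    pe.exclude_kernel = 1;
    pe.exclude_hv = 1;
    return (int)syscall(__NR_perf_event_open, &pe, 0, -1, -1, 0UL);
}

/* Does nothing; measuring it gives the fixed overhead of one measurement. */
static void empty_work(void *arg)
{
    (void)arg;
}

/* Example workload (an assumption for illustration): counts down from *arg. */
static void spin(void *arg)
{
    volatile unsigned long i = *(unsigned long *)arg;

    while (i != 0)
        i--;
}

/* Hypothetical helper: raw count while fn(arg) runs, overhead included. */
static int measure_raw(int fd, void (*fn)(void *), void *arg, long long *count)
{
    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) == -1 ||
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) == -1)
        return -1;
    fn(arg);
    if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) == -1)
        return -1;
    if (read(fd, count, sizeof(*count)) != (ssize_t)sizeof(*count))
        return -1;
    return 0;
}

/* Stores (count for fn) - (count for an empty function) in *result.
 * Returns 0 on success, -1 if the counter is unavailable. */
int count_instructions(void (*fn)(void *), void *arg, long long *result)
{
    long long baseline, total;
    int rc = -1;
    int fd = open_instruction_counter();

    if (fd == -1)
        return -1;
    if (measure_raw(fd, empty_work, NULL, &baseline) == 0 &&
        measure_raw(fd, fn, arg, &total) == 0) {
        *result = total - baseline;
        rc = 0;
    }
    close(fd);
    return rc;
}
```

A caller would do something like count_instructions(spin, &n, &count) and check the return value, since opening the counter can fail (for example in some virtual machines or with a restrictive perf_event_paranoid setting). Whether the baseline cancels exactly depends on the compiler and CPU, so treat the result as an estimate.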

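You may now also be interested in:

Preventing the system calls from being reordered: Enforcing statement order in C++
Preventing the test loop from being optimized away: How to prevent GCC from optimizing out a busy wait loop?

Other events of interest that can be measured with this system call:

Cycle count: How to get the CPU cycle count in x86_64 from C++?

Tested on Ubuntu 20.04 amd64, GCC 9.3.0, Linux kernel 5.4.0, Intel Core i7-7820HQ CPU.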
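For both concerns, one common GCC/Clang idiom is an empty inline asm statement. The sketch below is my own illustration and is not taken from the linked questions; COMPILER_BARRIER and sum_to are hypothetical names. The "memory" clobber tells the compiler that memory may have been read or written, so it will not move memory accesses across that point. The "+r" constraint makes the compiler treat the variable as read and modified there, so it cannot fold the loop into a closed-form result.

```c
#include <stdint.h>

/* Compiler-only barrier: emits no machine instruction, but the compiler
 * must assume memory was read and written here. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Sums 1..n with a loop that survives optimization. */
uint64_t sum_to(uint64_t n)
{
    uint64_t total = 0;

    COMPILER_BARRIER(); /* e.g. placed right after enabling the counter */
    for (uint64_t i = 1; i <= n; i++) {
        total += i;
        /* Empty asm that claims to read and modify total, so the
         * compiler has to keep every iteration of the loop. */
        __asm__ __volatile__("" : "+r"(total));
    }
    COMPILER_BARRIER(); /* e.g. placed right before disabling the counter */
    return total;
}
```

Neither statement adds instructions of its own, so the measured count is not inflated, although the generated loop may still differ between optimization levels.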
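Cycles and instructions can also be read together by putting both events in one group. This is a rough sketch based on the group mechanism described in man perf_event_open (PERF_FORMAT_GROUP and PERF_IOC_FLAG_GROUP); the helper names open_hw_event, busy_loop and measure_cycles_and_instructions, and the group_reading struct, are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <asm/unistd.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Layout returned by read() when read_format is PERF_FORMAT_GROUP only. */
struct group_reading {
    uint64_t nr;        /* number of events in the group */
    uint64_t values[2]; /* [0] leader (cycles), [1] member (instructions) */
};

/* Hypothetical helper: open one user-space-only hardware event.
 * Pass group_fd == -1 to create the group leader. */
static int open_hw_event(uint64_t config, int group_fd)
{
    struct perf_event_attr pe;

    memset(&pe, 0, sizeof(pe));
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(pe);
    pe.config = config;
    pe.disabled = (group_fd == -1); /* only the leader starts disabled */
    pe.exclude_kernel = 1;
    pe.exclude_hv = 1;
    pe.read_format = PERF_FORMAT_GROUP;
    return (int)syscall(__NR_perf_event_open, &pe, 0, -1, group_fd, 0UL);
}

/* Example workload (an assumption for illustration): counts down from *arg. */
static void busy_loop(void *arg)
{
    volatile unsigned long i = *(unsigned long *)arg;

    while (i != 0)
        i--;
}

/* Runs fn(arg) and stores user-space cycles and instructions.
 * Returns 0 on success, -1 if the counters are unavailable. */
int measure_cycles_and_instructions(void (*fn)(void *), void *arg,
                                    uint64_t *cycles, uint64_t *instructions)
{
    struct group_reading r;
    int rc = -1;
    int leader, member;

    leader = open_hw_event(PERF_COUNT_HW_CPU_CYCLES, -1);
    if (leader == -1)
        return -1;
    member = open_hw_event(PERF_COUNT_HW_INSTRUCTIONS, leader);
    if (member == -1) {
        close(leader);
        return -1;
    }

    /* PERF_IOC_FLAG_GROUP applies each ioctl to the whole group. */
    ioctl(leader, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP);
    ioctl(leader, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
    fn(arg);
    ioctl(leader, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

    if (read(leader, &r, sizeof(r)) == (ssize_t)sizeof(r) && r.nr == 2) {
        *cycles = r.values[0];
        *instructions = r.values[1];
        rc = 0;
    }
    close(member);
    close(leader);
    return rc;
}
```

Dividing the instruction count by the cycle count gives instructions per cycle for the measured region, which helps when two versions of a routine execute a similar number of instructions but run at different speeds. Unlike the instruction count, the cycle count is expected to vary from run to run.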

Answer 2

小智 1

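Possibly a duplicate of this question

I say possibly because you are asking about assembly instructions, whereas that question deals with C-level profiling of code.

My question to you, however, is: why would you want to profile the actual machine instructions executed? First, they will differ between compilers and their optimization settings. More practically, what could you actually do with that information? If you are looking for bottlenecks to optimize, a code profiler is what you need.

I may be missing something important here, though.

@mpen: Not necessarily. For example, if one algorithm uses a large lookup table and another does the same thing with a more computational approach, the first may have more load instructions, each of which could stall for >100 cycles due to cache misses. Similarly, one algorithm might use a lot of expensive instructions such as "FSQRT", while another avoids them and perhaps uses more adds/multiplies. The second algorithm might be faster even though it executes more instructions. (4 upvotes)
*The number of CPU instructions executed would be a simple way to compare algorithms without worrying about glitches or competing with other programs for resources. It still depends on the instruction set, but it is independent of processing power. (2 upvotes)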

Archived:	13 years, 2 months ago
Views:	8124
Last activity:	6 years, 7 months ago