Related questions (0)

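Why does glibc's strlen need to be so complicated to run quickly?

I was looking through the strlen code here and wondered whether the optimizations used in the code are really needed. For example, why wouldn't something like the following work equally well or better?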

unsigned long strlen(char s[]) {
    unsigned long i;
    for (i = 0; s[i] != '\0'; i++)
        continue;
    return i;
}

Isn't simpler code better, or at least easier for the compiler to optimize?

The code of strlen on the page behind the link looks like this:

/* Copyright (C) 1991, 1993, 1997, 2000, 2003 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   Written by Torbjorn Granlund (tege@sics.se),
   with help from Dan Sahlin (dan@sics.se);
   commentary by Jim Blandy (jimb@ai.mit.edu).

   The GNU C Library is free software; you can redistribute it and/or
   modify it under …
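
For comparison with the byte-at-a-time loop in the question, here is a minimal sketch of the word-at-a-time idea that optimized strlen implementations commonly rely on. It is not glibc's code: the name `wordwise_strlen` and the 64-bit word size are assumptions for illustration. Note that the aligned word load can read a few bytes past the terminator, which ISO C does not sanction; a library implementation can rely on platform guarantees that portable user code cannot.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; a sketch of the technique, not glibc's implementation. */
size_t wordwise_strlen(const char *s) {
    const char *p = s;

    /* Scan byte by byte until p sits on an 8-byte boundary. */
    while ((uintptr_t)p % sizeof(uint64_t) != 0) {
        if (*p == '\0') {
            return (size_t)(p - s);
        }
        p++;
    }

    const uint64_t ones = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;

    /* Examine 8 bytes per iteration. */
    for (;;) {
        uint64_t w;
        memcpy(&w, p, sizeof w); /* aligned load; may read past the terminator */
        if ((w - ones) & ~w & highs) {
            break; /* nonzero exactly when some byte of w is zero */
        }
        p += sizeof w;
    }

    /* Locate the zero byte inside the final word. */
    while (*p != '\0') {
        p++;
    }
    return (size_t)(p - s);
}
```

Whether this beats the simple loop depends on the compiler and target, so it is worth measuring rather than assuming.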

c optimization portability glibc strlen

Author

2019-08-29

283
votes

7
answers

50k
views

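Why is this code 6.5x slower with optimizations enabled?

I wanted to benchmark glibc's strlen function and found that it apparently runs much slower when optimizations are enabled in GCC. I have no idea why.

Here is my code: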

#include <time.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

int main() {
    char *s = calloc(1 << 20, 1);
    memset(s, 65, 1000000);
    clock_t start = clock();
    for (int i = 0; i < 128; ++i) {
        s[strlen(s)] = 'A';
    }
    clock_t end = clock();
    printf("%lld\n", (long long)(end - start));
    return 0;
}

On my machine, it outputs:

$ gcc test.c && ./a.out
13336
$ gcc -O1 test.c && ./a.out
199004
$ gcc -O2 test.c && ./a.out
83415 …
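
One way to check whether the slowdown comes from the compiler rather than from glibc is to make sure the library function is really the one being called. At some optimization levels GCC may expand strlen inline instead of emitting a call, so the benchmark might not be timing glibc at all. The sketch below routes the call through a volatile function pointer to force a real call; the names `strlen_libcall` and `append_a` are hypothetical. Compiling with GCC's `-fno-builtin-strlen` is another option.

```c
#include <stddef.h>
#include <string.h>

/* Calling through a volatile function pointer keeps the compiler from
   substituting its own inline expansion of strlen. */
static size_t (*volatile strlen_libcall)(const char *) = strlen;

/* Hypothetical helper mirroring the loop body of the benchmark above:
   each round appends one 'A' at the current end of the string. */
void append_a(char *s, int rounds) {
    for (int i = 0; i < rounds; ++i) {
        s[strlen_libcall(s)] = 'A';
    }
}
```

Timing `append_a(s, 128)` at each optimization level and comparing against the original loop would show whether the inline expansion accounts for the difference.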

c performance gcc glibc

Tsa*_*arN

2019-10-24

64
votes

2
answers

3997
views

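Is it safe to read past the end of a buffer within the same page on x86 and x64?

Many methods found in high-performance algorithms could be (and are) simplified if they were allowed to read a small amount past the end of their input buffers. Here, "small amount" generally means up to W - 1 bytes past the end, where W is the word size of the algorithm in bytes (e.g., up to 7 bytes for an algorithm that processes the input in 64-bit chunks).

Clearly, writing past the end of an input buffer is never safe in general, since you may clobber data beyond the buffer¹. It is equally clear that reading past the end of a buffer into another page may trigger a segmentation fault or access violation, since the next page may not be readable.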

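In the special case of reading aligned values, however, a page fault seems impossible, at least on x86. On that platform, pages (and hence memory protection flags) have 4K granularity (larger pages, e.g. 2MiB or 1GiB, are possible, but these are multiples of 4K), so an aligned read will only access bytes in the same page as the valid part of the buffer.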
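
The arithmetic behind that claim can be sketched as follows, assuming 4 KiB pages and a hypothetical helper name `load_crosses_page`. A load crosses a page boundary only if its first and last bytes fall in different pages, which cannot happen for a naturally aligned 8-byte load because 4096 is a multiple of 8.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u /* assumption: 4 KiB base pages, as on x86 */

/* Hypothetical helper: true if a load of `width` bytes starting at `addr`
   touches two different pages. */
bool load_crosses_page(uintptr_t addr, size_t width) {
    uintptr_t first_page = addr / PAGE_BYTES;
    uintptr_t last_page = (addr + width - 1) / PAGE_BYTES;
    return first_page != last_page;
}
```

For an 8-byte-aligned address the helper always returns false with `width` 8, whereas an unaligned address such as 4093 does straddle two pages.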

Here is a canonical example of a loop that aligns its input and reads up to 7 bytes past the end of the buffer:

int processBytes(uint8_t *input, size_t size) {

    uint64_t *input64 = (uint64_t *)input, *end64 = (uint64_t *)(input + size);
    int res;

    if (size < 8) {
        // special case for short inputs that we aren't concerned with here
        return shortMethod();
    }

    // check the first 8 bytes
    if ((res = match(*input)) >= 0) {
        return input + res;
    }

    // align pointer to the next 8-byte …

c optimization performance x86 assembly

Bee*_*ope

2017-05-23

33
votes

2
answers

2027
views

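gcc optimization flag -O3 makes code slower than -O2

I found the topic "Why is processing a sorted array faster than processing an unsorted array?" and tried to run its code, and I saw strange behavior. If I compile the code with the -O3 optimization flag, it takes 2.98605 sec to run. If I compile it with -O2, it takes 1.98093 sec. I ran the code several times (5 or 6) on the same machine in the same environment, with all other software (chrome, skype, etc.) closed.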

gcc --version
gcc (Ubuntu 4.9.2-0ubuntu1~14.04) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Could you please explain why this happens? I read the gcc manual and saw that -O3 includes -O2. Thank you for your help.

P.S. Adding the code:

#include <algorithm>
#include <ctime>
#include <iostream>

int main()
{
    // Generate data
    const unsigned arraySize = 32768;
    int data[arraySize];

    for (unsigned …

c++ optimization gcc

Mik*_*aev

2018-06-19

18
votes

1
answer

5180
views

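Hard to debug SEGV due to skipped cmov from out-of-bounds memory

I'm trying to write a few high-performance assembly functions as an exercise, and I ran into a strange segfault that happens when the program runs normally, but not under valgrind or nemiver.

Basically, a cmov that shouldn't run, with an out-of-bounds address, makes the program segfault even though the condition is always false.

I have a fast version and a slow version. The slow one works all the time. The fast one works unless it receives a non-ASCII character, at which point it crashes horribly, unless I'm running under adb or nemiver.

ascii_flags is just a 128-byte array (with a bit of room at the end) containing flags for all the ASCII characters (alpha, numeric, printable, etc.)

This works: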

ft_isprint:
    xor EAX, EAX                ; empty EAX
    test EDI, ~127              ; check for non-ascii (>127) input
    jnz .error
    mov EAX, [rel ascii_flags + EDI]    ; load ascii table if input fits
    and EAX, 0b00001000         ; get specific bit
.error:
    ret

But this doesn't:

ft_isprint:
    xor EAX, EAX                ; empty EAX
    test EDI, ~127              ; check for non-ascii (>127) input
    cmovz EAX, [rel ascii_flags + EDI]  ; load ascii table if input fits
    and EAX, flag_print         ; get …
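
On x86, a cmov with a memory source operand performs the load whether or not the condition holds, so an out-of-bounds address can fault even when the result is discarded. One branchless way around this, sketched here in C rather than assembly, is to clamp the index so the load is always in bounds and then zero the result for non-ASCII input. The names `FLAG_PRINT`, `init_ascii_flags`, and `ft_isprint_branchless` and the table contents are assumptions for illustration, not the asker's actual data.

```c
#include <stdint.h>

enum { FLAG_PRINT = 0x08 }; /* mirrors the 0b00001000 mask in the question */

static uint8_t ascii_flags[128];

/* Hypothetical initializer: mark the printable ASCII range 0x20..0x7e. */
void init_ascii_flags(void) {
    for (int c = 0x20; c <= 0x7e; c++) {
        ascii_flags[c] |= FLAG_PRINT;
    }
}

/* Returns FLAG_PRINT for printable ASCII input, 0 otherwise. */
int ft_isprint_branchless(int c) {
    unsigned idx = (unsigned)c & 127u;              /* always a valid index */
    unsigned in_range = ((unsigned)c & ~127u) == 0; /* 1 for ASCII, else 0 */
    return (ascii_flags[idx] & FLAG_PRINT) * in_range;
}
```

The load now always targets the table, and the range check only decides whether its result is kept.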

assembly gdb x86-64

Lou*_*ski

2019-01-07

5
votes

2
answers

148
views

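In C++, does the branch predictor predict implicit conditional statements?

In this code, result += runs[i] > runs[i-1]; is written as an implicit conditional statement. In C++, does the branch predictor make predictions for this statement? Or do I have to use the if keyword explicitly to get branch prediction?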

using namespace std; 
int progressDays(vector<int> runs) {
    if (runs.size() < 2) {return 0;}
    int result = 0;
    for (int i = 1; i < runs.size(); i++) {result += runs[i] > runs[i-1];}
    return result;
}
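
The branch predictor is a hardware feature that acts on branch instructions in the generated machine code, not on source-level keywords, so what matters is what the compiler emits. As a sketch, here are the implicit and explicit forms side by side in C (the function names are hypothetical). Both compute the same result, and a compiler is free to emit a branch or branch-free code for either one.

```c
#include <stddef.h>

/* Implicit form: the comparison yields 0 or 1, which is added directly. */
int progress_days_implicit(const int *runs, size_t n) {
    int result = 0;
    for (size_t i = 1; i < n; i++) {
        result += runs[i] > runs[i - 1];
    }
    return result;
}

/* Explicit form: the same count written with an if statement. */
int progress_days_explicit(const int *runs, size_t n) {
    int result = 0;
    for (size_t i = 1; i < n; i++) {
        if (runs[i] > runs[i - 1]) {
            result++;
        }
    }
    return result;
}
```

Inspecting the compiler's assembly output for each function is the reliable way to see whether a conditional branch is present on a given target.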

c++ syntax conditional-statements branch-prediction

M. *_*ves

2020-07-29

1
vote

1
answer

241
views