Related questions and solutions (0)

Why does glibc's strlen need to be so complicated to run quickly?

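I was browsing the strlen code here, and I was wondering whether the optimizations used in it are really needed. For example, why wouldn't something like the following work equally well or better?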

unsigned long strlen(char s[]) {
    unsigned long i;
    for (i = 0; s[i] != '\0'; i++)
        continue;
    return i;
}

Isn't simpler code better and/or easier for the compiler to optimize?

The code of strlen on the page behind the link looks like this:

/* Copyright (C) 1991, 1993, 1997, 2000, 2003 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   Written by Torbjorn Granlund (tege@sics.se),
   with help from Dan Sahlin (dan@sics.se);
   commentary by Jim Blandy (jimb@ai.mit.edu).

   The GNU C Library is free software; you can redistribute it and/or
   modify it under …
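The glibc file is truncated here, but the trick behind word-at-a-time strlen implementations is a test that tells whether any byte of a machine word is zero using a few arithmetic operations instead of one comparison per byte. A minimal sketch of that test for 64-bit words; the names ONES, HIGHS and has_zero_byte are hypothetical, and this is not glibc's code:

```c
#include <stdbool.h>
#include <stdint.h>

/* 0x01 in every byte, and 0x80 in every byte. */
#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Returns true if at least one of the 8 bytes of v is 0x00.
 * Subtracting 0x01 from each byte turns the lowest 0x00 byte into 0xFF,
 * setting its high bit. "& ~v" keeps only bytes whose high bit was clear
 * in v, so bytes 0x81..0xFF cannot produce a false positive. If no byte
 * is zero, no borrow crosses a byte boundary and the result is 0. */
static bool has_zero_byte(uint64_t v)
{
    return ((v - ONES) & ~v & HIGHS) != 0;
}
```

For example, has_zero_byte(UINT64_C(0x4141414141004141)) is true, while has_zero_byte(UINT64_C(0x8181818181818181)) is false: every byte of the latter has its high bit set, but none is zero. A loop built on this test still has to locate the zero byte once the test fires, and it has to start from an aligned address so that a word load can never cross into an unmapped page; that prologue and epilogue account for much of the extra length compared with the simple loop.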

c optimization portability glibc strlen

Author · 2019-08-29 · score 283 · 7 answers · 50k views

Why is this code 6.5x slower with optimizations enabled?

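I wanted to benchmark glibc's strlen function and, for some reason, found that it apparently performs much slower with optimizations enabled in GCC, and I have no idea why.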

Here's my code:

#include <time.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

int main() {
    char *s = calloc(1 << 20, 1);
    memset(s, 65, 1000000);
    clock_t start = clock();
    for (int i = 0; i < 128; ++i) {
        s[strlen(s)] = 'A';
    }
    clock_t end = clock();
    printf("%lld\n", (long long)(end - start));
    return 0;
}

On my machine, it outputs:

$ gcc test.c && ./a.out
13336
$ gcc -O1 test.c && ./a.out
199004
$ gcc -O2 test.c && ./a.out
83415 …
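One factor worth ruling out in a benchmark like this is that, with optimization enabled, GCC may treat strlen as a builtin and emit its own inline code instead of calling glibc's function, so the three builds may not be timing the same code at all. A minimal sketch of the same workload that forces a real library call by going through a volatile function pointer; the names strlen_ptr and bench_strlen are hypothetical, and compiling with GCC's -fno-builtin-strlen is another way to get a similar effect:

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* The compiler cannot know which function this volatile pointer holds at
 * the time of the call, so it cannot substitute an inline expansion of
 * strlen: every call goes to the library function. */
static size_t (*volatile strlen_ptr)(const char *) = strlen;

/* Same workload as the question: a 1,000,000-byte run of 'A' that grows by
 * one byte per iteration, 128 times. Returns elapsed clock ticks (or -1 on
 * allocation failure) and stores the final string length in *final_len. */
static long long bench_strlen(size_t *final_len)
{
    char *s = calloc(1 << 20, 1);   /* zero-filled, so s stays NUL-terminated */
    if (s == NULL) {
        *final_len = 0;
        return -1;
    }
    memset(s, 'A', 1000000);

    clock_t start = clock();
    for (int i = 0; i < 128; ++i)
        s[strlen_ptr(s)] = 'A';     /* overwrite the terminator: length + 1 */
    clock_t end = clock();

    *final_len = strlen_ptr(s);
    free(s);
    return (long long)(end - start);
}
```

After size_t len; long long ticks = bench_strlen(&len);, len is 1000128 (the initial 1,000,000 bytes plus one per iteration), and ticks can be printed with "%lld" as in the question. Comparing these numbers across -O0, -O1 and -O2 then compares the same glibc function each time.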

c performance gcc glibc

Tsa*_*arN · 2019-10-24 · score 64 · 2 answers · 3,997 views

Is it safe to read past the end of a buffer within the same page on x86 and x64?

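Many methods found in high-performance algorithms could be (and are) simplified if they were allowed to read a small amount past the end of their input buffers. Here, "small amount" generally means up to W - 1 bytes past the end, where W is the word size in bytes of the algorithm (e.g., up to 7 bytes for an algorithm that processes its input in 64-bit chunks).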
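To make the W - 1 bound concrete: if a loop reads whole W-byte aligned words and its last word covers the final valid byte, that word ends at the next multiple of W at or after the buffer's end, so the loop reads round_up(end, W) - end extra bytes, which is between 0 and W - 1. A small sketch of that arithmetic, assuming W is a power of two (the helper name overread_bytes is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* `end` is the address one past the last valid byte; `w` is the word size
 * and must be a power of two. Returns how many bytes past `end` the final
 * aligned w-byte word extends. */
static size_t overread_bytes(uintptr_t end, size_t w)
{
    /* Round end up to the next multiple of w. */
    uintptr_t rounded = (end + (w - 1)) & ~(uintptr_t)(w - 1);
    return (size_t)(rounded - end);
}
```

For example, with W = 8 a buffer ending at address 1001 is over-read by 7 bytes, one ending at 1007 by 1 byte, and one ending at the aligned address 1000 not at all.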

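It's clear that writing past the end of an input buffer is never safe in general, since you may clobber data beyond the buffer¹. It is also clear that reading past the end of a buffer into another page may trigger a segmentation fault/access violation, since the next page may not be readable.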

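In the special case of reading aligned values, however, a page fault seems impossible, at least on x86. On that platform, pages (and hence memory protection flags) have 4K granularity (larger pages, e.g. 2MiB or 1GiB, are possible, but these are multiples of 4K), so an aligned read will only access bytes in the same page as the valid part of the buffer.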
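The page argument can be checked with plain address arithmetic. A small sketch, assuming 4 KiB pages and 8-byte words (PAGE_SIZE, same_page and word_start are hypothetical names): the aligned word containing any byte address p starts at p rounded down to a multiple of 8, and because 4096 is itself a multiple of 8, no page boundary can fall inside that word, so the whole word lies in p's page.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size: 4 KiB, the smallest page size on x86/x86-64. */
#define PAGE_SIZE 4096u

/* True if every byte of the access [addr, addr + width) lies in one page.
 * width must be at least 1. */
static bool same_page(uintptr_t addr, size_t width)
{
    return addr / PAGE_SIZE == (addr + width - 1) / PAGE_SIZE;
}

/* Start of the naturally aligned 8-byte word that contains byte address p. */
static uintptr_t word_start(uintptr_t p)
{
    return p & ~(uintptr_t)7;
}
```

For instance, an 8-byte read at 4088 stays in the first page (same_page(4088, 8) is true), while an unaligned 8-byte read at 4092 straddles two pages (same_page(4092, 8) is false). Only unaligned reads can cross the boundary.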

Here's a canonical example of a loop that aligns its input and reads up to 7 bytes past the end of the buffer:

int processBytes(uint8_t *input, size_t size) {

    // end64 must also be a pointer
    uint64_t *input64 = (uint64_t *)input, *end64 = (uint64_t *)(input + size);
    int res;

    if (size < 8) {
        // special case for short inputs that we aren't concerned with here
        return shortMethod();
    }

    // check the first 8 bytes (as one 64-bit word)
    if ((res = match(*input64)) >= 0) {
        return res;  // index of the match within the first 8 bytes
    }

    // align pointer to the next 8-byte …

c optimization performance x86 assembly

Bee*_*ope · 2017-05-23 · score 33 · 2 answers · 2,027 views

Why does this implementation of strlen() work?

(Disclaimer: I've seen this question, and I'm not re-asking it. I'm interested in why the code works, not in how it works.)

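So, here is Apple's (well, FreeBSD's) implementation of strlen(). It uses a well-known optimization trick: instead of comparing byte by byte against 0, it checks 4 or 8 bytes at once: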

size_t strlen(const char *str)
{
    const char *p;
    const unsigned long *lp;

    /* Skip the first few bytes until we have an aligned p */
    for (p = str; (uintptr_t)p & LONGPTR_MASK; p++)
        if (*p == '\0')
            return (p - str);

    /* Scan the rest of the string using word sized operation */
    for (lp = (const unsigned long *)p; ; lp++)
        if ((*lp - mask01) …
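The excerpt stops at the word loop's test. A self-contained sketch of the same two-phase structure, assuming 8-byte words: a byte loop until the pointer is aligned, then one aligned word per iteration with a zero-byte test. The names word_strlen, ONES and HIGHS are hypothetical and this is not FreeBSD's code. The comments mark where it reads past the terminator, which is the part the undefined-behavior question is about: the read stays inside an aligned word, and therefore inside a page that is known to be mapped, but ISO C gives no guarantee about it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  UINT64_C(0x0101010101010101)  /* 0x01 in every byte */
#define HIGHS UINT64_C(0x8080808080808080)  /* 0x80 in every byte */

static size_t word_strlen(const char *str)
{
    const char *p = str;

    /* Phase 1: byte by byte until p is 8-byte aligned. */
    for (; (uintptr_t)p % 8 != 0; p++)
        if (*p == '\0')
            return (size_t)(p - str);

    /* Phase 2: one aligned 8-byte word per iteration. p is always at or
     * before the terminator, so the word's first byte is valid; an aligned
     * load never crosses a page boundary, so on x86 it cannot fault. It can
     * still read up to 7 bytes past the terminator, which ISO C does not
     * permit. */
    for (;; p += 8) {
        uint64_t w;
        memcpy(&w, p, sizeof w);  /* word load without strict-aliasing issues */
        if (((w - ONES) & ~w & HIGHS) != 0) {
            /* Some byte of this word is zero: find the first one. */
            while (*p != '\0')
                p++;
            return (size_t)(p - str);
        }
    }
}
```

For example, with char buf[32] = "hello";, word_strlen(buf) returns 5, and the aligned word containing the terminator ends at most 7 bytes after it, still inside buf, so this particular call never reads outside the array.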

c undefined-behavior

Author · 2017-05-23 · score 20 · 1 answer · 1,193 views