为什么'=='在std :: string上运行缓慢？

Question

为什么'=='在std :: string上运行缓慢？

在分析我的应用程序时,我意识到在字符串比较上花了很多时间.所以我写了一个简单的基准测试,我很惊讶'=='比string :: compare和strcmp慢得多!这是代码,谁能解释为什么呢？或者我的代码有什么问题？因为根据标准'=='只是一个运算符重载而只是返回!lhs.compare(rhs).

#include <iostream>
#include <vector>
#include <string>
#include <stdint.h>
#include "Timer.h"
#include <random>
#include <time.h>
#include <string.h>
using namespace std;
uint64_t itr  = 10000000000;//10 Billion
int len = 100;
int main() {
  srand(time(0));
  string s1(len,random()%128);
  string s2(len,random()%128);

uint64_t a = 0;
  Timer t;
  t.begin();
  for(uint64_t i =0;i<itr;i++){
    if(s1 == s2)
      a = i;
  }
  t.end();

  cout<<"==       took:"<<t.elapsedMillis()<<endl;

  t.begin();
  for(uint64_t i =0;i<itr;i++){
    if(s1.compare(s2)==0)
      a = i;
  }
  t.end();

  cout<<".compare took:"<<t.elapsedMillis()<<endl;

  t.begin();
  for(uint64_t i =0;i<itr;i++){
    if(strcmp(s1.c_str(),s2.c_str()))
      a = i;
  }
  t.end();

  cout<<"strcmp   took:"<<t.elapsedMillis()<<endl;

  return a;
}

Run Code Online (Sandbox Code Playgroud)

这是结果:

==       took:5986.74
.compare took:0.000349
strcmp   took:0.000778

Run Code Online (Sandbox Code Playgroud)

我的编译标志:

CXXFLAGS = -O3 -Wall -fmessage-length = 0 -std = c ++ 1y

我在x86_64 linux机器上使用gcc 4.9.

显然使用-o3进行了一些优化,我猜测完全推出了最后两个循环; 但是,使用-o2仍然是结果很奇怪:

10亿次迭代:

==       took:19591
.compare took:8318.01
strcmp   took:6480.35

Run Code Online (Sandbox Code Playgroud)

PS Timer只是一个测量花费时间的包装类; 我完全相信:D

Timer类的代码:

#include <chrono>

#ifndef SRC_TIMER_H_
#define SRC_TIMER_H_


class Timer {
  std::chrono::steady_clock::time_point start;
  std::chrono::steady_clock::time_point stop;
public:
  Timer(){
    start = std::chrono::steady_clock::now();
    stop = std::chrono::steady_clock::now();
  }
  virtual ~Timer() {}

  inline void begin() {
    start = std::chrono::steady_clock::now();
  }

  inline void end() {
    stop = std::chrono::steady_clock::now();
  }

  inline double elapsedMillis() {
    auto diff = stop - start;
    return  std::chrono::duration<double, std::milli> (diff).count();
  }

  inline double elapsedMicro() {
    auto diff = stop - start;
    return  std::chrono::duration<double, std::micro> (diff).count();
  }

  inline double elapsedNano() {
    auto diff = stop - start;
    return  std::chrono::duration<double, std::nano> (diff).count();
  }

  inline double elapsedSec() {
    auto diff = stop - start;
    return std::chrono::duration<double> (diff).count();
  }
};

#endif /* SRC_TIMER_H_ */

Run Code Online (Sandbox Code Playgroud)

Answer 1

Ton*_*roy 13

更新:http ://ideone.com/rGc36a上改进基准的输出

==       took:21
.compare took:21
strcmp   took:14
==       took:21
.compare took:25
strcmp   took:14

Run Code Online (Sandbox Code Playgroud)

事实证明,使其有意义地工作至关重要的是"智取"编译器预测在编译时比较的字符串的能力:

// more strings that might be used...
string s[] = { {len,argc+'A'}, {len,argc+'A'}, {len, argc+'B'}, {len, argc+'B'} };

if(s[i&3].compare(s[(i+1)&3])==0)  // trickier to optimise
  a += i;  // cumulative observable side effects

Run Code Online (Sandbox Code Playgroud)

请注意,一般情况下,strcmp在功能上不等于==或.compare当文本可能嵌入NUL时,因为前者将"提前退出".(这不是它上面"更快"的原因,但请阅读下面的评论,以及字符串长度/内容等可能的变化.)

讨论/早期答案

只需看看您的实施 - 例如

echo '#include <string>' > stringE.cc
g++ -E stringE.cc | less

Run Code Online (Sandbox Code Playgroud)

搜索basic_string模板,然后为运算符==处理两个字符串实例 - 我的是:

template<class _Elem,
    class _Traits,
    class _Alloc> inline
    bool __cdecl operator==(
            const basic_string<_Elem, _Traits, _Alloc>& _Left,
            const basic_string<_Elem, _Traits, _Alloc>& _Right)
    {
    return (_Left.compare(_Right) == 0);
    }

Run Code Online (Sandbox Code Playgroud)

请注意,这operator==是内联和简单的调用compare.在启用正常优化级别的情况下,它无法始终显着降低,尽管由于周围代码的微妙副作用,优化器可能偶尔会优于一个循环优于另一个循环.

您的表面问题可能是由于您的代码被优化超出了预期工作的范围,for任意展开到不同程度的循环,或者优化或时间中的其他怪癖或错误.当您拥有不具有任何累积副作用的不变输入和循环时,这并不罕见(即编译器可以计算出a未使用的中间值,因此只有最后一个a = i需要生效).

所以,学会写更好的基准.在这种情况下,这有点棘手,因为在内存中有许多不同的字符串准备好调用比较,并以优化器无法在编译时预测的方式选择它们仍然足够快,不会压倒和模糊影响字符串比较代码,不是一件容易的事.此外,超越一点 - 比较分布在更多内存中的事物会使缓存影响与基准测试更相关,这进一步模糊了真正的比较性能.

不过,如果我是你,我会从文件中读取一些字符串 - 将每个字符串推送到a vector,然后循环vector执行相邻元素之间的三个比较操作中的每一个.然后编译器不可能预测结果中的任何模式.您可能会发现compare/ ==更快/更慢,而不是strcmp通常在第一个字符或三个字符串不同的字符串,但另一种方式是相同或仅在结尾附近不同的长字符串,所以在结束之前请确保尝试不同类型的输入了解性能概况.

Answer 2

pax*_*blo 12

要么你的计时很棘手,要么你的编译器已经优化了你的一些代码.

考虑一下,在0.000349毫秒内进行100亿次操作(我将使用0.000500毫秒,或半微秒,以使我的计算更容易)意味着您每秒执行20 万亿次操作.

即使一个操作可以在一个时钟周期内完成,也就是20,000 GHz,有点超出当前的CPU数量,即使是大量优化的流水线和多个内核.

并且,鉴于-O2优化的数字彼此相当(==大约是时间的两倍compare),"代码优化不存在"的可能性看起来更有可能.

由于operator==需要打电话compare来完成其工作,所以时间加倍可以很容易地解释为100亿额外的函数调用.

作为进一步的支持,请检查下表,以毫秒为单位显示数字(第三列是第二列的简单除以十的比例,以便第一列和第三列都进行十亿次迭代):

         -O2/1billion  -O3/10billion  -O3/1billion  Improvement
               (a)            (b)     (c = b / 10)    (a / c)
         ============  =============  ============  ===========
oper==          19151           5987           599           32
compare          8319         0.0005       0.00005  166,380,000

Run Code Online (Sandbox Code Playgroud)

它乞丐认为-O3可以将==代码加速大约32倍,但设法将compare代码加速几亿倍.

我强烈建议您查看编译器生成的汇编程序代码(例如使用该gcc -S选项),以验证它实际上正在执行它声称要执行的工作.

@Behrooz:结果是轶事,代码是证明. (5认同)

Answer 3

tha*_*ang 8

问题是编译器正在对您的代码进行大量的严格优化.

这是修改后的代码:

#include <iostream>
#include <vector>
#include <string>
#include <stdint.h>
#include "Timer.h"
#include <random>
#include <time.h>
#include <string.h>
using namespace std;
uint64_t itr  = 500000000;//10 Billion
int len = 100;
int main() {
  srand(time(0));
  string s1(len,random()%128);
  string s2(len,random()%128);

uint64_t a = 0;
  Timer t;
  t.begin();
  for(uint64_t i =0;i<itr;i++){
asm volatile("" : "+g"(s2));
    if(s1 == s2)
      a += i;
  }
  t.end();

  cout<<"==       took:"<<t.elapsedMillis()<<",a="<<a<<endl;

  t.begin();
  for(uint64_t i =0;i<itr;i++){
asm volatile("" : "+g"(s2));
    if(s1.compare(s2)==0)
      a+=i;
  }
  t.end();

  cout<<".compare took:"<<t.elapsedMillis()<<",a="<<a<<endl;

  t.begin();
  for(uint64_t i =0;i<itr;i++){
asm volatile("" : "+g"(s2));
    if(strcmp(s1.c_str(),s2.c_str()) == 0)
      a+=i;
  }
  t.end();

  cout<<"strcmp   took:"<<t.elapsedMillis()<<",a="<<a<< endl;

  return a;
}

Run Code Online (Sandbox Code Playgroud)

我添加了asm volatile("":"+ g"(s2)); 强制编译器运行比较.我还添加了<<",a ="<强制编译器计算a.

输出现在是:

==       took:10221.5,a=0
.compare took:10739,a=0
strcmp   took:9700,a=0

Run Code Online (Sandbox Code Playgroud)

你能解释一下为什么strcmp比.compare慢得多于= =？然而,速度差异很小,但很重要.

它实际上是有道理的!:p

Answer 4

Ian*_*Ian 8

下面的速度分析是错误的 - 感谢Tony D指出我的错误.尽管如此,对更好的基准的批评和建议仍然适用.

以前的所有答案都涉及基准测试中的编译器优化问题,但不回答为什么strcmp仍然稍微快一些.

strcmp由于字符串有时包含零,因此可能更快(在更正的基准测试中).由于strcmp使用C字符串,它可以在遇到字符串终止字符时退出'\0'.std::string::compare()将其'\0'视为另一个char并继续直到字符串数组的结尾.

由于您已经非确定地播种了RNG,并且只生成了两个字符串,因此每次运行代码时您的结果都会发生变化.(我会在基准测试中反对这一点.)鉴于数字,128次中有28次,应该没有优势.128个中的10个,你将获得超过10倍的速度.等等.

除了击败编译器的优化器之外,我建议下次为每个比较迭代生成一个新字符串,以便平均掉这些效果.

归档时间：	11 年前
查看次数：	3608 次
最近记录：	11 年前