STL find比手工制作的循环更好

Question

STL find比手工制作的循环更好

dus*_*sha 14 c++ performance assembly stl find

我有一些问题.给出以下C++代码片段:

#include <boost/progress.hpp>

#include <vector>
#include <algorithm>
#include <numeric>
#include <iostream>

struct incrementor
{
  incrementor() : curr_() {}

  unsigned int operator()()
  { return curr_++; }

private:
  unsigned int curr_;
};

template<class Vec>
char const* value_found(Vec const& v, typename Vec::const_iterator i)
{
  return i==v.end() ? "no" : "yes";
}


template<class Vec>
typename Vec::const_iterator find1(Vec const& v, typename Vec::value_type val)
{
  return find(v.begin(), v.end(), val);
}


template<class Vec>
typename Vec::const_iterator find2(Vec const& v, typename Vec::value_type val)
{
  for(typename Vec::const_iterator i=v.begin(), end=v.end(); i<end; ++i)
    if(*i==val) return i;
  return v.end();
}

int main()
{
  using namespace std;
  typedef vector<unsigned int>::const_iterator iter;
  vector<unsigned int> vec;
  vec.reserve(10000000);

  boost::progress_timer pt;

  generate_n(back_inserter(vec), vec.capacity(), incrementor());
  //added this line, to avoid any doubts, that compiler is able to
  // guess the data is sorted
  random_shuffle(vec.begin(), vec.end());

  cout << "value generation required: " << pt.elapsed() << endl;

  double d;
  pt.restart();
  iter found=find1(vec, vec.capacity());
  d=pt.elapsed();
  cout << "first search required: " << d << endl;
  cout << "first search found value: " << value_found(vec, found)<< endl;


  pt.restart();
  found=find2(vec, vec.capacity());
  d=pt.elapsed();
  cout << "second search required: " << d << endl;
  cout << "second search found value: " << value_found(vec, found)<< endl;


  return 0;
}

Run Code Online (Sandbox Code Playgroud)

在我的机器上(Intel i7,Windows Vista),STL查找(通过find1调用)比手工制作的循环(通过find2调用)快10倍.我首先想到Visual C++执行某种矢量化(可能我在这里弄错了),但据我所知,程序集看起来不像它使用矢量化.为什么STL循环更快？手工制作的循环与STL-find主体的循环相同.

我被要求发布程序的输出.没有洗牌:

value generation required: 0.078
first search required: 0.008
first search found value: no
second search required: 0.098
second search found value: no

Run Code Online (Sandbox Code Playgroud)

随着shuffle(缓存效果):

value generation required: 1.454
first search required: 0.009
first search found value: no
second search required: 0.044
second search found value: no

Run Code Online (Sandbox Code Playgroud)

非常感谢,

dusha.

PS我返回迭代器并写出结果(找到与否),因为我想阻止编译器优化,它认为根本不需要循环.搜索到的值显然不在向量中.

PPS我被要求发布为查找功能生成的程序集.这里是:

found=find1(vec, vec.capacity());
001811D0  lea         eax,[esp+5Ch] 
001811D4  call        std::vector<unsigned int,std::allocator<unsigned int> >::capacity (1814D0h) 
001811D9  mov         esi,dword ptr [esp+60h] 
001811DD  mov         ecx,dword ptr [esp+64h] 
001811E1  cmp         esi,ecx 
001811E3  je          wmain+180h (1811F0h) 
001811E5  cmp         dword ptr [esi],eax 
001811E7  je          wmain+180h (1811F0h) 
001811E9  add         esi,4 
001811EC  cmp         esi,ecx 
001811EE  jne         wmain+175h (1811E5h) 



found=find2(vec, vec.capacity());
001812AE  lea         eax,[esp+5Ch] 
001812B2  call        std::vector<unsigned int,std::allocator<unsigned int> >::capacity (1814D0h) 
001812B7  mov         ecx,dword ptr [esp+60h] 
001812BB  mov         edx,dword ptr [esp+64h] 
001812BF  cmp         ecx,edx 
001812C1  je          wmain+262h (1812D2h) 
001812C3  cmp         dword ptr [ecx],eax 
001812C5  je          wmain+34Fh (1813BFh) 
001812CB  add         ecx,4 
001812CE  cmp         ecx,edx 
001812D0  jne         wmain+253h (1812C3h)

Run Code Online (Sandbox Code Playgroud)

find2使用ecx-register而不是esi.这两个寄存器有什么区别？是否可以认为esi假定指针正确对齐,从而带来额外的性能？

读取一些汇编参考ecx只是一个计数器,而esi是内存源.所以我认为STL算法知道Random Access Iterator正确对齐,因此使用内存指针.在非STL版本中,没有猜测对齐方式.我对吗？