Optimizing bit array access

erj*_*ang 2 c++ optimization stl vector bitarray

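I'm using Dipperstein's bitarray.cpp class to work with bi-level (black and white) images, where the image data is stored natively as one bit per pixel.

I need to iterate over every bit with a for loop, for images of roughly 4-9 megapixels each, across hundreds of images, something like: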

for( int i = 0; i < imgLength; i++) {
    if( myBitArray[i] == 1 ) {
         //  ... do stuff ...
    }
}

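Performance is usable, but not great. I ran the program through gprof and found that a lot of time, and millions of calls, go to std::vector methods such as the iterator functions and begin(). Here are the top sampled functions: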

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 37.91      0.80     0.80        2     0.40     1.01  findPattern(bit_array_c*, bool*, int, int, int)
 12.32      1.06     0.26 98375762     0.00     0.00  __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > >::__normal_iterator(unsigned char const* const&)
 11.85      1.31     0.25 48183659     0.00     0.00  __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > >::operator+(int const&) const
 11.37      1.55     0.24 49187881     0.00     0.00  std::vector<unsigned char, std::allocator<unsigned char> >::begin() const
  9.24      1.75     0.20 48183659     0.00     0.00  bit_array_c::operator[](unsigned int) const
  8.06      1.92     0.17 48183659     0.00     0.00  std::vector<unsigned char, std::allocator<unsigned char> >::operator[](unsigned int) const
  5.21      2.02     0.11 48183659     0.00     0.00  __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > >::operator*() const
  0.95      2.04     0.02                             bit_array_c::operator()(unsigned int)
  0.47      2.06     0.01  6025316     0.00     0.00  __gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >::__normal_iterator(unsigned char* const&)
  0.47      2.06     0.01  3012657     0.00     0.00  __gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >::operator*() const
  0.47      2.08     0.01  1004222     0.00     0.00  std::vector<unsigned char, std::allocator<unsigned char> >::end() const
... remainder omitted ...

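I'm not very familiar with C++'s STL, but can anyone explain why, for example, std::vector::begin() is being called millions of times? And, of course, is there anything I can do to speed this up?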

Edit: I gave up on this and optimized the search function (the loop) instead.

Chr*_*odd 6

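The fact that you see many inline functions in the profile output means they are not being inlined; that is, you are not compiling with optimization turned on. So the simplest way to optimize your code is to compile with -O2 or -O3.

Profiling unoptimized code is rarely worthwhile, because the execution profiles of optimized and unoptimized code are likely to be completely different.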
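As a rough illustration, here is a minimal sketch of the kind of wrapper involved. The `BitArray` class and `countSetBits` function are hypothetical stand-ins, not Dipperstein's actual code; they only assume, as the profile suggests, that the bits live in a `std::vector<unsigned char>` and that `operator[]` forwards to the vector. Every layer here is a tiny function, which is why each one shows up as a separate call in an unoptimized build.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a bit array class (NOT Dipperstein's code).
// Assumption: bits are packed MSB-first into a std::vector<unsigned char>.
class BitArray {
public:
    explicit BitArray(std::size_t numBits)
        : numBits_(numBits), bytes_((numBits + 7) / 8, 0) {}

    std::size_t size() const { return numBits_; }

    // Set bit i to 1.
    void set(std::size_t i) {
        assert(i < numBits_);
        bytes_[i / 8] |= static_cast<unsigned char>(0x80u >> (i % 8));
    }

    // Read bit i. This forwards to std::vector::operator[], which in the
    // profiled libstdc++ build in turn goes through begin(), the iterator's
    // operator+ and operator*. Without optimization, each of those is a
    // real function call, once per bit.
    bool operator[](std::size_t i) const {
        assert(i < numBits_);
        return (bytes_[i / 8] & (0x80u >> (i % 8))) != 0;
    }

private:
    std::size_t numBits_;
    std::vector<unsigned char> bytes_;
};

// Same shape as the loop in the question: visit every bit, act on set ones.
std::size_t countSetBits(const BitArray& bits) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i]) {
            ++count;
        }
    }
    return count;
}
```

Building something like this with `g++ -pg prog.cpp` versus `g++ -O2 -pg prog.cpp` (the file name is a placeholder) and comparing the gprof flat profiles should show the difference: at -O2 the small accessors are typically inlined into the loop and no longer appear as separate entries with millions of calls.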