erj*_*ang 2 c++ optimization stl vector bitarray
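I'm using Dipperstein's bitarray.cpp class to work with bi-level (black and white) images, where the image data is stored natively at one bit per pixel.
I need to iterate over every bit with a for loop, on the order of 4-9 megapixels per image across hundreds of images, something like: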
for (int i = 0; i < imgLength; i++) {
    if (myBitArray[i] == 1) {
        // ... do stuff ...
    }
}
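Performance is usable, but not amazing. I ran the program through gprof and found that a lot of time, and many millions of calls, go to things like std::vector iterators and begin(). Here are the top sampled functions: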
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
37.91 0.80 0.80 2 0.40 1.01 findPattern(bit_array_c*, bool*, int, int, int)
12.32 1.06 0.26 98375762 0.00 0.00 __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > >::__normal_iterator(unsigned char const* const&)
11.85 1.31 0.25 48183659 0.00 0.00 __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > >::operator+(int const&) const
11.37 1.55 0.24 49187881 0.00 0.00 std::vector<unsigned char, std::allocator<unsigned char> >::begin() const
9.24 1.75 0.20 48183659 0.00 0.00 bit_array_c::operator[](unsigned int) const
8.06 1.92 0.17 48183659 0.00 0.00 std::vector<unsigned char, std::allocator<unsigned char> >::operator[](unsigned int) const
5.21 2.02 0.11 48183659 0.00 0.00 __gnu_cxx::__normal_iterator<unsigned char const*, std::vector<unsigned char, std::allocator<unsigned char> > >::operator*() const
0.95 2.04 0.02 bit_array_c::operator()(unsigned int)
0.47 2.06 0.01 6025316 0.00 0.00 __gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >::__normal_iterator(unsigned char* const&)
0.47 2.06 0.01 3012657 0.00 0.00 __gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >::operator*() const
0.47 2.08 0.01 1004222 0.00 0.00 std::vector<unsigned char, std::allocator<unsigned char> >::end() const
... remainder omitted ...
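I'm not very familiar with C++'s STL, but can anyone explain why, for example, std::vector::begin() is being called tens of millions of times? And, of course, is there anything I can do to speed it up?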
Edit: I gave up and just optimized the search function (the loop) instead.
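The post does not show what changed in that loop. Purely as a hypothetical illustration of one common approach, the sketch below walks the packed bytes directly and skips any byte that is entirely zero, so runs of clear pixels cost one comparison per eight bits. The names forEachSetBit, bytes and numBits are made up for this example, and the MSB-first bit order is an assumption rather than something confirmed about bit_array_c.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Dipperstein's bit_array_c.
// Calls visit(index) for every set bit in a packed bitmap.
// Assumption: bit 0 of the image is the most significant bit of bytes[0].
template <typename Visitor>
void forEachSetBit(const std::vector<unsigned char>& bytes,
                   std::size_t numBits, Visitor visit) {
    for (std::size_t byteIdx = 0; byteIdx < bytes.size(); ++byteIdx) {
        const unsigned char b = bytes[byteIdx];
        if (b == 0) {
            continue;  // eight clear pixels skipped with one comparison
        }
        for (unsigned bit = 0; bit < 8; ++bit) {
            const std::size_t index = byteIdx * 8 + bit;
            if (index >= numBits) {
                return;  // ignore padding bits in the final byte
            }
            if (b & (0x80u >> bit)) {
                visit(index);
            }
        }
    }
}
```

Whether this helps depends on how sparse the images are; on mostly white or mostly black scans the zero-byte test can remove most of the per-bit work, but it is worth measuring on an optimized build before relying on it.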
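The fact that you see so many inline functions in your profile output means they are not being inlined; that is, you are not compiling with optimization turned on. So the simplest thing you can do to speed up your code is to compile with -O2 or -O3.
Profiling unoptimized code is rarely worthwhile, because the execution profiles of optimized and unoptimized code are likely to be completely different.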
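To make the inlining point concrete, here is a minimal sketch, assuming a bit array that wraps a std::vector<unsigned char> as the profile suggests. BitArraySketch, set and countSetBits are hypothetical names for illustration only, not Dipperstein's actual bit_array_c, and the MSB-first bit order is an assumption. The profile shows roughly one begin(), one operator+ and one operator* call per bit read, which is consistent with vector::operator[] being expressed as begin() plus an offset; without optimization each of those tiny functions is a real call.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a bit array backed by std::vector<unsigned char>.
// Assumption: bit 0 is the most significant bit of the first byte.
class BitArraySketch {
public:
    explicit BitArraySketch(std::size_t numBits)
        : bytes_((numBits + 7) / 8, 0), numBits_(numBits) {}

    // Sets a single bit to 1.
    void set(std::size_t bit) {
        bytes_[bit / 8] |= static_cast<unsigned char>(0x80u >> (bit % 8));
    }

    // Reads a single bit. Every call goes through std::vector::operator[],
    // which an unoptimized build compiles as a chain of separate calls.
    bool operator[](std::size_t bit) const {
        return (bytes_[bit / 8] & (0x80u >> (bit % 8))) != 0;
    }

    std::size_t size() const { return numBits_; }

private:
    std::vector<unsigned char> bytes_;
    std::size_t numBits_;
};

// Same per-bit loop shape as in the question: one operator[] call per pixel.
std::size_t countSetBits(const BitArraySketch& bits) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i]) {
            ++count;
        }
    }
    return count;
}
```

Building something like this once with plain g++ -pg and once with g++ -O2 -pg, then comparing the gprof output, should show whether the iterator helpers still appear as separate entries once the compiler is allowed to inline them.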