我正在使用a boost::icl::interval_map将字节范围映射到一组字符串.地图是从(已排序的)磁盘文件加载,然后我使用下面的代码进行查找.
问题是查找真的很慢.
在我的测试中,我在地图中插入了66425个范围.我分析了代码,基本上> 99.9%的时间花在了各种Boost函数上(没有一个特定的函数很慢,很多时间分布在很多函数上).
可以做些什么来加快速度?
我的树是不平衡的(我如何找到?我怎样才能重新平衡它?)
使用set <string>有问题吗?
计算地图和窗口的交点是一个问题吗?(虽然这是我需要的,所以我看不出怎么做).
using namespace std;
typedef set<string> objects;
typedef boost::icl::interval_map<uint64_t, objects> ranges;
void
find_range (const ranges *map, uint64_t start, uint64_t end,
void (*f) (uint64_t start, uint64_t end, const char *object,
void *opaque),
void *opaque)
{
boost::icl::interval<uint64_t>::type window;
window = boost::icl::interval<uint64_t>::right_open (start, end);
ranges r = *map & window;
ranges::iterator iter = r.begin ();
while (iter != r.end ()) {
boost::icl::interval<uint64_t>::type range = iter->first;
uint64_t start = range.lower ();
uint64_t end = range.upper ();
objects obj_set = iter->second;
objects::iterator iter2 = obj_set.begin ();
while (iter2 != obj_set.end ()) {
f (start, end, iter2->c_str (), opaque);
iter2++;
}
iter++;
}
}
Run Code Online (Sandbox Code Playgroud)
前几个配置文件条目:
% cumulative self self total
time seconds seconds calls us/call us/call name
9.77 0.13 0.13 21866814 0.01 boost::icl::interval_bounds::interval_bounds(unsigned char)
6.02 0.21 0.08 9132134 0.01 boost::icl::interval_traits<boost::icl::discrete_interval<unsigned long, std::less> >::lower(boost::icl::discrete_interval<unsigned long, std::less> const&)
6.02 0.29 0.08 6004967 0.01 boost::enable_if<boost::icl::is_discrete_interval<boost::icl::discrete_interval<unsigned long, std::less> >, bool>::type boost::icl::is_empty<boost::icl::discrete_interval<unsigned long, std::less> >(boost::icl::discrete_interval<unsigned long, std::less> const&)
5.26 0.36 0.07 21210093 0.00 boost::icl::discrete_interval<unsigned long, std::less>::bounds() const
5.26 0.43 0.07 11964109 0.01 std::less<unsigned long>::operator()(unsigned long const&, unsigned long const&) const
4.51 0.49 0.06 35761849 0.00 std::_Rb_tree<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::_Identity<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_S_left(std::_Rb_tree_node_base const*)
4.51 0.55 0.06 12009934 0.00 boost::icl::operator==(boost::icl::interval_bounds, boost::icl::interval_bounds)
3.76 0.60 0.05 12078493 0.00 boost::icl::discrete_interval<unsigned long, std::less>::upper() const
3.76 0.65 0.05 12077959 0.00 boost::enable_if<boost::icl::is_interval<boost::icl::discrete_interval<unsigned long, std::less> >, boost::icl::interval_traits<boost::icl::discrete_interval<unsigned long, std::less> >::domain_type>::type boost::icl::upper<boost::icl::discrete_interval<unsigned long, std::less> >(boost::icl::discrete_interval<unsigned long, std::less> const&)
3.76 0.70 0.05 8837475 0.01 boost::icl::interval_bounds::bits() const
3.76 0.75 0.05 6004967 0.01 boost::enable_if<boost::icl::is_interval<boost::icl::discrete_interval<unsigned long, std::less> >, bool>::type boost::icl::domain_less_equal<boost::icl::discrete_interval<unsigned long, std::less> >(boost::icl::interval_traits<boost::icl::discrete_interval<unsigned long, std::less> >::domain_type const&, boost::icl::interval_traits<boost::icl::discrete_interval<unsigned long, std::less> >::domain_type const&)
3.01 0.79 0.04 5891650 0.01 boost::icl::is_right_closed(boost::icl::interval_bounds)
Run Code Online (Sandbox Code Playgroud)
数据集:http ://oirase.annexia.org/tmp/bmap.txt
完整代码:http://git.annexia.org/?p = virt-bmap.git ; a = tree
seh*_*ehe 14
在这个答案中,我提出了三个优化:
更换对象std::set通过boost::container::flat_set改进局部性(和可能的重新分配成本,因为最物体集合是<4种的元素)
UPDATE在下面我最终版本,只需更换
boost::container::flat_map背部std::set只是性能降低find_range的一个因素〜2倍,并find_range_ex通过〜4倍的因素我的测试系统上
替换的对象ID std::string由string_atom(这在技术上是一个char const*,但在逻辑上是唯一的).此技术类似于各种VM实现(如Java/.NET)中的实习字符串.
注意:当前的实现
make_atom非常简单,永远不会释放原子.您可能希望在双端队列中备份字符串,使用Boost Flyweights,池分配器或其中某些组合以使其更智能.但是,是否需要这取决于您的使用案例
用equal_range呼叫替换地图交集,通过避免复制(大量)数据来节省大量时间
_ UPDATE当应用只是这种优化的隔离加快已经是26〜30倍.请注意,与包含所有三个优化的内存相比,内存使用量大约是20MiB的两倍_
在开始之前,我想知道数据是什么样的.因此,编写一些代码来解析bmap.txt输入,我们得到:
Parsed ok
Histogram of 66425 input lines
d: 3367
f: 20613
p: 21222
v: 21223
ranges size: 6442450944
ranges iterative size: 21223
Min object set: 1.000000
Max object set: 234.000000
Average object set: 3.129859
Min interval width: 1024.000000
Max interval width: 2526265344.000000
Average interval width: 296.445177k
First: [0,1048576)
Last: [3916185600,6442450944)
String atoms: 23904 unique in 66425 total
Atom efficiency: 35.986451%
Run Code Online (Sandbox Code Playgroud)
正如您所看到的,这些集合通常是~3个项目,而且很多都是重复的.
使用make_atom对象命名方法boost::flat_set减少了从~15GiB到~10Gib的内存分配.
这种优化还减少了字符串与集合插入的指针比较和组合策略的比较interval_map,因此对于较大的数据集,这有可能具有大量的加速.
查询性能确实因输入的部分副本而严重削弱.
不要复制,而是查看重叠范围,只需更换:
const ranges r = *map & window;
ranges::const_iterator iter = r.begin ();
while (iter != r.end ()) {
Run Code Online (Sandbox Code Playgroud)
同
auto r = map->equal_range(window);
ranges::const_iterator iter = r.first;
while (iter != r.second) {
Run Code Online (Sandbox Code Playgroud)
在我的系统上运行10000个相同的随机查询与两个版本导致加速为32x:
10000 'random' OLD lookups resulted in 156729884 callbacks in 29148ms
10000 'random' NEW lookups resulted in 156729884 callbacks in 897ms
real 0m31.715s
user 0m31.664s
sys 0m0.012s
Run Code Online (Sandbox Code Playgroud)
运行时现在由解析/统计主导.完整的基准代码在这里:在Coliru
#define BOOST_RESULT_OF_USE_DECTYPE
#define BOOST_SPIRIT_USE_PHOENIX_V3
/* virt-bmap examiner plugin
* Copyright (C) 2014 Red Hat Inc.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*/
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>
#include <assert.h>
#include <boost/icl/interval.hpp>
#include <boost/icl/interval_set.hpp>
#include <boost/icl/interval_map.hpp>
#include <boost/container/flat_set.hpp>
using namespace std;
/* Maps intervals (uint64_t, uint64_t) to a set of strings, where each
* string represents an object that covers that range.
*/
static size_t atoms_requested = 0;
static size_t atoms_unique_created = 0;
using string_atom = const char*;
string_atom make_atom(std::string&& s)
{
static std::set<std::string> s_atoms;
atoms_requested += 1;
auto it = s_atoms.find(s);
if (it != s_atoms.end())
return it->c_str();
atoms_unique_created += 1;
return s_atoms.insert(std::move(s)).first->c_str();
}
typedef boost::container::flat_set<string_atom> objects;
typedef boost::icl::interval_map<uint64_t, objects> ranges;
ranges*
new_ranges (void)
{
return new ranges ();
}
void
free_ranges (ranges *mapv)
{
ranges *map = (ranges *) mapv;
delete map;
}
extern "C" void
insert_range (void *mapv, uint64_t start, uint64_t end, const char *object)
{
ranges *map = (ranges *) mapv;
objects obj_set;
obj_set.insert (obj_set.end(), object);
map->add (std::make_pair (boost::icl::interval<uint64_t>::right_open (start, end), // SEHE added std::
obj_set));
}
extern "C" void
iter_range (void *mapv, void (*f) (uint64_t start, uint64_t end, const char *object, void *opaque), void *opaque)
{
ranges *map = (ranges *) mapv;
ranges::iterator iter = map->begin ();
while (iter != map->end ()) {
boost::icl::interval<uint64_t>::type range = iter->first;
uint64_t start = range.lower ();
uint64_t end = range.upper ();
objects obj_set = iter->second;
objects::iterator iter2 = obj_set.begin ();
while (iter2 != obj_set.end ()) {
f (start, end, *iter2/*->c_str ()*/, opaque); // SEHE
iter2++;
}
iter++;
}
}
extern "C" void
find_range (void const *mapv, uint64_t start, uint64_t end, void (*f) (uint64_t start, uint64_t end, const char *object, void *opaque), void *opaque)
{
const ranges *map = (const ranges *) mapv;
boost::icl::interval<uint64_t>::type window;
window = boost::icl::interval<uint64_t>::right_open (start, end);
const ranges r = *map & window;
ranges::const_iterator iter = r.begin ();
while (iter != r.end ()) {
boost::icl::interval<uint64_t>::type range = iter->first;
uint64_t start = range.lower ();
uint64_t end = range.upper ();
objects obj_set = iter->second;
objects::iterator iter2 = obj_set.begin ();
while (iter2 != obj_set.end ()) {
f (start, end, *iter2/*->c_str ()*/, opaque); // SEHE
iter2++;
}
iter++;
}
}
extern "C" void
find_range_ex (void const *mapv, uint64_t start, uint64_t end, void (*f) (uint64_t start, uint64_t end, const char *object, void *opaque), void *opaque)
{
const ranges *map = (const ranges *) mapv;
boost::icl::interval<uint64_t>::type window;
window = boost::icl::interval<uint64_t>::right_open (start, end);
#if 0
const ranges r = *map & window;
ranges::const_iterator iter = r.begin ();
while (iter != r.end ()) {
#else
auto r = map->equal_range(window);
ranges::const_iterator iter = r.first;
while (iter != r.second) {
#endif
boost::icl::interval<uint64_t>::type range = iter->first;
uint64_t start = range.lower ();
uint64_t end = range.upper ();
objects obj_set = iter->second;
objects::iterator iter2 = obj_set.begin ();
while (iter2 != obj_set.end ()) {
f (start, end, *iter2/*->c_str ()*/, opaque); // SEHE
iter2++;
}
iter++;
}
}
#include <memory>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics.hpp>
#include <fstream>
#include <chrono>
std::map<char, size_t> histo;
bool insert_line_of_input(ranges& bmap_data, uint64_t b, uint64_t e, char type, std::string& object) {
if (!object.empty())
histo[type]++;
//std::cout << std::hex << b << " " << e << " " << type << " " << object << "\n";
#if 0
object.insert(object.begin(), ':');
object.insert(object.begin(), type);
#endif
insert_range(&bmap_data, b, e, make_atom(std::move(object)));
return true;
}
std::vector<std::pair<uint64_t, uint64_t> > generate_test_queries(ranges const& bmap_data, size_t n) {
std::vector<std::pair<uint64_t, uint64_t> > queries;
queries.reserve(n);
for (size_t i = 0; i < n; ++i)
{
auto start = (static_cast<uint64_t>(rand()) * rand()) % bmap_data.size();
auto end = start + rand();
queries.emplace_back(start,end);
}
return queries;
}
ranges read_mapfile(const char* fname) {
std::ifstream ifs(fname);
boost::spirit::istream_iterator f(ifs >> std::noskipws), l;
ranges bmap_data;
namespace phx = boost::phoenix;
using namespace boost::spirit::qi;
uint_parser<uint64_t, 16> offset;
if (!phrase_parse(f,l,
("1 " >> offset >> offset >> char_("pvdf") >> as_string[lexeme[+graph] >> attr('/') >> lexeme[*~char_("\r\n")]])
[ _pass = phx::bind(insert_line_of_input, phx::ref(bmap_data), _1, _2, _3, _4) ] % eol >> *eol,
blank))
{
exit(255);
} else
{
std::cout << "Parsed ok\n";
}
if (f!=l)
std::cout << "Warning: remaining input '" << std::string(f,l) << "\n";
return bmap_data;
}
void report_statistics(ranges const& bmap_data) {
size_t total = 0;
for (auto e : histo) total += e.second;
std::cout << "Histogram of " << total << " input lines\n";
for (auto e : histo)
std::cout << e.first << ": " << e.second << "\n";
namespace ba = boost::accumulators;
ba::accumulator_set<double, ba::stats<ba::tag::mean, ba::tag::max, ba::tag::min> >
object_sets, interval_widths;
for (auto const& r : bmap_data)
{
auto width = r.first.upper() - r.first.lower();
assert(width % 1024 == 0);
interval_widths(width);
object_sets(r.second.size());
}
std::cout << std::fixed;
std::cout << "ranges size: " << bmap_data.size() << "\n";
std::cout << "ranges iterative size: " << bmap_data.iterative_size() << "\n";
std::cout << "Min object set: " << ba::min(object_sets) << "\n" ;
std::cout << "Max object set: " << ba::max(object_sets) << "\n" ;
std::cout << "Average object set: " << ba::mean(object_sets) << "\n" ;
std::cout << "Min interval width: " << ba::min(interval_widths) << "\n" ;
std::cout << "Max interval width: " << ba::max(interval_widths) << "\n" ;
std::cout << "Average interval width: " << ba::mean(interval_widths)/1024.0 << "k\n" ;
std::cout << "First: " << bmap_data.begin()->first << "\n" ;
std::cout << "Last: " << bmap_data.rbegin()->first << "\n" ;
std::cout << "String atoms: " << atoms_unique_created << " unique in " << atoms_requested << " total\n";
std::cout << "Atom efficiency: " << (atoms_unique_created*100.0/atoms_requested) << "%\n";
}
void perform_comparative_benchmarks(ranges const& bmap_data, size_t number_of_queries) {
srand(42);
auto const queries = generate_test_queries(bmap_data, number_of_queries);
using hrc = std::chrono::high_resolution_clock;
{
auto start = hrc::now();
size_t callbacks = 0;
for (auto const& q: queries)
{
find_range(&bmap_data, q.first, q.second,
[](uint64_t start, uint64_t end, const char *object, void *opaque) {
++(*static_cast<size_t*>(opaque));
}, &callbacks);
}
std::cout << number_of_queries << " 'random' OLD lookups resulted in " << callbacks
<< " callbacks in " << std::chrono::duration_cast<std::chrono::milliseconds>((hrc::now()-start)).count() << "ms\n";
}
{
auto start = hrc::now();
size_t callbacks = 0;
for (auto const& q: queries)
{
find_range_ex(&bmap_data, q.first, q.second,
[](uint64_t start, uint64_t end, const char *object, void *opaque) {
++(*static_cast<size_t*>(opaque));
}, &callbacks);
}
std::cout << number_of_queries << " 'random' NEW lookups resulted in " << callbacks
<< " callbacks in " << std::chrono::duration_cast<std::chrono::milliseconds>((hrc::now()-start)).count() << "ms\n";
}
}
int main() {
auto bmap = read_mapfile("bmap.txt");
report_statistics(bmap);
perform_comparative_benchmarks(bmap, 1000);
#if 0 // to dump ranges to console
for (auto const& r : bmap)
{
std::cout << r.first << "\t" << r.second.size() << "\t";
std::copy(r.second.begin(), r.second.end(), std::ostream_iterator<std::string>(std::cout, "\t"));
std::cout << "\n";
}
#endif
}
Run Code Online (Sandbox Code Playgroud)