Vic*_*ian 2 c++ boost boost-pool
我正在学习内存池,并尝试boost::pool_allocator在我的项目中使用。根据文档,我做了一个关于时间成本的小测试:
template <typename Alloc>
void test()
{
using namespace std::chrono;
auto t0 = high_resolution_clock::now();
for (int i = 0; i < 1000; ++i) {
std::vector<int, Alloc> vec;
for (int j = 0; j < 10000; ++j)
vec.push_back(i + j);
}
auto t1 = high_resolution_clock::now();
auto time_ms = duration<double>(t1 - t0).count() * 1e3;
cout << "time cost: " << time_ms << " ms" << endl;
}
int main()
{
test<std::allocator<int>>();
test<boost::pool_allocator<int>>();
}
Run Code Online (Sandbox Code Playgroud)
结果是:
time cost: 3.97602 ms
time cost: 91.3943 ms
Run Code Online (Sandbox Code Playgroud)
Boost文档说:
池通常用于小对象的大量分配和释放。
所以我预计比上面的代码boost::pool_allocator花费的时间要少,但测试结果表明它要糟糕得多。std::allocator
难道是我用boost::pool_allocator错了?在什么情况下我可以通过使用内存池(或只是Boost pool/pool_allocator)来获得加速?
以供参考:
template <typename T,
typename UserAllocator = default_user_allocator_new_delete,
typename Mutex = details::pool::default_mutex,
unsigned NextSize = 32,
unsigned MaxSize = 0>
class pool_allocator;
Run Code Online (Sandbox Code Playgroud)
我想也许是锁定造成的。另外,可能还有更好的暗示。
我们来测试一下!实时编译器资源管理器
#include <boost/core/demangle.hpp>
#include <boost/pool/pool_alloc.hpp>
#include <chrono>
#include <iomanip>
#include <iostream>
#include <vector>
using namespace std::chrono_literals;
auto static now = std::chrono::high_resolution_clock::now;
template <typename Alloc> void test(int run, Alloc alloc = {}) {
auto load = [=](bool RESERVE, unsigned ITERATIONS = 1'000, unsigned SIZE = 10'000) {
for (unsigned i = 0; i < ITERATIONS; ++i) {
std::vector<int, Alloc> vec(alloc);
if (RESERVE)
vec.reserve(SIZE);
for (unsigned j = 0; j < SIZE; ++j)
vec.push_back(i + j);
}
};
auto lap_time = [t0 = now()]() mutable {
return now() - std::exchange(t0, now());
};
load(false); auto without_reserve = lap_time() / 1.0ms;
load(true); auto with_reserve = lap_time() / 1.0ms;
std::cout << "run " << run //
<< " naive: " << std::setw(7) << without_reserve << "ms" //
<< " reserved: " << std::setw(7) << with_reserve << "ms" //
<< "(" << boost::core::demangle(typeid(Alloc).name()) << ")" //
<< std::endl;
}
void run_tests(int run) {
test<std::allocator<int>>(run);
using NullMx = boost::details::pool::null_mutex;
using Mx = boost::details::pool::default_mutex;
using Malloc = boost::default_user_allocator_malloc_free;
using NewDelete = boost::default_user_allocator_new_delete;
//
// no hints
//
test<boost::pool_allocator<int, Malloc, NullMx>>(run);
test<boost::pool_allocator<int, NewDelete, NullMx>>(run);
test<boost::pool_allocator<int, Malloc, Mx>>(run);
test<boost::pool_allocator<int, NewDelete, Mx>>(run);
//
// hinted
//
test<boost::pool_allocator<int, Malloc, NullMx, 1'000, 0>>(run);
test<boost::pool_allocator<int, NewDelete, NullMx, 1'000, 0>>(run);
test<boost::pool_allocator<int, Malloc, Mx, 1'000, 0>>(run);
test<boost::pool_allocator<int, NewDelete, Mx, 1'000, 0>>(run);
}
int main()
{
std::cout << std::fixed << std::setprecision(3);
for (int run : {1,2,3}) {
auto t0 = now();
run_tests(run);
std::cout << " -- Done (" << (now() - t0) / 1.ms << "ms)" << std::endl;
}
}
Run Code Online (Sandbox Code Playgroud)
编译器资源管理器显示了一些真正不一致的峰值;我自己的机器没有:
run 1 naive: 8.025ms reserved: 5.412ms(std::allocator<int>)
run 1 naive: 92.212ms reserved: 31.166ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, boost::details::pool::null_mutex, 32u, 0u>)
run 1 naive: 93.466ms reserved: 29.901ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, boost::details::pool::null_mutex, 32u, 0u>)
run 1 naive: 92.488ms reserved: 29.883ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, std::mutex, 32u, 0u>)
run 1 naive: 92.450ms reserved: 29.824ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, std::mutex, 32u, 0u>)
run 1 naive: 82.879ms reserved: 27.478ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, boost::details::pool::null_mutex, 1000u, 0u>)
run 1 naive: 82.775ms reserved: 28.187ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, boost::details::pool::null_mutex, 1000u, 0u>)
run 1 naive: 83.189ms reserved: 27.404ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, std::mutex, 1000u, 0u>)
run 1 naive: 83.159ms reserved: 27.468ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, std::mutex, 1000u, 0u>)
-- Done (947.595ms)
run 2 naive: 8.007ms reserved: 5.543ms(std::allocator<int>)
run 2 naive: 92.225ms reserved: 29.882ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, boost::details::pool::null_mutex, 32u, 0u>)
run 2 naive: 92.311ms reserved: 29.805ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, boost::details::pool::null_mutex, 32u, 0u>)
run 2 naive: 92.601ms reserved: 29.873ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, std::mutex, 32u, 0u>)
run 2 naive: 92.421ms reserved: 30.028ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, std::mutex, 32u, 0u>)
run 2 naive: 83.028ms reserved: 27.493ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, boost::details::pool::null_mutex, 1000u, 0u>)
run 2 naive: 82.822ms reserved: 27.427ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, boost::details::pool::null_mutex, 1000u, 0u>)
run 2 naive: 83.230ms reserved: 27.493ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, std::mutex, 1000u, 0u>)
run 2 naive: 83.104ms reserved: 27.466ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, std::mutex, 1000u, 0u>)
-- Done (944.958ms)
run 3 naive: 8.068ms reserved: 5.422ms(std::allocator<int>)
run 3 naive: 92.282ms reserved: 29.880ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, boost::details::pool::null_mutex, 32u, 0u>)
run 3 naive: 92.064ms reserved: 29.960ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, boost::details::pool::null_mutex, 32u, 0u>)
run 3 naive: 92.339ms reserved: 29.928ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, std::mutex, 32u, 0u>)
run 3 naive: 92.977ms reserved: 29.890ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, std::mutex, 32u, 0u>)
run 3 naive: 82.906ms reserved: 27.388ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, boost::details::pool::null_mutex, 1000u, 0u>)
run 3 naive: 82.784ms reserved: 27.585ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, boost::details::pool::null_mutex, 1000u, 0u>)
run 3 naive: 83.157ms reserved: 28.233ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, std::mutex, 1000u, 0u>)
run 3 naive: 83.098ms reserved: 27.466ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, std::mutex, 1000u, 0u>)
-- Done (945.629ms)
Run Code Online (Sandbox Code Playgroud)
不过,显然,总是慢一些。让我们介绍一下
将标准分配器与以下内容进行比较boost::pool_allocator<int, boost::default_user_allocator_new_delete, std::mutex, 1000u, 0u>:
到目前为止,一切都很好!那还花那么多时间干什么?
其中unordered_malloc内联到来自各种 Boost Pool 标头的大量行。最严重的违规者是从以下内容内联的boost/pool/simple_segregated_storage.hpp:(第二列是相对于父级的成本百分比):
这些行是在try_malloc_n
template <typename SizeType>
void * simple_segregated_storage<SizeType>::try_malloc_n(
void * & start, size_type n, const size_type partition_size)
{
void * iter = nextof(start);
while (--n != 0)
{
void * next = nextof(iter);
if (next != static_cast<char *>(iter) + partition_size)
{
// next == 0 (end-of-list) or non-contiguous chunk found
start = iter;
return 0;
}
iter = next;
}
return iter;
}
Run Code Online (Sandbox Code Playgroud)
其自我描述为:
该函数尝试在空闲列表中从 start 开始查找 n 个大小为 partition_size 的连续块。如果成功,它会返回该连续序列中的最后一个块,以便 [start, {retval}] 知道该序列。如果失败,它会执行此操作,因为它位于空闲列表的末尾,或者命中了非-连续的块。无论哪种情况,它都会返回 0,并将 start 设置为最后考虑的块。如果 nextof(start) == 0,则位于空闲列表的末尾。否则,start 指向连续序列中的最后一个块,nextof(start) 指向下一个连续序列中的第一个块(假设有序)空闲列表)。
毕竟,在这种情况下,在隔离堆上追逐空闲块的成本确实太高了。餐巾纸上的一点计算显示,这try_malloc_n占了我们之前看到的高层unordered_malloc调用的 99.75%。
在我的调查过程中,我发现了许多可用于获得更多见解的定义,例如:
#define NDEBUG
//#define BOOST_POOL_INSTRUMENT 1
//#define BOOST_POOL_VALIDATE 1
//#define BOOST_POOL_VALGRIND 1
Run Code Online (Sandbox Code Playgroud)
现在,我使用 VALIDATE/INSTRUMENT 观察预期效果(非常详细的输出和轻微的性能下降)。
通过阅读名称/代码,我预计会BOOST_POOL_VALGRIND类似地降低性能(毕竟,它可能应该做额外的工作以避免运行 Valgrind 时出现误报内存错误,对吧?)。令我惊讶的是,定义它使整个事情运行得快如闪电:Live On Compiler Explorer
run 1 naive: 8.166ms reserved: 5.267ms(std::allocator<int>)
run 1 naive: 9.713ms reserved: 5.267ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, boost::details::pool::null_mutex, 32u, 0u>)
run 1 naive: 8.853ms reserved: 5.226ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, boost::details::pool::null_mutex, 32u, 0u>)
run 1 naive: 8.990ms reserved: 5.282ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, std::mutex, 32u, 0u>)
run 1 naive: 8.899ms reserved: 5.246ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, std::mutex, 32u, 0u>)
run 1 naive: 8.620ms reserved: 5.237ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, boost::details::pool::null_mutex, 1000u, 0u>)
run 1 naive: 8.622ms reserved: 5.247ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, boost::details::pool::null_mutex, 1000u, 0u>)
run 1 naive: 8.963ms reserved: 5.257ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, std::mutex, 1000u, 0u>)
run 1 naive: 8.990ms reserved: 5.271ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, std::mutex, 1000u, 0u>)
-- Done (127.276ms)
run 2 naive: 7.965ms reserved: 5.208ms(std::allocator<int>)
run 2 naive: 8.503ms reserved: 5.236ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, boost::details::pool::null_mutex, 32u, 0u>)
run 2 naive: 8.809ms reserved: 5.254ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, boost::details::pool::null_mutex, 32u, 0u>)
run 2 naive: 8.954ms reserved: 5.278ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, std::mutex, 32u, 0u>)
run 2 naive: 8.878ms reserved: 5.279ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, std::mutex, 32u, 0u>)
run 2 naive: 8.694ms reserved: 5.243ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, boost::details::pool::null_mutex, 1000u, 0u>)
run 2 naive: 8.661ms reserved: 5.249ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, boost::details::pool::null_mutex, 1000u, 0u>)
run 2 naive: 8.920ms reserved: 5.248ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, std::mutex, 1000u, 0u>)
run 2 naive: 8.952ms reserved: 5.261ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, std::mutex, 1000u, 0u>)
-- Done (125.680ms)
run 3 naive: 7.949ms reserved: 5.221ms(std::allocator<int>)
run 3 naive: 8.498ms reserved: 5.238ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, boost::details::pool::null_mutex, 32u, 0u>)
run 3 naive: 8.813ms reserved: 5.230ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, boost::details::pool::null_mutex, 32u, 0u>)
run 3 naive: 9.033ms reserved: 5.279ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, std::mutex, 32u, 0u>)
run 3 naive: 8.909ms reserved: 5.252ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, std::mutex, 32u, 0u>)
run 3 naive: 8.605ms reserved: 5.244ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, boost::details::pool::null_mutex, 1000u, 0u>)
run 3 naive: 8.623ms reserved: 5.246ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, boost::details::pool::null_mutex, 1000u, 0u>)
run 3 naive: 8.918ms reserved: 5.247ms(boost::pool_allocator<int, boost::default_user_allocator_malloc_free, std::mutex, 1000u, 0u>)
run 3 naive: 8.969ms reserved: 5.268ms(boost::pool_allocator<int, boost::default_user_allocator_new_delete, std::mutex, 1000u, 0u>)
Run Code Online (Sandbox Code Playgroud)
遗憾的是,查看细节证实它通过委托给标准库直接进行欺骗(同时通过free_list/used_list地址集增加一些开销)。
是的,标准pool/simple_segregated_storage实现在这种负载下表现很差。我无法确定这是否真的是一个错误,但根据文档(您也提到过),它看起来确实是一个错误。
| 归档时间: |
|
| 查看次数: |
1483 次 |
| 最近记录: |