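As I understand it, there are many different malloc implementations.
Is there a way to determine which malloc is actually used on my (Linux) system?
I read that "due to ptmalloc2's thread support, it became the default memory allocator for Linux." Is there any way for me to check this myself?
I ask because I don't seem to get any speedup from parallelizing the malloc loop in the code below: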
for (int i = 1; i <= 16; i += 1) {
    parallelMalloc(i);
}

void parallelMalloc(int parallelism, int mallocCnt = 10000000) {
    omp_set_num_threads(parallelism);
    std::vector<char*> ptrStore(mallocCnt);

    boost::posix_time::ptime t1 = boost::posix_time::microsec_clock::local_time();
    #pragma omp parallel for
    for (int i = 0; i < mallocCnt; i++) {
        ptrStore[i] = ((char*)malloc(100 * sizeof(char)));
    }

    boost::posix_time::ptime t2 = boost::posix_time::microsec_clock::local_time();
    #pragma omp parallel for
    for (int i = 0; i < mallocCnt; i++) {
        free(ptrStore[i]);
    }

    boost::posix_time::ptime t3 = boost::posix_time::microsec_clock::local_time();
    boost::posix_time::time_duration malloc_time = t2 - t1;
    boost::posix_time::time_duration free_time = t3 - t2;
    std::cout << " parallelism = " << parallelism << "\t itr = " << mallocCnt
              << "\t malloc_time = " << malloc_time.total_milliseconds()
              << "\t free_time = " << free_time.total_milliseconds() << std::endl;
}
This gives me the following output:
parallelism = 1 itr = 10000000 malloc_time = 1225 free_time = 1517
parallelism = 2 itr = 10000000 malloc_time = 1614 free_time = 1112
parallelism = 3 itr = 10000000 malloc_time = 1619 free_time = 687
parallelism = 4 itr = 10000000 malloc_time = 2325 free_time = 620
parallelism = 5 itr = 10000000 malloc_time = 2233 free_time = 550
parallelism = 6 itr = 10000000 malloc_time = 2207 free_time = 489
parallelism = 7 itr = 10000000 malloc_time = 2778 free_time = 398
parallelism = 8 itr = 10000000 malloc_time = 1813 free_time = 389
parallelism = 9 itr = 10000000 malloc_time = 1997 free_time = 350
parallelism = 10 itr = 10000000 malloc_time = 1922 free_time = 291
parallelism = 11 itr = 10000000 malloc_time = 2480 free_time = 257
parallelism = 12 itr = 10000000 malloc_time = 1614 free_time = 256
parallelism = 13 itr = 10000000 malloc_time = 1387 free_time = 289
parallelism = 14 itr = 10000000 malloc_time = 1481 free_time = 248
parallelism = 15 itr = 10000000 malloc_time = 1252 free_time = 297
parallelism = 16 itr = 10000000 malloc_time = 1063 free_time = 281

> I read that "due to ptmalloc2's thread support, it became the default memory allocator for Linux." Is there any way for me to check this myself?

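glibc uses ptmalloc2 internally, and that is not a recent development. Either way, it is not terribly difficult to run getconf GNU_LIBC_VERSION and then cross-check the version to see whether ptmalloc2 is used in that release, but I'm willing to bet you'd be wasting your time.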

> I ask because I don't seem to get any speedup from parallelizing my malloc loop in the code below

I converted your example into an MVCE (code omitted here for brevity) and compiled it with g++ -Wall -pedantic -O3 -pthread -fopenmp, using g++ 5.3.1. Here are my results.
With OpenMP:

parallelism = 1 itr = 10000000 malloc_time = 746 free_time = 263
parallelism = 2 itr = 10000000 malloc_time = 541 free_time = 267
parallelism = 3 itr = 10000000 malloc_time = 405 free_time = 259
parallelism = 4 itr = 10000000 malloc_time = 324 free_time = 221
parallelism = 5 itr = 10000000 malloc_time = 330 free_time = 242
parallelism = 6 itr = 10000000 malloc_time = 287 free_time = 244
parallelism = 7 itr = 10000000 malloc_time = 257 free_time = 226
parallelism = 8 itr = 10000000 malloc_time = 270 free_time = 225
parallelism = 9 itr = 10000000 malloc_time = 253 free_time = 225
parallelism = 10 itr = 10000000 malloc_time = 236 free_time = 226
parallelism = 11 itr = 10000000 malloc_time = 225 free_time = 239
parallelism = 12 itr = 10000000 malloc_time = 276 free_time = 258
parallelism = 13 itr = 10000000 malloc_time = 241 free_time = 228
parallelism = 14 itr = 10000000 malloc_time = 254 free_time = 225
parallelism = 15 itr = 10000000 malloc_time = 278 free_time = 272
parallelism = 16 itr = 10000000 malloc_time = 235 free_time = 220

23.87 user
2.11 system
0:10.41 elapsed
249% CPU

Without OpenMP:

parallelism = 1 itr = 10000000 malloc_time = 748 free_time = 263
parallelism = 2 itr = 10000000 malloc_time = 344 free_time = 256
parallelism = 3 itr = 10000000 malloc_time = 751 free_time = 254
parallelism = 4 itr = 10000000 malloc_time = 339 free_time = 262
parallelism = 5 itr = 10000000 malloc_time = 748 free_time = 253
parallelism = 6 itr = 10000000 malloc_time = 330 free_time = 256
parallelism = 7 itr = 10000000 malloc_time = 734 free_time = 260
parallelism = 8 itr = 10000000 malloc_time = 334 free_time = 259
parallelism = 9 itr = 10000000 malloc_time = 750 free_time = 256
parallelism = 10 itr = 10000000 malloc_time = 339 free_time = 255
parallelism = 11 itr = 10000000 malloc_time = 743 free_time = 267
parallelism = 12 itr = 10000000 malloc_time = 342 free_time = 261
parallelism = 13 itr = 10000000 malloc_time = 739 free_time = 252
parallelism = 14 itr = 10000000 malloc_time = 333 free_time = 252
parallelism = 15 itr = 10000000 malloc_time = 740 free_time = 252
parallelism = 16 itr = 10000000 malloc_time = 330 free_time = 252

13.38 user
4.66 system
0:18.08 elapsed
99% CPU

The parallel build comes out about 8 seconds faster. Still not convinced? OK. I went ahead and grabbed dlmalloc and ran make to build libmalloc.a. My new command is g++ -Wall -pedantic -O3 -pthread -fopenmp -L$HOME/Development/test/dlmalloc/lib test.cpp -lmalloc
With OpenMP:

parallelism = 1 itr = 10000000 malloc_time = 814 free_time = 277

I hit CTRL-C after 37 seconds.

Without OpenMP:

parallelism = 1 itr = 10000000 malloc_time = 772 free_time = 271
parallelism = 2 itr = 10000000 malloc_time = 780 free_time = 272
parallelism = 3 itr = 10000000 malloc_time = 783 free_time = 272
parallelism = 4 itr = 10000000 malloc_time = 792 free_time = 277
parallelism = 5 itr = 10000000 malloc_time = 813 free_time = 281
parallelism = 6 itr = 10000000 malloc_time = 800 free_time = 275
parallelism = 7 itr = 10000000 malloc_time = 795 free_time = 277
parallelism = 8 itr = 10000000 malloc_time = 790 free_time = 273
parallelism = 9 itr = 10000000 malloc_time = 788 free_time = 277
parallelism = 10 itr = 10000000 malloc_time = 784 free_time = 276
parallelism = 11 itr = 10000000 malloc_time = 786 free_time = 284
parallelism = 12 itr = 10000000 malloc_time = 807 free_time = 279
parallelism = 13 itr = 10000000 malloc_time = 791 free_time = 277
parallelism = 14 itr = 10000000 malloc_time = 790 free_time = 273
parallelism = 15 itr = 10000000 malloc_time = 785 free_time = 276
parallelism = 16 itr = 10000000 malloc_time = 787 free_time = 275

6.48 user
11.27 system
0:17.81 elapsed
99% CPU

The difference is quite significant. I suspect the problem lies in your more complex code, or that something is wrong with your benchmark.