How do I know which malloc is being used?

use*_*652 7 c++ heap malloc

As I understand it, there are many different malloc implementations:

  • dlmalloc - general-purpose allocator
  • ptmalloc2 - glibc
  • jemalloc - FreeBSD and Firefox
  • tcmalloc - Google
  • libumem - Solaris

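Is there a way to determine which malloc is actually used on my (Linux) system?

I read that "Due to ptmalloc2's threading support, it became the default memory allocator for Linux." Is there any way I can check this myself?

I ask because I don't seem to get any speedup from parallelizing the malloc loop in the code below: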

#include <cstdlib>
#include <iostream>
#include <vector>
#include <omp.h>
#include <boost/date_time/posix_time/posix_time.hpp>

// Allocates mallocCnt 100-byte blocks in parallel, frees them in parallel,
// and prints how long each phase took (in milliseconds).
void parallelMalloc(int parallelism, int mallocCnt = 10000000) {
    omp_set_num_threads(parallelism);

    std::vector<char*> ptrStore(mallocCnt);

    boost::posix_time::ptime t1 = boost::posix_time::microsec_clock::local_time();

    // Phase 1: allocate
    #pragma omp parallel for
    for (int i = 0; i < mallocCnt; i++) {
        ptrStore[i] = static_cast<char*>(std::malloc(100 * sizeof(char)));
    }

    boost::posix_time::ptime t2 = boost::posix_time::microsec_clock::local_time();

    // Phase 2: free
    #pragma omp parallel for
    for (int i = 0; i < mallocCnt; i++) {
        std::free(ptrStore[i]);
    }

    boost::posix_time::ptime t3 = boost::posix_time::microsec_clock::local_time();

    boost::posix_time::time_duration malloc_time = t2 - t1;
    boost::posix_time::time_duration free_time   = t3 - t2;

    std::cout << " parallelism = " << parallelism << "\t itr = " << mallocCnt
              << "\t malloc_time = " << malloc_time.total_milliseconds()
              << "\t free_time = " << free_time.total_milliseconds() << std::endl;
}

// Run the benchmark with 1 to 16 OpenMP threads.
int main() {
    for (int i = 1; i <= 16; i += 1) {
        parallelMalloc(i);
    }
}

This gives me the following output:

 parallelism = 1         itr = 10000000  malloc_time = 1225      free_time = 1517
 parallelism = 2         itr = 10000000  malloc_time = 1614      free_time = 1112
 parallelism = 3         itr = 10000000  malloc_time = 1619      free_time = 687
 parallelism = 4         itr = 10000000  malloc_time = 2325      free_time = 620
 parallelism = 5         itr = 10000000  malloc_time = 2233      free_time = 550
 parallelism = 6         itr = 10000000  malloc_time = 2207      free_time = 489
 parallelism = 7         itr = 10000000  malloc_time = 2778      free_time = 398
 parallelism = 8         itr = 10000000  malloc_time = 1813      free_time = 389
 parallelism = 9         itr = 10000000  malloc_time = 1997      free_time = 350
 parallelism = 10        itr = 10000000  malloc_time = 1922      free_time = 291
 parallelism = 11        itr = 10000000  malloc_time = 2480      free_time = 257
 parallelism = 12        itr = 10000000  malloc_time = 1614      free_time = 256
 parallelism = 13        itr = 10000000  malloc_time = 1387      free_time = 289
 parallelism = 14        itr = 10000000  malloc_time = 1481      free_time = 248
 parallelism = 15        itr = 10000000  malloc_time = 1252      free_time = 297
 parallelism = 16        itr = 10000000  malloc_time = 1063      free_time = 281

uh *_*per 4

> I read that "Due to ptmalloc2's threading support, it became the default memory allocator for Linux." Is there any way for me to check this myself?

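glibc uses ptmalloc2 internally, and that is not a recent development. In any case, it isn't very hard to run getconf GNU_LIBC_VERSION and then cross-check whether that version uses ptmalloc2, but I'd be willing to bet you'd be wasting your time.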

> I ask because I don't seem to be getting any speedup by parallelizing my malloc loop in the code below

I turned your example into an MVCE (code omitted here for brevity), compiled it with g++ -Wall -pedantic -O3 -pthread -fopenmp using g++ 5.3.1, and here are my results.

With OpenMP:

 parallelism = 1     itr = 10000000  malloc_time = 746   free_time = 263
 parallelism = 2     itr = 10000000  malloc_time = 541   free_time = 267
 parallelism = 3     itr = 10000000  malloc_time = 405   free_time = 259
 parallelism = 4     itr = 10000000  malloc_time = 324   free_time = 221
 parallelism = 5     itr = 10000000  malloc_time = 330   free_time = 242
 parallelism = 6     itr = 10000000  malloc_time = 287   free_time = 244
 parallelism = 7     itr = 10000000  malloc_time = 257   free_time = 226
 parallelism = 8     itr = 10000000  malloc_time = 270   free_time = 225
 parallelism = 9     itr = 10000000  malloc_time = 253   free_time = 225
 parallelism = 10    itr = 10000000  malloc_time = 236   free_time = 226
 parallelism = 11    itr = 10000000  malloc_time = 225   free_time = 239
 parallelism = 12    itr = 10000000  malloc_time = 276   free_time = 258
 parallelism = 13    itr = 10000000  malloc_time = 241   free_time = 228
 parallelism = 14    itr = 10000000  malloc_time = 254   free_time = 225
 parallelism = 15    itr = 10000000  malloc_time = 278   free_time = 272
 parallelism = 16    itr = 10000000  malloc_time = 235   free_time = 220

23.87 user
2.11 system
0:10.41 elapsed
249% CPU

Without OpenMP:

 parallelism = 1     itr = 10000000  malloc_time = 748   free_time = 263
 parallelism = 2     itr = 10000000  malloc_time = 344   free_time = 256
 parallelism = 3     itr = 10000000  malloc_time = 751   free_time = 254
 parallelism = 4     itr = 10000000  malloc_time = 339   free_time = 262
 parallelism = 5     itr = 10000000  malloc_time = 748   free_time = 253
 parallelism = 6     itr = 10000000  malloc_time = 330   free_time = 256
 parallelism = 7     itr = 10000000  malloc_time = 734   free_time = 260
 parallelism = 8     itr = 10000000  malloc_time = 334   free_time = 259
 parallelism = 9     itr = 10000000  malloc_time = 750   free_time = 256
 parallelism = 10    itr = 10000000  malloc_time = 339   free_time = 255
 parallelism = 11    itr = 10000000  malloc_time = 743   free_time = 267
 parallelism = 12    itr = 10000000  malloc_time = 342   free_time = 261
 parallelism = 13    itr = 10000000  malloc_time = 739   free_time = 252
 parallelism = 14    itr = 10000000  malloc_time = 333   free_time = 252
 parallelism = 15    itr = 10000000  malloc_time = 740   free_time = 252
 parallelism = 16    itr = 10000000  malloc_time = 330   free_time = 252

13.38 user
4.66 system
0:18.08 elapsed
99% CPU

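The parallel version appears to be about 8 seconds faster. Still not convinced? Fine. I went ahead and grabbed dlmalloc and ran make to build libmalloc.a. My new command is g++ -Wall -pedantic -O3 -pthread -fopenmp -L$HOME/Development/test/dlmalloc/lib test.cpp -lmalloc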

With OpenMP:

 parallelism = 1     itr = 10000000  malloc_time = 814   free_time = 277

After 37 seconds I hit CTRL-C.

Without OpenMP:

 parallelism = 1     itr = 10000000  malloc_time = 772   free_time = 271
 parallelism = 2     itr = 10000000  malloc_time = 780   free_time = 272
 parallelism = 3     itr = 10000000  malloc_time = 783   free_time = 272
 parallelism = 4     itr = 10000000  malloc_time = 792   free_time = 277
 parallelism = 5     itr = 10000000  malloc_time = 813   free_time = 281
 parallelism = 6     itr = 10000000  malloc_time = 800   free_time = 275
 parallelism = 7     itr = 10000000  malloc_time = 795   free_time = 277
 parallelism = 8     itr = 10000000  malloc_time = 790   free_time = 273
 parallelism = 9     itr = 10000000  malloc_time = 788   free_time = 277
 parallelism = 10    itr = 10000000  malloc_time = 784   free_time = 276
 parallelism = 11    itr = 10000000  malloc_time = 786   free_time = 284
 parallelism = 12    itr = 10000000  malloc_time = 807   free_time = 279
 parallelism = 13    itr = 10000000  malloc_time = 791   free_time = 277
 parallelism = 14    itr = 10000000  malloc_time = 790   free_time = 273
 parallelism = 15    itr = 10000000  malloc_time = 785   free_time = 276
 parallelism = 16    itr = 10000000  malloc_time = 787   free_time = 275

6.48 user
11.27 system
0:17.81 elapsed
99% CPU

The difference is quite significant. I suspect the problem lies in your more complex code, or that your benchmark is flawed.
