use*_*820 5 c++ malloc multithreading glibc
在过去一周左右的时间里,我一直在调查应用程序中内存使用量随时间累积的问题。我把它缩小到复制一个的行
std::vector< std::vector< std::vector< std::map< uint, map< uint, std::bitset< N> > > > > >
在工作线程中(我意识到这是一种组织内存的荒谬方式)。工作线程会定期被销毁、重新创建,并在线程启动时复制该内存结构。被复制的原始数据通过主线程的引用传递给工作线程。
使用 malloc_stat 和 malloc_info,我可以看到当工作线程被销毁时,它使用的 arena/heap 将用于该结构的内存保留在其 fastbins 的空闲列表中。这是有道理的,因为有许多小于 64 字节的单独分配。
问题是,当工作线程被重新创建时,它会创建一个新的 arena/heap 而不是重用前一个,这样来自先前 arenas/heap 的 fastbins 永远不会被重用。最终,系统会在重用之前的堆/arena 来重用他们持有的 fastbin 之前耗尽内存。
有点意外,我发现在我的主线程中调用 malloc_trim(0),在加入工作线程后,会导致线程 arenas/heap 中的 fastbins 被释放。据我所知,这种行为没有记录。有人有解释吗?
这是我用来查看此行为的一些测试代码:
// includes
#include <stdio.h>
#include <algorithm>
#include <vector>
#include <iostream>
#include <stdexcept>
#include <stdio.h>
#include <string>
#include <mcheck.h>
#include <malloc.h>
#include <map>
#include <bitset>
#include <boost/thread.hpp>
#include <boost/shared_ptr.hpp>
// Number of bits per bitset.
const int sizeOfBitsets = 40;
// Executes a system command. Used to get output of "free -m".
std::string ExecuteSystemCommand(const char* cmd) {
char buffer[128];
std::string result = "";
FILE* pipe = popen(cmd, "r");
if (!pipe) throw std::runtime_error("popen() failed!");
try {
while (!feof(pipe)) {
if (fgets(buffer, 128, pipe) != NULL)
result += buffer;
}
} catch (...) {
pclose(pipe);
throw;
}
pclose(pipe);
return result;
}
// Prints output of "free -m" and output of malloc_stat().
void PrintMemoryStats()
{
try
{
char *buf;
size_t size;
FILE *fp;
std::string myCommand("free -m");
std::string result = ExecuteSystemCommand(myCommand.c_str());
printf("Free memory is \n%s\n", result.c_str());
malloc_stats();
fp = open_memstream(&buf, &size);
malloc_info(0, fp);
fclose(fp);
printf("# Memory Allocation Stats\n%s\n#> ", buf);
free(buf);
}
catch(...)
{
printf("Unable to print memory stats.\n");
throw;
}
}
void MakeCopies(std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > >& data)
{
try
{
// Create copies.
std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > dataCopyA(data);
std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > dataCopyB(data);
std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > dataCopyC(data);
// Print memory info.
printf("Memory after creating data copies:\n");
PrintMemoryStats();
}
catch(...)
{
printf("Unable to make copies.");
throw;
}
}
int main(int argc, char** argv)
{
try
{
// When uncommented, disables the use of fastbins.
// mallopt(M_MXFAST, 0);
// Print memory info.
printf("Memory to start is:\n");
PrintMemoryStats();
// Sizes of original data.
int sizeOfDataA = 2048;
int sizeOfDataB = 4;
int sizeOfDataC = 128;
int sizeOfDataD = 20;
std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > testData;
// Populate data.
testData.resize(sizeOfDataA);
for(int a = 0; a < sizeOfDataA; ++a)
{
testData.at(a).resize(sizeOfDataB);
for(int b = 0; b < sizeOfDataB; ++b)
{
for(int c = 0; c < sizeOfDataC; ++c)
{
std::map<uint, std::bitset<sizeOfBitsets> > dataMap;
testData.at(a).at(b).insert(std::pair<uint, std::map<uint, std::bitset<sizeOfBitsets> > >(c, dataMap));
for(int d = 0; d < sizeOfDataD; ++d)
{
std::bitset<sizeOfBitsets> testBitset;
testData.at(a).at(b).at(c).insert(std::pair<uint, std::bitset<sizeOfBitsets> >(d, testBitset));
}
}
}
}
// Print memory info.
printf("Memory to after creating original data is:\n");
PrintMemoryStats();
// Start thread to make copies and wait to join.
{
boost::shared_ptr<boost::thread> makeCopiesThread = boost::shared_ptr<boost::thread>(new boost::thread(&MakeCopies, boost::ref(testData)));
makeCopiesThread->join();
}
// Print memory info.
printf("Memory to after joining thread is:\n");
PrintMemoryStats();
malloc_trim(0);
// Print memory info.
printf("Memory to after malloc_trim(0) is:\n");
PrintMemoryStats();
return 0;
}
catch(...)
{
// Log warning.
printf("Unable to run application.");
// Return failure.
return 1;
}
// Return success.
return 0;
}
Run Code Online (Sandbox Code Playgroud)
malloc 修剪调用前后的有趣输出是(查找“看这里!”):
#> Memory to after joining thread is:
Free memory is
total used free shared buff/cache available
Mem: 257676 7361 246396 25 3918 249757
Swap: 1023 0 1023
Arena 0:
system bytes = 1443450880
in use bytes = 1443316976
Arena 1:
system bytes = 35000320
in use bytes = 6608
Total (incl. mmap):
system bytes = 1478451200
in use bytes = 1443323584
max mmap regions = 0
max mmap bytes = 0
# Memory Allocation Stats
<malloc version="1">
<heap nr="0">
<sizes>
<size from="241" to="241" total="241" count="1"/>
<size from="529" to="529" total="529" count="1"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="2" size="770"/>
<system type="current" size="1443450880"/>
<system type="max" size="1443459072"/>
<aspace type="total" size="1443450880"/>
<aspace type="mprotect" size="1443450880"/>
</heap>
<heap nr="1">
<sizes>
<size from="33" to="48" total="48" count="1"/>
<size from="49" to="64" total="4026531712" count="62914558"/> <-- LOOK HERE!
<size from="65" to="80" total="160" count="2"/>
<size from="81" to="96" total="301989888" count="3145728"/> <-- LOOK HERE!
<size from="33" to="33" total="231" count="7"/>
<size from="49" to="49" total="1274" count="26"/>
<unsorted from="0" to="49377" total="1431600" count="6144"/>
</sizes>
<total type="fast" count="66060289" size="4328521808"/>
<total type="rest" count="6177" size="1433105"/>
<system type="current" size="4329967616"/>
<system type="max" size="4329967616"/>
<aspace type="total" size="35000320"/>
<aspace type="mprotect" size="35000320"/>
</heap>
<total type="fast" count="66060289" size="4328521808"/>
<total type="rest" count="6179" size="1433875"/>
<total type="mmap" count="0" size="0"/>
<system type="current" size="5773418496"/>
<system type="max" size="5773426688"/>
<aspace type="total" size="1478451200"/>
<aspace type="mprotect" size="1478451200"/>
</malloc>
#> Memory to after malloc_trim(0) is:
Free memory is
total used free shared buff/cache available
Mem: 257676 3269 250488 25 3918 253850
Swap: 1023 0 1023
Arena 0:
system bytes = 1443319808
in use bytes = 1443316976
Arena 1:
system bytes = 35000320
in use bytes = 6608
Total (incl. mmap):
system bytes = 1478320128
in use bytes = 1443323584
max mmap regions = 0
max mmap bytes = 0
# Memory Allocation Stats
<malloc version="1">
<heap nr="0">
<sizes>
<size from="209" to="209" total="209" count="1"/>
<size from="529" to="529" total="529" count="1"/>
<unsorted from="0" to="49377" total="1431600" count="6144"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="6146" size="1432338"/>
<system type="current" size="1443459072"/>
<system type="max" size="1443459072"/>
<aspace type="total" size="1443459072"/>
<aspace type="mprotect" size="1443459072"/>
</heap>
<heap nr="1"> <---------------------------------------- LOOK HERE!
<sizes> <-- HERE!
<unsorted from="0" to="67108801" total="4296392384" count="6208"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="6208" size="4296392384"/>
<system type="current" size="4329967616"/>
<system type="max" size="4329967616"/>
<aspace type="total" size="35000320"/>
<aspace type="mprotect" size="35000320"/>
</heap>
<total type="fast" count="0" size="0"/>
<total type="rest" count="12354" size="4297824722"/>
<total type="mmap" count="0" size="0"/>
<system type="current" size="5773426688"/>
<system type="max" size="5773426688"/>
<aspace type="total" size="1478459392"/>
<aspace type="mprotect" size="1478459392"/>
</malloc>
#>
Run Code Online (Sandbox Code Playgroud)
几乎没有关于 malloc_info 输出的文档,所以我不确定我指出的那些输出是否真的是快速垃圾箱。为了验证它们确实是 fastbins,我取消注释代码行
mallopt(M_MXFAST, 0);
Run Code Online (Sandbox Code Playgroud)
在调用 malloc_trim(0) 之前,在加入线程后禁用 fastbins 和堆 1 的内存使用,看起来就像在调用 malloc_trim(0) 之后启用了 fastbins 一样。最重要的是,禁用 fastbins 将在线程加入后立即将内存返回给系统。调用 malloc_trim(0),在加入启用 fastbins 的线程后,也将内存返回给系统。
malloc_trim(0) 的文档说明它只能从主 arena 堆的顶部释放内存,那么这里发生了什么?我在 CentOS 7 上运行 glibc 版本 2.17。
malloc_trim(0) 声明它只能从主竞技场堆的顶部释放内存,那么这里发生了什么?
它可以被称为“过时”或“不正确”的文档。Glibc没有函数文档malloc_trim;Linux 使用手册页项目中的手册页。malloc_trim http://man7.org/linux/man-pages/man3/malloc_trim.3.html 的手册页是由手册页维护者于 2012 年编写的新手册页。可能他使用了 glibc malloc/malloc.c 源代码http://code.metager.de/source/xref/gnu/glibc/malloc/malloc.c#675中的一些注释
676 malloc_trim(size_t pad);
677
678 If possible, gives memory back to the system (via negative
679 arguments to sbrk) if there is unused memory at the `high' end of
680 the malloc pool. You can call this after freeing large blocks of
681 memory to potentially reduce the system-level memory requirements
682 of a program. However, it cannot guarantee to reduce memory. Under
683 some allocation patterns, some large free blocks of memory will be
684 locked between two used chunks, so they cannot be given back to
685 the system.
686
687 The `pad' argument to malloc_trim represents the amount of free
688 trailing space to leave untrimmed. If this argument is zero,
689 only the minimum amount of memory to maintain internal data
690 structures will be left (one page or less). Non-zero arguments
691 can be supplied to maintain enough trailing space to service
692 future expected allocations without having to re-obtain memory
693 from the system.
694
695 Malloc_trim returns 1 if it actually released any memory, else 0.
696 On systems that do not support "negative sbrks", it will always
697 return 0.
Run Code Online (Sandbox Code Playgroud)
glibc 中的实际实现是__malloc_trim,它有迭代 arenas 的代码:
http://code.metager.de/source/xref/gnu/glibc/malloc/malloc.c#4552
4552 int
4553 __malloc_trim (size_t s)
4560 mstate ar_ptr = &main_arena;
4561 do
4562 {
4563 (void) mutex_lock (&ar_ptr->mutex);
4564 result |= mtrim (ar_ptr, s);
4565 (void) mutex_unlock (&ar_ptr->mutex);
4566
4567 ar_ptr = ar_ptr->next;
4568 }
4569 while (ar_ptr != &main_arena);
Run Code Online (Sandbox Code Playgroud)
每个 arena 都使用mtrim()( mTRIm()) 函数进行修剪,该函数调用malloc_consolidate()将所有空闲段从 fastbins(它们不会自由合并,因为速度很快)转换为普通空闲块(与相邻块合并)
4498 /* Ensure initialization/consolidation */
4499 malloc_consolidate (av);
4111 malloc_consolidate is a specialized version of free() that tears
4112 down chunks held in fastbins.
1581 Fastbins
1591 Chunks in fastbins keep their inuse bit set, so they cannot
1592 be consolidated with other free chunks. malloc_consolidate
1593 releases all chunks in fastbins and consolidates them with
1594 other free chunks.
Run Code Online (Sandbox Code Playgroud)
问题是,当重新创建工作线程时,它会创建一个新的 arena/heap,而不是重用前一个,这样以前的 arena/heap 中的 fastbin 就不会被重用。
这很奇怪。根据设计,glibc malloc 中的最大 arena 数量受到 cpu_core_count * 8 的限制(对于 64 位平台);cpu_core_count * 2(对于32位平台)或通过环境变量MALLOC_ARENA_MAX/mallopt参数M_ARENA_MAX。
您可以限制您的应用程序的竞技场数量;定期调用malloc_trim()或以“大”大小调用malloc()(它将调用malloc_consolidate),然后free()在销毁之前从线程中调用它:
3319 _int_malloc (mstate av, size_t bytes)
3368 if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
// fastbin allocation path
3405 if (in_smallbin_range (nb))
// smallbin path; malloc_consolidate may be called
3437 If this is a large request, consolidate fastbins before continuing.
3438 While it might look excessive to kill all fastbins before
3439 even seeing if there is space available, this avoids
3440 fragmentation problems normally associated with fastbins.
3441 Also, in practice, programs tend to have runs of either small or
3442 large requests, but less often mixtures, so consolidation is not
3443 invoked all that often in most programs. And the programs that
3444 it is called frequently in otherwise tend to fragment.
3445 */
3446
3447 else
3448 {
3449 idx = largebin_index (nb);
3450 if (have_fastchunks (av))
3451 malloc_consolidate (av);
3452 }
Run Code Online (Sandbox Code Playgroud)
malloc_trim PS: https://github.com/mkerrisk/man-pages/commit/a15b0e60b297e29c825b7417582a33e6ca26bf65的手册页中有评论:
+.SH NOTES
+This function only releases memory in the main arena.
+.\" malloc/malloc.c::mTRIm():
+.\" return result | (av == &main_arena ? sYSTRIm (pad, av) : 0);
Run Code Online (Sandbox Code Playgroud)
malloc_trim是的,有对 main_arena 的检查,但它是在实现的最后阶段mTRIm(),它只是用于sbrk()以负偏移量调用。自 2007 年(glibc 2.9 及更新版本)以来,有另一种方法可以将内存返回给操作系统:madvise(MADV_DONTNEED)该方法在所有领域都使用(并且 glibc 补丁的作者或手册页的作者没有记录)。每个领域都需要整合。还有用于修剪(munmapping) mmap 堆顶部块的代码(heap_trim/shrink_heap从慢速路径 free() 调用),但它不是从malloc_trim.