MySQL hogging memory

syn*_*-dj 6 mysql innodb

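An installation of MySQL 5.6.10 on a virtualized Ubuntu 12.04 is hogging massive amounts of memory. The mysqld process claims all available memory within a couple of hours of uptime and forces the host to swap: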

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16229 mysql     20   0 26.8g  21g 8736 S   42 93.4  37:23.22 mysqld

It once grew to 50 GB, significantly exceeding the size of the dataset itself:

Current InnoDB index space = 5.25 G
Current InnoDB data space = 23.07 G

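Usually I can free ~3 GB by issuing FLUSH TABLES, although it is much quicker to kill -9 the mysqld process, let it restart, and let InnoDB run its recovery. The tables in use are almost exclusively InnoDB, and innodb_buffer_pool_size has been set to 5 GB (after setting it to 16 GB, the server quickly exhausted the available physical memory and swapped out more than 18 GB).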

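While the system is swapping, I can observe rather high "swap out" counters (vmstat shows ~1k pages/s during bursts) and hardly anything being swapped back in (a few dozen pages per minute). I first suspected a memory leak, but so far I have not found anything supporting this hypothesis.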

SHOW ENGINE INNODB STATUS indicates that the buffer pool is only partially filled:

----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 5365825536; in additional pool allocated 0
Dictionary memory allocated 2558496
Buffer pool size   320000
Free buffers       173229
Database pages     142239
Old database pages 52663
Modified db pages  344
Pending reads 1
Pending writes: LRU 0, flush list 1 single page 0
Pages made young 34, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 141851, created 387, written 41126
81.16 reads/s, 0.00 creates/s, 0.39 writes/s
Buffer pool hit rate 998 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 142239, unzip_LRU len: 0
I/O sum[0]:cur[464], unzip sum[0]:cur[0]
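To put the status output in perspective, the page counts can be converted to sizes. This is only a rough sketch; it assumes the default InnoDB page size of 16 KiB, which is consistent with 320000 pages matching the configured 5000M pool.

```python
# Figures copied from the BUFFER POOL AND MEMORY section above.
PAGE_SIZE = 16 * 1024   # assumption: default innodb_page_size of 16 KiB
pool_pages = 320000     # "Buffer pool size"
free_pages = 173229     # "Free buffers"
data_pages = 142239     # "Database pages"

GIB = 1024 ** 3


def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert an InnoDB page count to GiB."""
    return pages * page_size / GIB


pool_gib = pages_to_gib(pool_pages)
used_gib = pages_to_gib(data_pages)
free_pct = 100.0 * free_pages / pool_pages

print(f"buffer pool: {pool_gib:.2f} GiB, "
      f"holding data: {used_gib:.2f} GiB, free: {free_pct:.1f}%")
```

Under that assumption, a bit more than half of the pool is still free, so the buffer pool alone cannot explain a resident size of 21g.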

The server has a total of 80-90 connections, most of which are reported by SHOW PROCESSLIST as being in the "Sleep" state.

The memory-sensitive options are set as follows:

max_allowed_packet      = 16M
thread_stack            = 192K
thread_cache_size       = 8
max_connections         = 1000

innodb_file_format      = Barracuda
innodb_buffer_pool_size = 5000M
innodb_log_file_size    = 256M
innodb_flush_method     = O_DIRECT

query_cache_limit       = 1M
query_cache_size        = 256M

join_buffer_size        = 256k
tmp_table_size          = 2M
max_heap_table_size     = 64M

The tuning-primer.sh script calculates sane values for memory usage:

MEMORY USAGE
Max Memory Ever Allocated : 5.27 G
Configured Max Per-thread Buffers : 1.92 G
Configured Max Global Buffers : 5.15 G
Configured Max Memory Limit : 7.07 G
Physical Memory : 22.98 G
Max memory limit seem to be within acceptable norms
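The arithmetic behind that report can be reproduced roughly as follows. This is a sketch under the assumption that the per-thread figure scales linearly with the number of connections; the variable names are made up for illustration and the values are copied from the report and from the top output.

```python
# Values copied from the tuning-primer.sh report and the question text.
global_buffers_g = 5.15     # "Configured Max Global Buffers"
per_thread_max_g = 1.92     # "Configured Max Per-thread Buffers" at max_connections
max_connections = 1000
observed_connections = 90   # upper end of the 80-90 connections reported
observed_res_g = 21.0       # RES column of the top output


def estimated_peak_g(connections):
    """Global buffers plus per-thread buffers scaled to a connection count.

    Assumption: per-thread buffers grow linearly with connections.
    """
    per_thread_g = per_thread_max_g / max_connections
    return global_buffers_g + per_thread_g * connections


configured_limit_g = estimated_peak_g(max_connections)
expected_now_g = estimated_peak_g(observed_connections)
unaccounted_g = observed_res_g - expected_now_g

print(f"configured limit: {configured_limit_g:.2f} G")
print(f"expected at {observed_connections} connections: {expected_now_g:.2f} G")
print(f"not explained by these buffers: {unaccounted_g:.2f} G")
```

With these inputs, the configured buffers account for only a fraction of the resident size, which is why the report looks fine while the process keeps growing.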

The binary log is enabled and the host has a replication slave attached to it (although the results were not much different when this was not the case). innodb_file_per_table is enabled by default in 5.6, and the databases host a total of ~1,300 tables.

What means do I have to identify the possible causes for the apparently unlimited growth?

After reading "How MySQL uses memory" I had the suspicion that temporary tables might be the culprit. If they are not being released correctly for whatever reason, they could accumulate pretty quickly. The application querying the database issues a lot of nested, complicated queries, so temporary tables would be heavily in use according to the referenced docs. I tried checking if killing / resetting existing (idle) connections would significantly reduce memory usage when mysqld has reached ~20 GB - it would not, so this is either not related to connection states or the memory is leaking from there in a way which would be unaffected by closing the connection.

How would I verify if in-memory temporary tables are occupying a significant amount of memory? The STATUS variables and the INFORMATION_SCHEMA do not seem to have this information.

MySQL's memory usage appears hard to debug - the counters available seem not to account for the larger part of the usage I am seeing. I might be missing something, though.
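Since the internal counters do not add up, the growth can at least be tracked from outside the server. The following is a minimal sketch, assuming a Linux host where /proc/<pid>/status exposes the VmRSS and VmSwap fields; the helper names and the sample text are hypothetical and not taken from the server in question.

```python
import re


def parse_vm_fields(status_text):
    """Return the Vm* fields of a /proc/<pid>/status dump as {name: kB}."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"^(Vm\w+):\s+(\d+)\s+kB$", line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_vm_fields(pid):
    """Read the Vm* fields of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_fields(handle.read())


# Hypothetical sample text, used so the sketch runs anywhere.
SAMPLE = """Name:\tmysqld
VmPeak:\t 28101232 kB
VmRSS:\t 22020096 kB
VmSwap:\t 1048576 kB
Threads:\t95
"""

sample = parse_vm_fields(SAMPLE)
print(sample["VmRSS"] // 1024, "MiB resident,",
      sample["VmSwap"] // 1024, "MiB swapped")
```

Calling read_vm_fields() with the mysqld PID at regular intervals and logging the result next to the query load would show whether the growth is steady or tied to particular workloads.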

I also have a MyISAM-based replication slave attached to the InnoDB master, taking similar (read-only) loads. It does not show any signs of excessive memory usage (mysqld RSS is continuously < 1 GB), so the problem appears to be specific to the InnoDB configuration.

syn*_*-dj 1

Testing with the current GA release, MySQL 5.6.19, shows that the problem is gone.

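Reading through the release notes of the intermediate versions, I noticed this particular note for 5.6.14. I suspect it is related to our problem, since it explicitly refers to subquery clauses, which are used heavily in our case: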

For some statements, memory leaks could result when the optimizer removed unneeded subquery clauses. (Bug #16807641)

References: This bug is a regression of Bug #15875919.

Unfortunately, Oracle does not publish these bugs, so this is all the technical detail we could get about the issue.