High iowait on an Amazon EC2 MySQL instance with EBS volumes

Question

Lui*_*len 5 mysql iowait amazon-ec2 raid0 amazon-ebs

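We have a MySQL server running on an Amazon EC2 c1.medium instance, backed by a single EBS volume formatted with ext3.

This MySQL server is queried at about 500 queries per second by several applications running on a few web servers, which are also on Amazon EC2.

As you can see below, the server's load average and CPU idle time look fine, but one thing is worrying: the high iowait it experiences all the time.

Another number that worries me a lot is iostat's transfers per second (tps), which stays above 450 most of the time. After doing some research on the topic, I have seen people say that this is asking too much of an EBS volume: https://forums.aws.amazon.com/thread.jspa?threadID=30769

By the way, the command output you will see below was not captured during peak hours. This is how the server behaves most of the time.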

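With all that said, here are my questions:

1- Is it time to consider moving to a RAID setup (RAID 0, I would say)?

2- Should I spend time on a clustering solution, such as MySQL Cluster?

3- Do you think this situation is seriously affecting our applications? Would they perform better if we moved to RAID 0 and/or a clustering solution? (The applications seem happy so far, but could they be happier?)

Please let me know if you need any further information.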

~ # uptime 
 12:34:14 up 2 days,  4:06,  1 user,  load average: 2.24, 1.90, 1.84

########################################################

~ # vmstat 5

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------

 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st

 0  1     52  11168  16420 1498728    0    0  4586   231   11   81  6  3 52 39  0

 2  1     52  10460  16320 1499588    0    0 11631   397 3194 4319 10  4 47 39  0

 4  1     52  11448  16064 1499156    0    0 12231   592 2301 3331  9  5 50 36  0

 4  0     52  10328  16068 1500176    0    0  8578   392 2131 2745  8  6 49 37  0

 0  1     52  11164  15732 1499928    0    0  9604   578 2609 3510  7  4 49 40  0

 0  1     52  10824  15768 1499836    0    0  5038   634 1912 2509  8  3 47 42  0

 3  1     52  12040  15888 1498096    0    0  5068   204 1927 2531 10  8 45 37  0

 8  2     52  11252  15784 1499272    0    0  8521   390 2437 3100 14 15 39 31  0

 1  2     52  11436  15724 1499748    0    0  8287   401 2159 3113 11 10 42 36  1

 0  1     52  12016  15704 1498752    0    0 11576   499 3324 3984 16 17 31 36  0

 1  1     52  10536  15664 1500508    0    0  8430   718 2686 3265 15 14 37 34  0

 1  1     52  10300  15676 1500744    0    0 10186   720 2488 3488 16  5 45 34  0

########################################################

~ # iostat -dm 5 /dev/sdf 
Linux 2.6.21.7-2.fc8xen (database-new)  01/20/12

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn

sdf             464.81         8.84         0.33    1658860      61390

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn

sdf             402.20         7.39         0.43         36          2

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn

sdf             431.40         7.74         0.32         38          1

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn

sdf             461.40         8.26         0.39         41          1

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn

sdf             475.65         9.20         0.29         46          1

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn

sdf             534.80         9.82         0.52         49          2

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn

sdf             526.60         9.97         0.52         49          2

########################################################

~ # iostat -mdx 5 /dev/sdf 

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util

sdf              22.21    46.28 427.47 37.54     8.84     0.33    40.38     1.78    3.82   1.72  79.87

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util

sdf              22.36    80.04 450.30 60.48     9.29     0.55    39.44     1.45    2.85   1.58  80.48

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util

sdf              23.40    43.60 370.60 47.00     7.75     0.35    39.76     1.45    3.47   1.97  82.08

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util

sdf              20.20    33.20 382.60 29.60     8.02     0.25    41.05     1.31    3.17   2.11  87.12

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util

sdf              28.80    35.20 422.40 33.40     9.04     0.27    41.80     1.45    3.19   1.95  88.96

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util

sdf              14.20    45.00 291.80 51.40     5.97     0.38    37.86     1.45    4.22   2.50  85.68

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util

sdf              19.16    56.89 535.33 41.32    11.44     0.38    42.00     1.49    2.59   1.53  88.46

Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util

sdf              20.40    81.40 233.00 64.40     4.86     0.57    37.39     1.74    5.84   3.18  94.72
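
The iostat columns above can be cross-checked against each other with a little arithmetic. The following is only a rough sketch: the function names are made up for illustration, and it assumes that avgrq-sz is reported in 512-byte sectors and that %util is approximately requests per second multiplied by svctm.

```python
SECTOR_BYTES = 512  # assumption: iostat's avgrq-sz column is in 512-byte sectors


def avg_transfer_kib(tps, mb_read_s, mb_written_s):
    """Average size of one transfer in KiB, from the `iostat -dm` columns."""
    return (mb_read_s + mb_written_s) * 1024 / tps


def sectors_to_kib(avgrq_sz):
    """Convert an avgrq-sz value (sectors) to KiB."""
    return avgrq_sz * SECTOR_BYTES / 1024


def busy_percent(reads_s, writes_s, svctm_ms):
    """Approximate %util: requests per second times service time per request."""
    return (reads_s + writes_s) * svctm_ms / 10


# First sample of each iostat run above (the since-boot averages).
print(round(avg_transfer_kib(464.81, 8.84, 0.33), 1))  # KiB per transfer
print(round(sectors_to_kib(40.38), 1))                 # KiB per request
print(round(busy_percent(427.47, 37.54, 1.72), 1))     # compare with %util 79.87
```

Both size estimates come out at roughly 20 KiB per request, so the volume is serving several hundred small reads per second rather than a few large sequential ones, and the derived busy percentage is close to the reported %util.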

############################################### my.cnf

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
long_query_time=1
key_buffer = 64M
thread_cache_size = 30
table_cache = 1024
table_definition_cache = 512
query_cache_type = 1
query_cache_size = 64M
tmp_table_size = 64M
max_heap_table_size = 64M
innodb_buffer_pool_size = 512M
old_passwords=1
max_connections=400
wait_timeout=30

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

[ndbd]
connect-string="nodeid=2;host=localhost:1186"

[ndb_mgm]
connect-string="host=localhost:1186"

############################################### Miscellaneous tuning script output

~ # ./tuning-primer.sh 

    -- MYSQL PERFORMANCE TUNING PRIMER --
         - By: Matthew Montgomery -

MySQL Version 5.1.52 i686

Uptime = 0 days 1 hrs 1 min 1 sec
Avg. qps = 517
Total Questions = 1894942
Threads Connected = 94

Warning: Server has not been running for at least 48hrs.
It may not be safe to use these recommendations

To find out more information on how each of these
runtime variables effects performance visit:
http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html
Visit http://www.mysql.com/products/enterprise/advisors.html
for info about MySQL's Enterprise Monitoring and Advisory Service

SLOW QUERIES
The slow query log is NOT enabled.
Current long_query_time = 1.000000 sec.
You have 207 out of 1894981 that take longer than 1.000000 sec. to complete
Your long_query_time seems to be fine

BINARY UPDATE LOG
The binary update log is NOT enabled.
You will not be able to do point in time recovery
See http://dev.mysql.com/doc/refman/5.1/en/point-in-time-recovery.html

WORKER THREADS
Current thread_cache_size = 30
Current threads_cached = 8
Current threads_per_sec = 0
Historic threads_per_sec = 0
Your thread_cache_size is fine

MAX CONNECTIONS
Current max_connections = 400
Current threads_connected = 93
Historic max_used_connections = 195
The number of used connections is 48% of the configured maximum.
Your max_connections variable seems to be fine.

INNODB STATUS
Current InnoDB index space = 1.33 G
Current InnoDB data space = 5.04 G
Current InnoDB buffer pool free = 0 %
Current innodb_buffer_pool_size = 512 M
Depending on how much space your innodb indexes take up it may be safe
to increase this value to up to 2 / 3 of total system memory

MEMORY USAGE
Max Memory Ever Allocated : 1.13 G
Configured Max Per-thread Buffers : 1.04 G
Configured Max Global Buffers : 642 M
Configured Max Memory Limit : 1.67 G
Physical Memory : 1.70 G

Max memory limit exceeds 90% of physical memory

KEY BUFFER
Current MyISAM index space = 379 M
Current key_buffer_size = 64 M
Key cache miss rate is 1 : 162
Key buffer free ratio = 80 %
Your key_buffer_size seems to be fine

QUERY CACHE
Query cache is enabled
Current query_cache_size = 64 M
Current query_cache_used = 43 M
Current query_cache_limit = 1 M
Current Query cache Memory fill ratio = 67.44 %
Current query_cache_min_res_unit = 4 K
MySQL won't cache query results that are larger than query_cache_limit in size

SORT OPERATIONS
Current sort_buffer_size = 2 M
Current read_rnd_buffer_size = 256 K
Sort buffer seems to be fine

JOINS
Current join_buffer_size = 132.00 K
You have had 4013 queries where a join could not use an index properly
You should enable "log-queries-not-using-indexes"
Then look for non indexed joins in the slow query log.
If you are unable to optimize your queries you may want to increase your
join_buffer_size to accommodate larger joins in one pass.

Note! This script will still suggest raising the join_buffer_size when
ANY joins not using indexes are found.

OPEN FILES LIMIT
Current open_files_limit = 2458 files
The open_files_limit should typically be set to at least 2x-3x
that of table_cache if you have heavy MyISAM usage.
Your open_files_limit value seems to be fine

TABLE CACHE
Current table_open_cache = 1024 tables
Current table_definition_cache = 512 tables
You have a total of 45237 tables
You have 1024 open tables.
Current table_cache hit rate is 0%
, while 100% of your table cache is in use
You should probably increase your table_cache
You should probably increase your table_definition_cache value.

TEMP TABLES
Current max_heap_table_size = 64 M
Current tmp_table_size = 64 M
Of 38723 temp tables, 44% were created on disk
Perhaps you should increase your tmp_table_size and/or max_heap_table_size
to reduce the number of disk-based temporary tables
Note! BLOB and TEXT columns are not allow in memory tables.
If you are using these columns raising these values might not impact your 
ratio of on disk temp tables.

TABLE SCANS
Current read_buffer_size = 128 K
Current table scan ratio = 537 : 1
read_buffer_size seems to be fine

TABLE LOCKING
Current Lock Wait ratio = 1 : 954
You may benefit from selective use of InnoDB.
If you have long running SELECT's against MyISAM tables and perform
frequent updates consider setting 'low_priority_updates=1'
If you have a high concurrency of inserts on Dynamic row-length tables
consider setting 'concurrent_insert=2'.

Answer 1

Aar*_*own 12

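It would help if you posted your my.cnf and said whether you are using InnoDB or MyISAM tables, and whether your workload is read-heavy or write-heavy. Otherwise, we are just guessing. Here is my take: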

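First, I would look at your queries and make sure they are properly indexed. High I/O on a MySQL database is caused by extremely high concurrency, by a poorly tuned server, or by poorly performing queries that have to do full table or index scans. Some tips on how to find poorly performing queries can be found in my post on Ideeli's tech blog.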

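Check your my.cnf. If you are using InnoDB, make sure innodb_buffer_pool_size and innodb_log_file_size are large enough. Because EBS latency is so variable, maximizing innodb_log_file_size can bring a significant performance benefit. If you are using MyISAM (and you shouldn't be), make sure your key_buffer size is large enough.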

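If you are confident that your queries are well optimized and your server is well tuned, we can move on to the next item. ext3 is not well suited to databases. One of the main reasons is that ext3 allows only one thread at a time to update an inode (I am trying to find documentation on this). If you are not running with innodb-file-per-table, this means a lot of filesystem contention on the ibdata file. xfs does not have this limitation and has been shown to perform better for database workloads (source needed).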

If you can't change to xfs, make sure you are using innodb-file-per-table, and at the very least make sure you have noatime,nodiratime on your mounts.
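
One way to verify the mount options is to read the fourth field of the relevant line in /proc/mounts. The following is a minimal sketch, not a definitive tool: the helper name and the sample mount line (an ext3 volume on /dev/sdf mounted at /var/lib/mysql) are assumptions made for illustration.

```python
def missing_atime_options(mounts_text, mount_point,
                          wanted=("noatime", "nodiratime")):
    """Return the wanted options that are absent for mount_point.

    mounts_text uses the /proc/mounts layout:
    device, mount point, filesystem type, comma-separated options, ...
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            options = set(fields[3].split(","))
            return [opt for opt in wanted if opt not in options]
    raise LookupError(f"{mount_point} is not a mount point")


# Hypothetical sample line; on a real server pass open("/proc/mounts").read().
SAMPLE = "/dev/sdf /var/lib/mysql ext3 rw,data=ordered 0 0\n"

print(missing_atime_options(SAMPLE, "/var/lib/mysql"))
```

An empty list means both options are already active. Anything else needs to be added to the options column of the matching /etc/fstab entry, followed by a remount.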

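Next, about your instance size. Unless the dataset is small, c1.medium is not an ideal instance size for most databases. MySQL generally benefits from memory more than from compute power. c1.medium has only 1.7GB of memory! How large is your dataset? In general, except in rare cases, an m1.large (with 7.5GB of RAM) will outperform a c1.medium. It is also twice as expensive, at $0.34 per hour.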
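
To put rough numbers on the memory argument, the sketch below uses the InnoDB sizes reported by the tuning-primer output in the question (5.04 G of data, 1.33 G of indexes, a 512 M buffer pool) together with that script's own rule of thumb of giving the buffer pool up to 2/3 of system memory. The function names are illustrative, and the 2/3 figure is a rule of thumb rather than a recommendation for this particular server.

```python
def pool_coverage(pool_mb, data_gb, index_gb):
    """Fraction of InnoDB data plus indexes that fits in the buffer pool."""
    return pool_mb / ((data_gb + index_gb) * 1024)


def two_thirds_of_ram_mb(ram_gb):
    """The tuning-primer rule of thumb: up to 2/3 of total system memory."""
    return ram_gb * 1024 * 2 / 3


DATA_GB, INDEX_GB = 5.04, 1.33  # "InnoDB data space" and "InnoDB index space"

for label, pool_mb in [
    ("current 512M pool", 512),
    ("2/3 of c1.medium RAM (1.7GB)", two_thirds_of_ram_mb(1.7)),
    ("2/3 of m1.large RAM (7.5GB)", two_thirds_of_ram_mb(7.5)),
]:
    share = pool_coverage(pool_mb, DATA_GB, INDEX_GB)
    print(f"{label}: {share:.0%} of data+indexes")
```

With these inputs the current pool holds well under a tenth of the InnoDB data and indexes, and even the largest pool a c1.medium could reasonably host stays under a fifth, while an m1.large could keep most of the working set in memory. That is consistent with the read-dominated I/O in the iostat output.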

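Now on to RAID for EBS volumes. Yes, RAID will increase your IOPS significantly (as will increasing the instance size). Do not use RAID0... if you care about your data, at least. I have explained this in a number of places, including on my blog, as a speaker at Percona Live NYC 2011, and here on serverfault. The short version is that EBS volumes fail in atypical ways, and being able to drop a volume from the set has proven valuable on several occasions, especially during the great EBS outage of 2011, when some sites were offline for days... We were offline for 45 minutes at 4 a.m., despite having dozens of instances affected by the EBS problems.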
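
The trade-off between the two RAID levels can be sketched with idealized arithmetic. This is a simplification under loud assumptions: IOPS scale linearly with the number of volumes, every volume behaves identically, and each one sustains about 450 IOPS (roughly the tps seen on the single volume in the question). Real EBS arrays will not scale this cleanly, and the function name is made up for illustration.

```python
def ideal_array_iops(volumes, per_volume_iops, level):
    """Idealized IOPS for an array of identical volumes (linear scaling assumed)."""
    if level == "raid0":
        # Pure striping: every volume serves both reads and writes,
        # but losing any one volume loses the whole array.
        return {"read": volumes * per_volume_iops,
                "write": volumes * per_volume_iops,
                "survives_one_volume_loss": False}
    if level == "raid10":
        if volumes < 4 or volumes % 2:
            raise ValueError("classic RAID10 needs an even number of volumes, at least 4")
        # Striped mirrors: reads can be served by either side of a mirror,
        # while each write must land on both sides.
        return {"read": volumes * per_volume_iops,
                "write": volumes // 2 * per_volume_iops,
                "survives_one_volume_loss": True}
    raise ValueError(f"unsupported level: {level}")


PER_VOLUME_IOPS = 450  # assumption: roughly what the single volume sustains today

for level in ("raid0", "raid10"):
    print(level, ideal_array_iops(4, PER_VOLUME_IOPS, level))
```

For a read-heavy workload like the one in the question, RAID10 gives up only write throughput relative to RAID0, in exchange for being able to drop a misbehaving volume from the set.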

Here are some benchmarks of RAIDed EBS volumes with MySQL.

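Finally, Percona Server has a huge number of scalability optimizations. Here is a white paper about my company's experience switching from MySQL to Percona Server. We were experiencing database stalls and outages daily. Thanks to its many scalability improvements, simply switching from MySQL to Percona Server solved the problem overnight.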

So, in summary...

Tune your queries
Tune your server
Get yourself better "hardware"
Use xfs, not ext3
RAID10, not RAID0
Switch from MySQL to Percona Server

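As for MySQL Cluster, it is a completely different beast from MySQL and is generally not suitable for most OLTP applications. Galera / Percona XtraDB Cluster are also interesting new clustering products. However, you have plenty of options to work through before you get to that point. We serve 24k qps at peak using a single m2.4xlarge with RAID10 in EC2.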

Good luck!
