How to resync an AWS RDS read replica

Question

How to resync an AWS RDS read replica

Jam*_*mes 6 amazon-web-services amazon-rds read-replication

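Is there a way to repair a read replica that has stopped syncing with the master database? I'm already in the process of deleting it and creating a new one because I couldn't find an answer, but it would be good to know in case it happens again.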

The database is a MySQL database with InnoDB tables.

Answer 1

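The principle behind MySQL replication is simple: if you start with two identical data sets, and every time you change one you make the same change to the other, the two data sets will stay identical. That is how MySQL replication works -- you start with two identical servers, either both completely empty or one an exact snapshot of the other, and replication simply performs the same operations on both.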
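The principle can be illustrated with a toy model. This is only a sketch, not how MySQL is implemented: it assumes each server is a plain Python dict and each change is a hypothetical (op, key, value) tuple. It shows only that replaying identical changes on identical starting data keeps the two copies identical.

```python
def apply_event(rows, event):
    """Apply one change event to a data set, in place."""
    op, key, value = event
    if op in ("insert", "update"):
        rows[key] = value
    elif op == "delete":
        rows.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op!r}")


def replicate(master, replica, events):
    """Apply every event to the master, then replay it on the replica."""
    for event in events:
        apply_event(master, event)
        apply_event(replica, event)


master = {1: "alice"}
replica = dict(master)  # the replica starts as an exact snapshot
replicate(master, replica, [
    ("insert", 2, "bob"),
    ("update", 1, "carol"),
    ("delete", 2, None),
])
print(master == replica)  # True
```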

Replication is done via the binary log ("binlog"), which captures all changes to the master. In standard MySQL asynchronous replication -- as used in RDS -- the replica has two purpose-specific threads: the I/O thread, which connects to the master, reads the replication events from the master's binlog, and writes them to a temporary holding area called the relay log; and the SQL thread, which reads from the relay log and applies the changes to the replica.

On the replica, the query SHOW SLAVE STATUS; tells you whether these two threads are running (the Slave_IO_Running and Slave_SQL_Running columns). If both are running, the replica is healthy, though it might be behind the master, as shown by the Seconds_Behind_Master value in the same output. Otherwise, the output shows the error that caused one thread or the other to stop.
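As a rough sketch of how that output might be read programmatically, the function below takes one SHOW SLAVE STATUS row as a dict and summarizes it. The column names are the ones MySQL reports; the function name, the dict input (for example from a dict-style cursor), and the sample error text are assumptions made for illustration.

```python
def replica_health(status):
    """Summarize one SHOW SLAVE STATUS row given as {column_name: value}.

    Returns (state, seconds_behind_master, errors).
    """
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    if io_ok and sql_ok:
        # Both threads are running: healthy, but possibly lagging.
        return ("healthy", status.get("Seconds_Behind_Master"), [])
    errors = []
    if not io_ok:
        errors.append("I/O thread: " + (status.get("Last_IO_Error") or "stopped, no error recorded"))
    if not sql_ok:
        errors.append("SQL thread: " + (status.get("Last_SQL_Error") or "stopped, no error recorded"))
    return ("broken", None, errors)


# Made-up sample row; a real row has many more columns.
sample = {
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "No",
    "Seconds_Behind_Master": None,
    "Last_IO_Error": "",
    "Last_SQL_Error": "example error text",
}
print(replica_health(sample))
```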

In theory, a MySQL replica will never go out of sync unless one of three things happens:

1. You do something you shouldn't that makes the replica inconsistent with the master -- such as making the replica writable, and writing to it.
2. There's a bug in the MySQL source code that causes inconsistency.
3. The replica is disconnected from the master for long enough that the master has already discarded some replication events the replica has never seen.

The first issue will cause the SQL thread to halt because it tries to apply a nonsense change -- typically deleting a row that doesn't exist, updating a row that doesn't exist or doesn't match, inserting a row that's already present, etc.
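A toy model can show why a stray write on the replica eventually stops the SQL thread. This is a sketch under simplifying assumptions, not MySQL's actual behavior: rows live in a dict, events are hypothetical (op, key, value) tuples, and `run_sql_thread` is an invented name for the replay loop.

```python
class ReplicationError(Exception):
    """Raised when a change cannot be applied to the replica's current data."""


def apply_strict(rows, event):
    """Apply one event, refusing changes that make no sense for the current data."""
    op, key, value = event
    if op == "insert":
        if key in rows:
            raise ReplicationError(f"insert failed: row {key} already exists")
        rows[key] = value
    elif op == "update":
        if key not in rows:
            raise ReplicationError(f"update failed: row {key} not found")
        rows[key] = value
    elif op == "delete":
        if key not in rows:
            raise ReplicationError(f"delete failed: row {key} not found")
        del rows[key]
    else:
        raise ValueError(f"unknown operation: {op!r}")


def run_sql_thread(rows, relay_log):
    """Apply events in order and stop at the first one that cannot be applied.

    Returns (number_of_events_applied, error_message_or_None).
    """
    for applied, event in enumerate(relay_log):
        try:
            apply_strict(rows, event)
        except ReplicationError as exc:
            return applied, str(exc)
    return len(relay_log), None


replica = {1: "alice", 2: "bob"}
del replica[2]  # a stray local write made directly on the replica
relay_log = [("update", 1, "carol"), ("update", 2, "dave"), ("insert", 3, "erin")]
result = run_sql_thread(replica, relay_log)
print(result)  # (1, 'update failed: row 2 not found')
```

Note that the third event is never applied: once the thread halts, everything behind the failing event waits.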

The second issue could cause a problem with either the I/O thread or the SQL thread, but such bugs should be rare.

The last issue will cause the I/O thread to halt: it remembers where it left off on the master, and if the binary log file for that position is no longer available on the master, it is at an impasse. RDS is supposed to prevent this by holding logs on the master until all managed replicas have captured them.
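The impasse can be expressed as a simple check. The sketch below assumes you have already collected the Master_Log_File value from SHOW SLAVE STATUS on the replica and the Log_name values from SHOW BINARY LOGS on the master; the function name and the file names are made up for illustration.

```python
def io_thread_can_resume(master_log_file, master_binary_logs):
    """Return True if the binlog file the replica needs next still exists on the master.

    master_log_file: Master_Log_File from SHOW SLAVE STATUS on the replica.
    master_binary_logs: Log_name values from SHOW BINARY LOGS on the master.
    """
    return master_log_file in set(master_binary_logs)


available = ["binlog.000012", "binlog.000013"]  # made-up file names
print(io_thread_can_resume("binlog.000012", available))  # True
print(io_thread_can_resume("binlog.000009", available))  # False: already purged
```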

So, the general answer is that you can fix a MySQL read replica by bringing all of its data into exactly the state that it should be in, based on the state of the master at the point in time where the replication SQL thread is currently pointing, in the relay logs.

That's a little bit trickier in RDS because you don't have the SUPER privilege, but it's still possible. Still...

tl;dr: broken replication is only a symptom -- you have to figure out what the actual problem is.

You need to be able to identify what's gone wrong, and take steps to correct it. The problem is, when replication stops, unless you have a very clear understanding of exactly what happened, you don't actually know just how bad things might be on the replica.

Thinking back to the principle mentioned above -- start with two identical data sets, and every time you change one, change the other -- the next thing to note is that MySQL does not have any built-in mechanisms for ensuring consistency in the absence of actual replication errors. Two servers can be dramatically divergent but replication will happily continue until the SQL thread encounters something that it cannot replicate. You need a third party utility that can compare the data on the two servers and call out any discrepancies.
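To make the idea of such a comparison concrete, here is a minimal sketch. It assumes both copies of a table fit in memory as {primary_key: row_tuple} dicts, which a real utility would not require; the function names are invented and this is not any particular tool's algorithm.

```python
import hashlib


def table_checksum(rows):
    """Checksum a table copy, visiting rows in primary-key order."""
    digest = hashlib.sha256()
    for key in sorted(rows):
        digest.update(repr((key, rows[key])).encode("utf-8"))
    return digest.hexdigest()


def diff_tables(master_rows, replica_rows):
    """List the primary keys where the replica disagrees with the master."""
    master_keys = set(master_rows)
    replica_keys = set(replica_rows)
    return {
        "missing_on_replica": sorted(master_keys - replica_keys),
        "extra_on_replica": sorted(replica_keys - master_keys),
        "different": sorted(
            key for key in master_keys & replica_keys
            if master_rows[key] != replica_rows[key]
        ),
    }


master = {1: ("alice", 10), 2: ("bob", 20), 3: ("carol", 30)}
replica = {1: ("alice", 10), 2: ("bob", 99), 4: ("dave", 40)}

# Cheap check first; only look row by row when the checksums disagree.
if table_checksum(master) != table_checksum(replica):
    report = diff_tables(master, replica)
    print(report)
```

Nothing in this replica would necessarily stop replication: it only fails once an event touches row 2, 3, or 4 in a way that cannot be applied.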

If you clearly understand what went wrong, you can temporarily make the replica writable (using the parameter group setting for the read_only system variable), make the corrective changes, and restart replication. On RDS, since you don't have the SUPER privilege, the only way to restart at the current event pointer is to reboot the replica. Alternatively, you can bring the replica to the state it should have been in after the problematic event replicated, and then use the workaround RDS provides for that case, CALL mysql.rds_skip_repl_error();. Do not use this without understanding what it does: it ignores the failure and moves on to the next event, which leaves your replica in an inconsistent state unless you have manually brought it into a consistent state first. It should be reserved for emergencies only, when keeping the replica current is more important than keeping the replica correct, because skipping an error essentially guarantees more errors in the future.

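Repairing a replica is not a trivial task; it is a job for an experienced DBA. On RDS, the best option is usually to discard the replica and create a new one. But since replication errors should never happen, this is not something you should have to do. If you do have to, you need to find out why.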

Archived:	6 years, 8 months ago
Views:	9689
Last activity:	5 years ago