Cla*_*kes 7 mysql mysql-5.5 percona-server
我用谷歌搜索并搜索了 stackexchange,但找不到与此特定问题相关的任何内容(还有其他错误 11 问题,就我所知,仅此而已)。
我们的服务器并没有真正进行大量的读取或写入。服务器是具有 16 个内核和 16GB 内存的 VM。
MySQL 在一段未知的时间后崩溃,错误日志显示:
09:17:14 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/
key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=21
max_threads=502
thread_count=12
connection_count=12
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1131343 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
hread pointer: 0x3329180
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f4a0043ee70 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x7cbffc]
/usr/sbin/mysqld(handle_fatal_signal+0x4a1)[0x699251]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f4a3278b340]
/lib/x86_64-linux-gnu/libc.so.6(+0x9818e)[0x7f4a3114b18e]
/usr/sbin/mysqld(_ZN8Protocol14net_store_dataEPKhm+0x47)[0x52e3b7]
/usr/sbin/mysqld[0x52fea4]
/usr/sbin/mysqld(_ZN8Protocol24send_result_set_metadataEP4ListI4ItemEj+0x52c)[0x52fbbc]
/usr/sbin/mysqld(_Z18mysqld_list_fieldsP3THDP10TABLE_LISTPKc+0x22a)[0x5e8d2a]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x2498)[0x5a6988]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x13a)[0x643a4a]
/usr/sbin/mysqld(handle_one_connection+0x4a)[0x643b3a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f4a32783182]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f49d00078a0): is an invalid pointer
Connection ID (thread ID): 83475
Status: NOT_KILLED
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
150629 9:17:14 [Warning] Using unique option prefix myisam-recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.
150629 9:17:14 [Warning] Using unique option prefix myisam-recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.
150629 9:17:14 [Note] Plugin 'FEDERATED' is disabled.
150629 9:17:14 InnoDB: The InnoDB memory heap is disabled
150629 9:17:14 InnoDB: Mutexes and rw_locks use GCC atomic builtins
150629 9:17:14 InnoDB: Compressed tables use zlib 1.2.3
150629 9:17:14 InnoDB: Using Linux native AIO
150629 9:17:14 InnoDB: Initializing buffer pool, size = 500.0M
150629 9:17:14 InnoDB: Completed initialization of buffer pool
150629 9:17:14 InnoDB: highest supported file format is Barracuda.
InnoDB: Log scan progressed past the checkpoint lsn 15732651240
150629 9:17:14 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 15737893888
InnoDB: Doing recovery: scanned up to log sequence number 15743136768
InnoDB: Doing recovery: scanned up to log sequence number 15748379648
InnoDB: Doing recovery: scanned up to log sequence number 15752740649
InnoDB: Transaction 986C1E8 was in the XA prepared state.
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 0 row operations to undo
InnoDB: Trx id counter is 986C300
150629 9:17:15 InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: Last MySQL binlog file position 0 89114508, file name /var/lib/mysql/mysql-bin.000057
InnoDB: Starting in background the rollback of uncommitted transactions
150629 9:17:16 InnoDB: Rollback of non-prepared transactions completed
150629 9:17:16 InnoDB: Waiting for the background threads to start
150629 9:17:17 Percona XtraDB (http://www.percona.com) 5.5.43-37.2 started; log sequence number 15752740649
150629 9:17:17 [Note] Recovering after a crash using /var/lib/mysql/mysql-bin
150629 9:17:17 [Note] Starting crash recovery...
150629 9:17:17 InnoDB: Starting recovery for XA transactions...
150629 9:17:17 InnoDB: Transaction 986C1E8 in prepared state after recovery
150629 9:17:17 InnoDB: Transaction contains changes to 1 rows
150629 9:17:17 InnoDB: 1 transactions in prepared state after recovery
150629 9:17:17 [Note] Found 1 prepared transaction(s) in InnoDB
150629 9:17:17 [Note] Crash recovery finished.
150629 9:17:17 [Warning] 'user' entry 'root@host' ignored in --skip-name-resolve mode.
150629 9:17:17 [Warning] 'proxies_priv' entry '@ root@host' ignored in --skip-name-resolve mode.
150629 9:17:17 [Note] Event Scheduler: Loaded 0 events
150629 9:17:17 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.43-37.2-log' socket: '/var/run/mysqld/mysqld.sock' port: 3306 Percona Server (GPL), Release 37.2, Revision 6202fee
Run Code Online (Sandbox Code Playgroud)
我不认为这与 Percona 相关,因为它也与常规 MySQL 一起使用(我加载了 Percona 以查看它是否会修复它)。
在此期间,general.log 仅显示以下内容:
150629 9:17:13 76946 Query set autocommit=1
150629 9:17:14 76946 Query UPDATE `api_dictionaryitem` SET `name` = '', `group_id` = 4, `count` = 2248844, `last_occurence` = NULL WHERE `api_dictionaryitem`.`id` = 8
150629 9:17:17 1 Connect debian-sys-maint@localhost on
150629 9:17:18 4 Quit
150629 9:17:22 36 Connect user@localhost on mydb
Run Code Online (Sandbox Code Playgroud)
整个数据库只有5.3MB
我花了好几个小时试图弄清楚这一点,但此时我已经超出了我的范围。
编辑:是的,我知道发生拼写错误,它在票务队列中进行修复,但现在在整个代码中都是一致的,所以这不是原因。
编辑 2:添加 MySQL 配置 - 这是使用 Percona 的生成工具生成的。我将它用作“安全”配置。
# Generated by Percona Configuration Wizard (http://tools.percona.com/) version REL5-20120208
[mysql]
# CLIENT #
port = 3306
socket = /var/run/mysqld/mysqld.sock
[mysqld]
# GENERAL #
user = mysql
default-storage-engine = InnoDB
socket = /var/run/mysqld/mysqld.sock
pid-file = /var/run/mysqld/mysqld.pid
# MyISAM #
key-buffer-size = 32M
myisam-recover = FORCE,BACKUP
# SAFETY #
max-allowed-packet = 16M
max-connect-errors = 1000000
skip-name-resolve
sql-mode = STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_AUTO_VALUE_ON_ZERO,NO_ENGINE_SUBSTITUTION,NO_ZERO_DATE,NO_ZERO_IN_DATE,ONLY_FULL_GROUP_BY
sysdate-is-now = 1
innodb = FORCE
innodb-strict-mode = 1
# DATA STORAGE #
datadir = /var/lib/mysql/
# BINARY LOGGING #
log-bin = /var/lib/mysql/mysql-bin
expire-logs-days = 14
sync-binlog = 1
# CACHES AND LIMITS #
tmp-table-size = 32M
max-heap-table-size = 32M
query-cache-type = 0
query-cache-size = 0
max-connections = 500
thread-cache-size = 50
open-files-limit = 65535
table-definition-cache = 1024
table-open-cache = 2048
# INNODB #
innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-log-file-size = 64M
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table = 1
innodb-buffer-pool-size = 500M
# LOGGING #
log-error = /var/log/mysql/error.log
slow-query-log = 1
slow-query-log-file = /var/log/mysql/slow.log
log=/var/log/mysql/general.log
# Autoextend
innodb_data_file_path = ibdata1:128M;ibdata2:10M:autoextend
Run Code Online (Sandbox Code Playgroud)
Rol*_*DBA 11
引起我注意的是通用日志中记录的Query
150629 9:17:14 76946 Query UPDATE `api_dictionaryitem` SET `name` = '', `group_id` = 4, `count` = 2248844, `last_occurence` = NULL WHERE `api_dictionaryitem`.`id` = 8
Run Code Online (Sandbox Code Playgroud)
当你收到信号 11
09:17:14 UTC - mysqld got signal 11 ;
Run Code Online (Sandbox Code Playgroud)
我怀疑你api_dictionaryitem在这个实例中运行表时遇到了一些奇怪的错误。为什么 ?
看看你得到的信号 11 的响应,我将逐项列出
让我们看看 mysqld 在不得不求助于堆栈跟踪之前做了什么
stack_bottom = 7f4a0043ee70 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x7cbffc]
/usr/sbin/mysqld(handle_fatal_signal+0x4a1)[0x699251]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f4a3278b340]
/lib/x86_64-linux-gnu/libc.so.6(+0x9818e)[0x7f4a3114b18e]
/usr/sbin/mysqld(_ZN8Protocol14net_store_dataEPKhm+0x47)[0x52e3b7]
/usr/sbin/mysqld[0x52fea4]
/usr/sbin/mysqld(_ZN8Protocol24send_result_set_metadataEP4ListI4ItemEj+0x52c)[0x52fbbc]
/usr/sbin/mysqld(_Z18mysqld_list_fieldsP3THDP10TABLE_LISTPKc+0x22a)[0x5e8d2a]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x2498)[0x5a6988]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x13a)[0x643a4a]
/usr/sbin/mysqld(handle_one_connection+0x4a)[0x643b3a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f4a32783182]
Run Code Online (Sandbox Code Playgroud)
看那行说
/usr/sbin/mysqld(_ZN8Protocol14net_store_dataEPKhm+0x47)[0x52e3b7]
Run Code Online (Sandbox Code Playgroud)
撞击时的查询是
UPDATE `api_dictionaryitem` SET
`name` = '',
`group_id` = 4,
`count` = 2248844,
`last_occurence` = NULL
WHERE `api_dictionaryitem`.`id` = 8
Run Code Online (Sandbox Code Playgroud)
那一刻 net_store_data 在做什么?

数据字段是许多服务器响应数据包中的关键组成部分。数据字段由长度说明符序列后跟实际数据值组成。长度说明符序列可以通过学习net_store_length() ...
有趣的是 net_store_data() 在protocol.cc. 该代码看起来像这样
{
to=net_store_length_fast(to,length);
memcpy(to,from,length);
return to+length;
}
Run Code Online (Sandbox Code Playgroud)
在第 489 行,另一个版本的 net_store_data() 如下所示:
{
char buff[20];
uint length=(uint) (int10_to_str(from,buff,10)-buff);
to=net_store_length_fast(to,length);
memcpy(to,buff,length);
return to+length;
}
Run Code Online (Sandbox Code Playgroud)
在第 498 行,另一个版本的 net_store_data() 看起来像这样:
{
char buff[22];
uint length=(uint) (longlong10_to_str(from,buff,10)-buff);
to=net_store_length_fast(to,length);
memcpy(to,buff,length);
return to+length;
}
Run Code Online (Sandbox Code Playgroud)
这些代码片段可以在SourceCodeBrowser 中找到
net_store_length 的代码
00450 {
00451 uchar *packet=(uchar*) pkg;
00452 if (length < 251)
00453 {
00454 *packet=(uchar) length;
00455 return (char*) packet+1;
00456 }
00457 *packet++=252;
00458 int2store(packet,(uint) length);
00459 return (char*) packet+2;
00460 }
Run Code Online (Sandbox Code Playgroud)
可以在这里找到
由于 net_store_data 是堆栈跟踪中的单点故障,那么触发此类故障的 SQL 是什么?
回头看SQL
UPDATE `api_dictionaryitem` SET
`name` = '',
`group_id` = 4,
`count` = 2248844,
`last_occurence` = NULL
WHERE `api_dictionaryitem`.`id` = 8
Run Code Online (Sandbox Code Playgroud)
唯一让我觉得格格不入的是
`last_occurence` = NULL
Run Code Online (Sandbox Code Playgroud)
回头看一下net_store_length的代码。注意行
00452 if (length < 251)
Run Code Online (Sandbox Code Playgroud)
数字 251 有什么特别之处?根据《Understanding MySQL Internals》一书,第 74 页的最后一段说:
可能有人会问,当第一个保留值
net_store_length()是252时,为什么1字节长度限制为251。代码251有特殊含义。表示代码后面没有长度值或数据,字段值为SQL NULL。
啊哈!!!处理last_occurence字段时,在 net_store_length() 中错误处理了 NULL 值。如果last_occurence是 DATETIME 或 TIMESTAMP,则很可能不允许使用 NULL。
也许改变
`last_occurence` = NULL
Run Code Online (Sandbox Code Playgroud)
到虚假但合法的日期值
`last_occurence` = '1970-01-01 00:00:00'
Run Code Online (Sandbox Code Playgroud)
可能会直接解决这个问题。
堆栈跟踪的第一个反应是正确的: This could be because you hit a bug.
之前你评论过表的定义
mysql> desc api_dictionaryitem;
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | | NULL | |
| count | bigint(20) | YES | | NULL | |
| last_occurence | bigint(20) | YES | | NULL | |
| group_id | int(11) | NO | MUL | NULL | |
+----------------+--------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
Run Code Online (Sandbox Code Playgroud)
你提到的last_occurence是BIGINT。既然是BIGINT,也就是 8 个字节,而且它被设置为NULL,想想它做了什么。
第一个字节有 0xFB (251),后跟 7 个字节的未定义内容。net_store_length我在我的回答中向你展示的代码有一些从第 458 行开始的整数
00450 {
00451 uchar *packet=(uchar*) pkg;
00452 if (length < 251)
00453 {
00454 *packet=(uchar) length;
00455 return (char*) packet+1;
00456 }
00457 *packet++=252;
00458 int2store(packet,(uint) length);
00459 return (char*) packet+2;
00460 }
Run Code Online (Sandbox Code Playgroud)
该函数被调用int2store。我认为它不能正确处理 BIGINT,因为 NULL 标志后的 7 个字节可能是任何东西。请记住,信号 11 是分段错误。在这种情况下,这是由于未知或虚假的字节过程。