Mil*_*loš 10 mysql ubuntu mysql-5.7
We have MySQL 5.7 master-slave replication, and on the slave servers our application monitoring tools (Tideways and PHP 7.0) report from time to time that "MySQL has gone away".
Checking on the MySQL side:
show global status like '%Connection%';
+-----------------------------------+----------+
| Variable_name                     | Value    |
+-----------------------------------+----------+
| Connection_errors_accept          | 0        |
| Connection_errors_internal        | 0        |
| Connection_errors_max_connections | 0        |
| Connection_errors_peer_address    | 323      |
| Connection_errors_select          | 0        |
| Connection_errors_tcpwrap         | 0        |
| Connections                       | 55210496 |
| Max_used_connections              | 387      |
| Slave_connections                 | 0        |
+-----------------------------------+----------+
Connection_errors_peer_address shows 323. How can I investigate further what causes both "MySQL has gone away" and Connection_errors_peer_address?
Edit:
Master server
net_retry_count = 10
net_read_timeout = 120
net_write_timeout = 120
skip_networking = OFF
Aborted_clients = 151650
Slave server 1
net_retry_count = 10
net_read_timeout = 30
net_write_timeout = 60
skip_networking = OFF
Aborted_clients = 3
Slave server 2
net_retry_count = 10
net_read_timeout = 30
net_write_timeout = 60
skip_networking = OFF
Aborted_clients = 3
In MySQL 5.7, when a new TCP/IP connection arrives at the server, the server performs several checks, implemented in the function check_connection() in sql/sql_connect.cc.
One of these checks obtains the IP address of the connecting client, as follows:
static int check_connection(THD *thd)
{
  ...
  if (!thd->m_main_security_ctx.host().length)     // If TCP/IP connection
  {
    ...
    peer_rc= vio_peer_addr(net->vio, ip, &thd->peer_port, NI_MAXHOST);
    if (peer_rc)
    {
      /*
        Since we can not even get the peer IP address,
        there is nothing to show in the host_cache,
        so increment the global status variable for peer address errors.
      */
      connection_errors_peer_addr++;
      my_error(ER_BAD_HOST_ERROR, MYF(0));
      return 1;
    }
    ...
  }
  ...
}
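On failure, the counter connection_errors_peer_addr (exposed as the status variable Connection_errors_peer_address) is incremented, and the connection is rejected.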
vio_peer_addr() is implemented in vio/viosocket.c (code simplified to show only the important calls):
my_bool vio_peer_addr(Vio *vio, char *ip_buffer, uint16 *port,
                      size_t ip_buffer_size)
{
  if (vio->localhost)
  {
    ...
  }
  else
  {
    /* Get sockaddr by socked fd. */
    err_code= mysql_socket_getpeername(vio->mysql_socket, addr, &addr_length);
    if (err_code)
    {
      DBUG_PRINT("exit", ("getpeername() gave error: %d", socket_errno));
      DBUG_RETURN(TRUE);
    }
    /* Normalize IP address. */
    vio_get_normalized_ip(addr, addr_length,
                          (struct sockaddr *) &vio->remote, &vio->addrLen);
    /* Get IP address & port number. */
    err_code= vio_getnameinfo((struct sockaddr *) &vio->remote,
                              ip_buffer, ip_buffer_size,
                              port_buffer, NI_MAXSERV,
                              NI_NUMERICHOST | NI_NUMERICSERV);
    if (err_code)
    {
      DBUG_PRINT("exit", ("getnameinfo() gave error: %s",
                          gai_strerror(err_code)));
      DBUG_RETURN(TRUE);
    }
    ...
  }
  ...
}
In short, the only failure paths in vio_peer_addr() occur when the call to mysql_socket_getpeername() or to vio_getnameinfo() fails.
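The same two-step lookup can be reproduced outside the server. Below is a minimal sketch, assuming a POSIX system; the helper name peer_ip_and_port() is hypothetical, and unlike the real vio_peer_addr() it skips the IPv4-mapped IPv6 normalization step (vio_get_normalized_ip()).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for getnameinfo() and the NI_* flags */
#endif
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper mirroring the two calls made by vio_peer_addr():
 *   1. getpeername() fetches the peer's sockaddr from the socket descriptor;
 *   2. getnameinfo() with NI_NUMERICHOST | NI_NUMERICSERV renders it as text.
 * Returns 0 on success and 1 if either step fails, the way vio_peer_addr()
 * returns TRUE on failure. No address normalization is done here.
 */
int peer_ip_and_port(int fd, char *ip, size_t ip_len,
                     char *port, size_t port_len)
{
    struct sockaddr_storage addr;
    socklen_t addr_len = sizeof(addr);

    memset(&addr, 0, sizeof(addr));
    if (getpeername(fd, (struct sockaddr *)&addr, &addr_len) != 0)
        return 1; /* first failure path: errno holds EBADF, ENOTCONN, ... */

    if (getnameinfo((struct sockaddr *)&addr, addr_len,
                    ip, (socklen_t)ip_len, port, (socklen_t)port_len,
                    NI_NUMERICHOST | NI_NUMERICSERV) != 0)
        return 1; /* second failure path: an EAI_* code was returned */

    return 0;
}
```

Passing a descriptor that is not a socket, or a TCP socket that was never connected, takes the first failure path; the second path is only reached once getpeername() has succeeded.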
mysql_socket_getpeername() is just a wrapper around getpeername().
The man 2 getpeername manual page lists the following possible errors:
NAME
    getpeername - get name of connected peer socket
ERRORS
    EBADF     The argument sockfd is not a valid descriptor.
    EFAULT    The addr argument points to memory not in a valid part of the process address space.
    EINVAL    addrlen is invalid (e.g., is negative).
    ENOBUFS   Insufficient resources were available in the system to perform the operation.
    ENOTCONN  The socket is not connected.
    ENOTSOCK  The argument sockfd is a file, not a socket.
Of these errors, only ENOBUFS seems plausible.
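To see how some of these error codes surface, here is a small sketch (getpeername_errno() is a hypothetical helper) that returns errno from a failed getpeername() call. On Linux, an invalid descriptor yields EBADF and a TCP socket that was never connected yields ENOTCONN; both come from the caller passing a bad or unconnected socket rather than from a resource shortage.

```c
#include <errno.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: calls getpeername() on fd and returns 0 on success,
 * or the errno value on failure, so the codes listed in the man page can be
 * observed directly.
 */
int getpeername_errno(int fd)
{
    struct sockaddr_storage addr;
    socklen_t addr_len = sizeof(addr);

    if (getpeername(fd, (struct sockaddr *)&addr, &addr_len) == 0)
        return 0;
    return errno; /* e.g. EBADF, ENOTCONN, ENOTSOCK, ENOBUFS */
}
```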
As for vio_getnameinfo(), it is just a wrapper around getnameinfo(), which, according to its manual page (man 3 getnameinfo), can also fail for the following reasons:
NAME
    getnameinfo - address-to-name translation in protocol-independent manner
RETURN VALUE
    EAI_AGAIN     The name could not be resolved at this time. Try again later.
    EAI_BADFLAGS  The flags argument has an invalid value.
    EAI_FAIL      A nonrecoverable error occurred.
    EAI_FAMILY    The address family was not recognized, or the address length was invalid for the specified family.
    EAI_MEMORY    Out of memory.
    EAI_NONAME    The name does not resolve for the supplied arguments. NI_NAMEREQD is set and the host's name cannot be located, or neither hostname nor service name were requested.
    EAI_OVERFLOW  The buffer pointed to by host or serv was too small.
    EAI_SYSTEM    A system error occurred. The error code can be found in errno.

    The gai_strerror(3) function translates these error codes to a human readable string, suitable for error reporting.
Many failures can happen here, mostly due to heavy load or network failures.
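With the flags vio_peer_addr() passes (NI_NUMERICHOST | NI_NUMERICSERV), getnameinfo() is asked for the numeric form of the address and port. The sketch below assumes an IPv4 peer and uses a hypothetical helper, format_ipv4_peer(), that builds a sockaddr_in by hand, so both the success path and an error path (an address length too short for AF_INET, which glibc and musl report as EAI_FAMILY) can be exercised without a live connection.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for getnameinfo(), inet_pton(), NI_* */
#endif
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: formats an IPv4 address/port with the same flags as
 * vio_peer_addr(). addr_len is normally sizeof(struct sockaddr_in); a smaller
 * value makes getnameinfo() fail. Returns 0 or an EAI_* code.
 */
int format_ipv4_peer(const char *dotted, unsigned short port_num,
                     socklen_t addr_len, char *host, socklen_t host_len,
                     char *serv, socklen_t serv_len)
{
    struct sockaddr_in sin;

    /* Hypothetical guard: never let getnameinfo() read past sin. */
    if (addr_len > (socklen_t)sizeof(sin))
        return EAI_FAMILY;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port_num);
    if (inet_pton(AF_INET, dotted, &sin.sin_addr) != 1)
        return EAI_NONAME; /* hypothetical choice for unparsable input */

    return getnameinfo((const struct sockaddr *)&sin, addr_len,
                       host, host_len, serv, serv_len,
                       NI_NUMERICHOST | NI_NUMERICSERV);
}
```

A caller would normally turn a nonzero result into a readable message with gai_strerror(rc), just as the DBUG_PRINT call in vio_peer_addr() does.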
To understand the process behind this code: what the MySQL server is essentially doing is a reverse DNS lookup.
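Overall, Connection_errors_peer_address failures can be caused by system load (leading to transient failures such as running out of memory) or by network issues affecting DNS.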
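When triaging, it can help to bucket getnameinfo() return codes into "environmental" failures (load, memory, name service) and failures that would point at a caller bug. The grouping in this sketch is an assumption drawn from the man-page descriptions above, not something MySQL does; gai_error_is_environmental() is a hypothetical name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for the EAI_* constants */
#endif
#include <netdb.h>

/*
 * Hypothetical triage helper. Returns 1 for codes whose man-page description
 * suggests an environmental cause (temporary resolution failure, name-service
 * failure, out of memory, system error), 0 for everything else (including
 * success and caller-side errors such as EAI_BADFLAGS or EAI_OVERFLOW).
 */
int gai_error_is_environmental(int rc)
{
    switch (rc) {
    case EAI_AGAIN:  /* name could not be resolved at this time */
    case EAI_FAIL:   /* nonrecoverable error, e.g. name service failure */
    case EAI_MEMORY: /* out of memory */
    case EAI_SYSTEM: /* system error, details in errno */
        return 1;
    default:         /* 0, EAI_BADFLAGS, EAI_FAMILY, EAI_OVERFLOW, ... */
        return 0;
    }
}
```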
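Disclosure: I happen to be the person who implemented the Connection_errors_peer_address status variable in MySQL, as part of an effort to get better visibility/observability in this area of the code.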
[EDIT] To follow up with more details and/or guidelines:
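- When Connection_errors_peer_address is incremented, the root cause is not printed in the logs. This is unfortunate for troubleshooting, but it also avoids flooding the logs and causing even more damage; there is a tradeoff here. Keep in mind that anything that happens before login is very sensitive...
- ...mysqld and monitoring its uptime, it should be fairly easy to determine whether the failure "only" caused connections to be closed while the server stayed up, or whether the server itself failed catastrophically.
- ...getnameinfo.
- Using skip-name-resolve will have no effect, because the skip-name-resolve check happens later (see specialflag & SPECIAL_NO_RESOLVE in check_connection()).
- When a Connection_errors_peer_address failure occurs, note that the server cleanly returns the error ER_BAD_HOST_ERROR to the client and then closes the socket. This is different from abruptly closing a socket (as in a crash): the former should be reported by the client as "Can't get hostname for your address", while the latter is reported as "MySQL has gone away".
- Whether the client actually treats ER_BAD_HOST_ERROR and a closed socket differently is another story.

Given that this failure overall seems related to DNS lookups, I would check the following items: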
- How many rows are in the performance_schema.host_cache table.
- ...the host_cache_size system variable.

Documentation for the performance_schema.host_cache table:
https://dev.mysql.com/doc/refman/5.7/en/host-cache-table.html
Further reading:
http://marcalff.blogspot.com/2012/04/performance-schema-nailing-host-cache.html
[EDIT 2] Based on the new data available:
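The Aborted_clients status variable shows a number of connections that the server closed forcibly. This typically happens when a session stays idle for a long time.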
A typical scenario for this is:
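Note that a client application that forgets to close its sessions cleanly will go through steps 1-3, which is likely the cause of Aborted_clients on the master. Some cleanup in the client applications that use the master would help reduce resource consumption, since keeping 151650 sessions open until they time out has a cost.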
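A client application that goes through steps 1-4 can cause both Aborted_clients on the server and "MySQL has gone away" on the client. The client application reporting "MySQL has gone away" is most likely the culprit here.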
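If the monitoring application checks the server every N seconds, for example, make sure the timeouts (here 30 and 60 seconds) are significantly larger than N; otherwise the server will kill the monitoring session.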
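The rule of thumb above is simple arithmetic. This sketch models "significantly larger" as a safety factor, which is an assumption (the answer gives no number), and monitor_interval_is_safe() is a hypothetical name. Because both timeouts must exceed N, it checks against the smaller one.

```c
/*
 * Hypothetical check: returns 1 if a monitor polling every interval_s seconds
 * stays within both timeouts by at least safety_factor (e.g. 2 means each
 * timeout must be at least twice the polling interval), 0 otherwise.
 */
int monitor_interval_is_safe(int interval_s, int net_read_timeout_s,
                             int net_write_timeout_s, int safety_factor)
{
    /* Both timeouts must exceed the interval, so the smaller one binds. */
    int smallest = net_read_timeout_s < net_write_timeout_s
                       ? net_read_timeout_s
                       : net_write_timeout_s;

    return (long)interval_s * safety_factor <= smallest;
}
```

With the slave settings (30 and 60 seconds) and a factor of 2, a 10-second or 15-second polling interval passes, while a 20-second one does not.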