tec*_*mic 6 performance authentication rhel
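I'm having a problem on one of our production servers where login attempts get slower and slower over time. After about 5 days it becomes so slow that some critical processes/cron entries fail to fire properly.
Server information:
Hardware: Dell R720, 24 GB RAM, 2 Intel Xeon E5-2620 v2 processors (24 cores total, including hyperthreading), 8 x 300GB 10K SAS drives
OS: Red Hat Enterprise Linux 6.5
I first ran into the problem when logging in over SSH, and started down a long road of chasing red herrings. Eventually I noticed that even the following takes a very long time: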
[someuser#hostname] su -
Password:
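Running the 'su -' command above shouldn't involve SSH in any way, since I am only authenticating on the box itself, right?
This has been happening on the same box for 3 weeks running, and this is the first time I have noticed (and tested, I might add) that local-only logins are slow too.
When I log in at the console, it looks like this: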
hostname login: user
Password: # I enter the password and hit [enter]
A long time passes, and then...
[user@hostname ~] $
When I log in over SSH, it looks like this (verbose output, logging in to localhost):
[user@hostname ~]$ ssh -v root@localhost
OpenSSH_5.3p1, OpenSSL 1.0.0-fips 29 Mar 2010
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to localhost [127.0.0.1] port 22.
debug1: Connection established.
debug1: identity file /usr/local/user/.ssh/identity type -1
debug1: identity file /usr/local/user/.ssh/identity-cert type -1
debug1: identity file /usr/local/user/.ssh/id_rsa type -1
debug1: identity file /usr/local/user/.ssh/id_rsa-cert type -1
debug1: identity file /usr/local/user/.ssh/id_dsa type -1
debug1: identity file /usr/local/user/.ssh/id_dsa-cert type -1
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.3
debug1: match: OpenSSH_5.3 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.3
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-ctr hmac-md5 none
debug1: kex: client->server aes128-ctr hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 1d:50:5e:a3:e4:63:d6:1d:d8:2c:85:07:95:81:c8:b6.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
debug1: ssh_rsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,password
debug1: Next authentication method: gssapi-keyex
debug1: No valid Key exchange context
debug1: Next authentication method: gssapi-with-mic
debug1: Unspecified GSS failure. Minor code may provide more information
Credentials cache file '/tmp/krb5cc_501' not found
debug1: Unspecified GSS failure. Minor code may provide more information
Credentials cache file '/tmp/krb5cc_501' not found
debug1: Unspecified GSS failure. Minor code may provide more information
debug1: Unspecified GSS failure. Minor code may provide more information
Credentials cache file '/tmp/krb5cc_501' not found
debug1: Next authentication method: publickey
debug1: Trying private key: /usr/local/user/.ssh/identity
debug1: Trying private key: /usr/local/user/.ssh/id_rsa
debug1: Trying private key: /usr/local/user/.ssh/id_dsa
debug1: Next authentication method: password
root@localhost's password:
debug1: Authentication succeeded (password).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
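At this point it hangs again for a long time before the login finally succeeds.
Any pointers would be greatly appreciated. This is driving me up the wall.
The following also shows up in the dmesg output. It repeats for various process names (not just "cifsd"):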
cifsd: page allocation failure. order:5, mode:0x20
Pid: 12913, comm: cifsd Not tainted 2.6.32-431.el6.x86_64 #1
Call Trace:
[<ffffffff8112f9e7>] ? __alloc_pages_nodemask+0x757/0x8d0
[<ffffffff8116e482>] ? kmem_getpages+0x62/0x170
[<ffffffff8116f09a>] ? fallback_alloc+0x1ba/0x270
[<ffffffff8116eaef>] ? cache_grow+0x2cf/0x320
[<ffffffff8116ee19>] ? ____cache_alloc_node+0x99/0x160
[<ffffffff8116ffe0>] ? kmem_cache_alloc_node_trace+0x90/0x200
[<ffffffff811701fd>] ? __kmalloc_node+0x4d/0x60
[<ffffffff8144feca>] ? __alloc_skb+0x7a/0x180
[<ffffffff81450fe0>] ? skb_copy+0x40/0xb0
[<ffffffffa014f57c>] ? tg3_start_xmit+0xa8c/0xd80 [tg3]
[<ffffffff81460354>] ? dev_hard_start_xmit+0x224/0x480
[<ffffffff8147bd0a>] ? sch_direct_xmit+0x15a/0x1c0
[<ffffffff81460858>] ? dev_queue_xmit+0x228/0x320
[<ffffffff8149a0d8>] ? ip_finish_output+0x148/0x310
[<ffffffff8149a358>] ? ip_output+0xb8/0xc0
[<ffffffff8105a924>] ? find_busiest_group+0x244/0x9f0
[<ffffffff81499655>] ? ip_local_out+0x25/0x30
[<ffffffff81499b30>] ? ip_queue_xmit+0x190/0x420
[<ffffffff8112ff2f>] ? free_hot_page+0x2f/0x60
[<ffffffff814aee3e>] ? tcp_transmit_skb+0x40e/0x7b0
[<ffffffff814b1380>] ? tcp_write_xmit+0x230/0xa90
[<ffffffff814b1f00>] ? __tcp_push_pending_frames+0x30/0xe0
[<ffffffff814a9663>] ? tcp_data_snd_check+0x33/0x100
[<ffffffff814ad261>] ? tcp_rcv_established+0x381/0x7f0
[<ffffffff8152873a>] ? schedule_timeout+0x19a/0x2e0
[<ffffffff814b5643>] ? tcp_v4_do_rcv+0x2e3/0x490
[<ffffffff814a130a>] ? tcp_prequeue_process+0x7a/0xa0
[<ffffffff814a4a2c>] ? tcp_recvmsg+0xacc/0xe80
[<ffffffff814c58ca>] ? inet_recvmsg+0x5a/0x90
[<ffffffff8105a625>] ? select_idle_sibling+0x95/0x150
[<ffffffff81449ab3>] ? sock_recvmsg+0x133/0x160
[<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff81059216>] ? enqueue_task+0x66/0x80
[<ffffffff8105571d>] ? check_preempt_curr+0x6d/0x90
[<ffffffff81065c5e>] ? try_to_wake_up+0x24e/0x3e0
[<ffffffff81065e02>] ? default_wake_function+0x12/0x20
[<ffffffff8109b2b6>] ? autoremove_wake_function+0x16/0x40
[<ffffffff81449b24>] ? kernel_recvmsg+0x44/0x60
[<ffffffffa01fd7c9>] ? cifs_readv_from_socket+0x1a9/0x260 [cifs]
[<ffffffffa020b11d>] ? cifs_add_credits+0x5d/0x70 [cifs]
[<ffffffffa01fd8a7>] ? cifs_read_from_socket+0x27/0x30 [cifs]
[<ffffffffa01fda03>] ? cifs_demultiplex_thread+0x153/0xe10 [cifs]
[<ffffffff81065e02>] ? default_wake_function+0x12/0x20
[<ffffffffa01fd8b0>] ? cifs_demultiplex_thread+0x0/0xe10 [cifs]
[<ffffffff8109aef6>] ? kthread+0x96/0xa0
[<ffffffff8100c20a>] ? child_rip+0xa/0x20
[<ffffffff8109ae60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
小智 6
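Found the problem (thanks to this post: /sf/ask/588978281/ Between-login-and-shell-prompt )
The problem was in the /etc/profile.d/zzzz-vamilocale.sh file, which tries to read something from the virtual machine's properties and gets stuck there. Removing this file solved the problem.
How I debugged this:
1. Log in as the affected user
2. Run "bash --login --verbose"
3. Find the line at which execution stops
4. Locate that line in one of the files in /etc/profile.d/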
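If reading through the verbose output by eye is tedious, a rough alternative is to time each profile script separately. The sketch below assumes the login shell sources /etc/profile.d/*.sh, as the stock /etc/profile does on RHEL; time_profile_scripts is a hypothetical helper name, not something from the steps above.

```shell
#!/bin/bash
# Hypothetical helper: report how many seconds each profile script takes.
# The optional argument overrides the directory (default: /etc/profile.d).
time_profile_scripts() {
    local dir="${1:-/etc/profile.d}"
    local script start end
    for script in "$dir"/*.sh; do
        [ -r "$script" ] || continue
        # Print the name first, so a script that never returns is still identified.
        echo "sourcing $script" >&2
        start=$(date +%s)
        # Source in a subshell so the script cannot change the current shell.
        ( . "$script" ) </dev/null >/dev/null 2>&1
        end=$(date +%s)
        printf '%s %s\n' "$((end - start))" "$script"
    done
}
```

Running `time_profile_scripts | sort -rn | head` as the affected user lists the slowest scripts first. The resolution is whole seconds, which should be enough for delays of the size described in the question.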
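As a general approach to investigating slow authentication, check /etc/pam.conf and /etc/pam.d/su (/etc/pam.d/sshd, etc.) to see what kind of authentication the login service performs. Check the system logs to see whether anything is being logged (look for entries from the time of the authentication attempt).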
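A minimal sketch of that check, assuming the per-service files live under /etc/pam.d: pam_rules is a hypothetical helper that strips comments and blank lines so only the active rules for a service remain.

```shell
#!/bin/bash
# Hypothetical helper: print the active (non-comment, non-blank) PAM rules
# for one service. The optional second argument points at a different
# configuration directory; it defaults to /etc/pam.d.
pam_rules() {
    local service="$1" dir="${2:-/etc/pam.d}"
    grep -Ev '^[[:space:]]*(#|$)' "$dir/$service"
}
```

For example, `pam_rules su` and `pam_rules sshd` show which modules each service runs. On RHEL, authentication messages normally go to /var/log/secure, so something like `grep -E 'sshd|su' /var/log/secure | tail` (run as root) is one way to look for entries around the time of a slow login.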
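In your case, the kernel log reveals the problem. The "page allocation failure" messages indicate that your system is running out of virtual memory. Either kill some programs or add more swap space.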
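Before changing anything, it may help to look at what memory is actually free. The trace reports order:5, meaning a request for 2^5 = 32 contiguous pages, so besides `free -m` it can be worth checking how many free blocks of that size remain. The sketch below reads /proc/buddyinfo, where the columns after the zone name are free-block counts for order 0, 1, 2 and so on; free_blocks_at_order is a hypothetical helper, and this check is a suggestion rather than a confirmed diagnosis.

```shell
#!/bin/bash
# Hypothetical helper: for each memory zone, sum the free blocks of at least
# the given order. Reads buddyinfo-formatted input (default: /proc/buddyinfo).
free_blocks_at_order() {
    local order="$1" file="${2:-/proc/buddyinfo}"
    awk -v o="$order" '{
        n = 0
        # Fields 1-4 are "Node N, zone NAME"; field 5 is the order-0 count.
        for (i = 5 + o; i <= NF; i++) n += $i
        print $1, $2, $3, $4, "free blocks of order >= " o ": " n
    }' "$file"
}
```

Calling `free_blocks_at_order 5` prints one line per zone. Counts at or near zero while `free -m` still shows free memory would point to fragmentation rather than an outright shortage.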