Hang during a "touch" operation on an NFS mount

Thi*_*his 5 filesystems nfs files

I have two NFS clients that mount NFS shares from an Openfiler 2.99 server at 192.0.2.3:

  • 192.0.2.1 mounts 192.0.2.3:/mnt/nfs01/volnfs01/share01 with rw,noatime,nodiratime,hard,rsize=32768,wsize=32768,noacl,nocto,tcp,nfsvers=3
  • 192.0.2.1 mounts 192.0.2.3:/mnt/nfs01/volnfs01/share02 with rw,noatime,nodiratime,hard,rsize=32768,wsize=32768,nfsvers=3,tcp,noacl,nocto
  • 192.0.2.2 mounts 192.0.2.3:/mnt/nfs01/volnfs01/share02 with rw,noatime,nodiratime,hard,rsize=32768,wsize=32768,nfsvers=3,tcp,noacl,nocto

touch broken

My problem is with the NFS mount on 192.0.2.2. When I touch a file on that mount, the process hangs indefinitely. I ran strace touch /mnt/share02/this and got this far...

rt_sigaction(SIGRTMIN, {0x3b71c05ae0, [], SA_RESTORER|SA_SIGINFO, 0x3b71c0f500}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x3b71c05b70, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x3b71c0f500}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
brk(0)                                  = 0xafb000
brk(0xb1c000)                           = 0xb1c000
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=99158576, ...}) = 0
mmap(NULL, 99158576, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fce244c0000
close(3)                                = 0
open("/mnt/share02/this", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666
                                                                    ^^^ stops touching
                                                                     |
                                                                     |

When I check with ps -elf from another terminal, I see the process in the "D" state...

[mpenning@host192_0_2_2 ~]$ ps -elf | awk '$2=="D"'
0 D mpenning  8157  8032  0  80   0 - 26293 rpc_wa 09:59 pts/2    00:00:00 touch /mnt/share02/this
[mpenning@host192_0_2_2 ~]$

showmount doesn't show any problems, though...

[mpenning@host192_0_2_2 ~]$ showmount -e 192.0.2.3
Export list for 192.0.2.3:
/mnt/nfs01/volnfs01/share01 192.0.2.2/255.255.255.255,192.0.2.1/255.255.255.255
/mnt/nfs01/volnfs01/share02 192.0.2.2/255.255.255.255,192.0.2.1/255.255.255.255
[mpenning@host192_0_2_2 ~]$

Status of the various NFS services...

[mpenning@host192_0_2_2 ~]$ service nfs status
rpc.svcgssd is stopped
rpc.mountd (pid 9168) is running...
nfsd (pid 9232 9231 9230 9229 9228 9227 9226 9225) is running...
rpc.rquotad (pid 9164) is running...
[mpenning@host192_0_2_2 ~]$ service rpcbind status
rpcbind (pid  9088) is running...
[mpenning@host192_0_2_2 ~]$ service nfslock status
rpc.statd (pid  9256) is running...
[mpenning@host192_0_2_2 ~]$

Network configuration (a default gw is not required, because this is a dedicated layer2 NFS vlan):

[mpenning@host192_0_2_2 ~]$ sudo cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
NM_CONTROLLED=no
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.0.2.2
NETMASK=255.255.255.0
DNS2=none
TYPE=Ethernet
GATEWAY=
DNS1=none
IPV6INIT=no
USERCTL=no
MTU=9000
[mpenning@host192_0_2_2 ~]$

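This seems nasty. I have done the following on 192.0.2.2:

  • Restarted all NFS services
  • init 6 the machine
  • ping 192.0.2.3 to make sure it can still reach the server
  • Checked dmesg
  • Checked showmount -e 192.0.2.3

This feels like a permissions issue, but I don't know where to start...

How do I fix this so that I can read and write any file on 192.0.2.2's mount of 192.0.2.3:/mnt/nfs01/volnfs01/share02?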


touch works

If I run the same touch command from 192.0.2.1, everything is fine...

rt_sigaction(SIGRTMIN, {0xb096e0, [], SA_SIGINFO}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0xb09b80, [], SA_RESTART|SA_SIGINFO}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
uname({sys="Linux", node="host192_0_2_1.localdomain.local", ...}) = 0
brk(0)                                  = 0x8d4d000
brk(0x8d6e000)                          = 0x8d6e000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=99158544, ...}) = 0
mmap2(NULL, 2097152, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7574000
close(3)                                = 0
open("/mnt/share02/this", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK|O_LARGEFILE, 0666) = 3
dup2(3, 0)                              = 0
close(3)                                = 0
utimensat(0, NULL, NULL, 0)             = 0
close(0)                                = 0
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?

/etc/exports from 192.0.2.3

[root@T1-Netfile01 backups]# head /etc/exports

# PLEASE DO NOT MODIFY THIS CONFIGURATION FILE!
#       This configuration file was autogenerated
#       by Openfiler. Any manual changes will be overwritten
#       Generated at: Fri Nov 8 9:35:39 CST 2013

/mnt/nfs01/volnfs01/share02 192.0.2.1/255.255.255.255(rw,anonuid=96,anongid=96,secure,root_squash,wdelay,sync)  192.0.2.2/255.255.255.255(rw,anonuid=96,anongid=96,secure,root_squash,wdelay,sync)

/mnt/nfs01/volnfs01/share01 192.0.2.1/255.255.255.255(rw,anonuid=96,anongid=96,secure,root_squash,wdelay,sync)  192.0.2.2/255.255.255.255(rw,anonuid=96,anongid=96,secure,root_squash,wdelay,sync)

[root@T1-Netfile01 backups]#

slm*_*slm 3

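What happens if you change the order of the IPs in the /etc/exports file? Put the .2.2 IP first and the .2.1 IP second.

Also, I would confirm what the server is actually exporting with this command: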

$ showmount -e 192.0.2.3

The /etc/exports format can be quite picky!
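
To make the showmount check above less error-prone, here is a minimal sketch that reads showmount -e output on stdin and reports whether a given client IP is listed for a given share. The function name export_lists_client is hypothetical, and it only matches exact host entries such as 192.0.2.2/255.255.255.255. It does not understand subnet entries like 192.168.1.0/24.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if CLIENT appears in the client list that
# `showmount -e` prints for SHARE, exit 1 otherwise.
# The export list is read from stdin. Only exact host IPs are matched.
export_lists_client() {
    share=$1
    client=$2
    awk -v share="$share" -v client="$client" '
        $1 == share {
            # Clients are comma-separated, each written as "ip/netmask".
            n = split($2, entries, ",")
            for (i = 1; i <= n; i++) {
                split(entries[i], parts, "/")
                if (parts[1] == client) found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage from either client (assumes showmount is installed there):
#   showmount -e 192.0.2.3 | export_lists_client /mnt/nfs01/volnfs01/share02 192.0.2.2 && echo listed
```

This only confirms what the server advertises. A client can be listed and still hang on I/O, as in the question, so treat a match as ruling out one cause rather than proving the export works.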

Other things worth trying

  1. I usually specify my hosts in /etc/exports like this:

    /cobbler/isos   192.168.1.0/24(rw,no_root_squash)
    

    So for you, with single host IPs:

    /mnt/nfs01/volnfs01/share02 192.0.2.1/32(rw,anonuid=96,anongid=96,secure,root_squash,wdelay,sync)  192.0.2.2/32(rw,anonuid=96,anongid=96,secure,root_squash,wdelay,sync)
    /mnt/nfs01/volnfs01/share01 192.0.2.1/32(rw,anonuid=96,anongid=96,secure,root_squash,wdelay,sync)  192.0.2.2/32(rw,anonuid=96,anongid=96,secure,root_squash,wdelay,sync)
    
  2. NFS-related services

    Make sure the nfslock service and the other related services are running on 192.0.2.2.

  3. If you are using jumbo frames, make sure ping -s <jumbo_mtu> 192.0.2.3 works from 192.0.2.2
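
For the jumbo-frame check, note that ping -s sets the ICMP payload size, not the MTU, and that ping will fragment oversized packets unless told otherwise. A rough sketch, assuming Linux iputils ping (where -M do sets the don't-fragment bit) and IPv4 without header options. The function names are hypothetical:

```shell
#!/bin/sh
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# Hypothetical wrapper: ping SERVER with the don't-fragment bit set, so a
# path that cannot carry MTU-sized frames fails instead of fragmenting.
# Assumes Linux iputils ping (-M do, -c, -s).
check_jumbo_path() {
    server=$1
    mtu=$2
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$server"
}

# Usage from 192.0.2.2, whose eth1 is configured with MTU=9000:
#   check_jumbo_path 192.0.2.3 9000
```

If small pings succeed but this one fails, some device between 192.0.2.2 and 192.0.2.3 is probably not passing 9000-byte frames. That would fit a hang that appears only once larger NFS requests are sent, although the question does not confirm that this is the cause here.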