Currently, I can use srun [variety of settings] bash to get a shell on a compute node. However, if my SSH connection drops for some reason and I want to get back to that shell, how do I do that?
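Assuming the unreliable part is the SSH connection from your laptop to the cluster's login node, you can use a terminal multiplexer such as screen or tmux, depending on which one is installed on the login node.
Typically, a session looks like this: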
[you@yourlaptop ~]$ ssh cluster-frontend
[you@cluster ~]$ tmux # to enter a persistent tmux session
[you@cluster ~]$ srun [...] bash # to get a shell on a compute node
[you@computenode ~]$ # some work, then...
some SSH error (e.g. Write failed: Broken pipe)
[you@yourlaptop ~]$ ssh cluster-frontend
[you@cluster ~]$ tmux a # to re-attach to the persistent tmux session
[you@computenode ~]$ # resume work
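With screen, you would use screen -r instead of tmux a. Otherwise the procedure is the same.
If you want to connect to the job from another terminal instance (the right-hand column below), you can use Slurm's sattach command.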
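Since the choice between tmux and screen depends on what the login node has installed, the re-attach step can be wrapped in a small helper. This is only a sketch: the function names pick_multiplexer and persistent_shell are made up for illustration, and it assumes you want to fall back to a fresh session when there is nothing to re-attach to.

```shell
#!/bin/sh
# Hypothetical helper: print the first terminal multiplexer found on PATH.
pick_multiplexer() {
    for candidate in tmux screen; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: re-attach to an existing session, or start a new
# one if there is none to re-attach to.
persistent_shell() {
    case "$(pick_multiplexer)" in
        tmux)   tmux attach || tmux new-session ;;
        screen) screen -r || screen ;;
        *)      echo "neither tmux nor screen found" >&2; return 1 ;;
    esac
}
```

Running persistent_shell on the login node right after ssh cluster-frontend would then cover both the first login and every reconnect; srun [...] bash is still started by hand inside the session.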
[you@yourlaptop ~]$ ssh cluster-frontend           |
[you@cluster ~]$ srun [...] bash                   |
srun: job ******* queued and waiting for resources |
srun: job ******* has been allocated resources     | [you@yourlaptop ~]$ ssh cluster-frontend
[you@computenode ~]$                               | [you@cluster ~]$ sattach --pty ********
[you@computenode ~]$ echo OK                       | [you@computenode ~]$ echo OK
[you@computenode ~]$ OK                            | [you@computenode ~]$ OK
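The original terminal and the one running sattach are now fully synchronized.
Note that none of the above protects against srun being terminated accidentally; whenever srun terminates, the job terminates with it.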
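sattach addresses a job step rather than a bare job, so its argument has the form jobid.stepid. The sketch below builds that argument; the function names job_step_id and attach_to_job are hypothetical, and defaulting to step 0 is an assumption that the interactive shell is the only step in the job.

```shell
#!/bin/sh
# Hypothetical helper: build the "jobid.stepid" argument that sattach expects.
# The step defaults to 0 (assumption: the srun shell is the job's first step).
job_step_id() {
    printf '%s.%s\n' "$1" "${2:-0}"
}

# Hypothetical helper: attach a pseudo-terminal to a running job step.
attach_to_job() {
    sattach --pty "$(job_step_id "$1" "${2:-0}")"
}

# To find the job ID first (no header, job ID column only):
#   squeue -h -u "$USER" -o %i
# then:
#   attach_to_job <jobid>
```

Per the Slurm documentation, sattach --pty only works properly when the step itself was started by srun with --pty, so that option would need to be among the srun settings.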