Logged out when running hadoop under ubuntu 16.04

Mic*_*ael 6 hadoop ubuntu-16.04

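I am having trouble running hadoop jobs under ubuntu 16.04, in both pseudo-cluster and cluster mode.

When running a vanilla hadoop/hdfs installation, my hadoop user gets logged out and all processes run by that user are killed. I see nothing in the logs (/var/log/systemd, journalctl or dmesg) that would explain why the user is logged out.

It seems I am not the only one with this or a similar problem: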

/sf/ask/2680171371/

Note: creating a dedicated hadoop user did not actually solve my problem, but it did confine the logouts to that dedicated user.

https://askubuntu.com/questions/784591/ubuntu-16-04-kills-session-when-resource-usage-is-extremely-high

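Could some issue around the UserGroupInformation class (which causes a logout in certain cases), combined with some change to systemd in ubuntu 16.04, be causing this behavior?

The last lines of the hadoop log that I get before the logout: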

...
16/07/13 16:45:37 DEBUG ipc.ProtobufRpcEngine: Call: getJobReport took 4ms
16/07/13 16:45:37 DEBUG security.UserGroupInformation: PrivilegedAction
as:hduser (auth:SIMPLE)
from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
16/07/13 16:45:37 DEBUG ipc.Client: IPC Client (1360814716) connection to
laptop/127.0.1.1:37339 from hduser sending #375
16/07/13 16:45:37 DEBUG ipc.Client: IPC Client (1360814716) connection to
laptop/127.0.1.1:37339 from hduser got value #375
16/07/13 16:45:37 DEBUG ipc.ProtobufRpcEngine: Call: getJobReport took 2ms
Terminated
hduser@laptop:~$ 16/07/13 16:45:37 DEBUG ipc.Client: stopping client from
cache: org.apache.hadoop.ipc.Client@4e7ab839
exit

journalctl:

Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 7.
Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 6.
Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 5.
Jul 12 16:06:44 laptop systemd-logind[978]: Removed session 8.

syslog:

Jul 12 16:06:43 laptop systemd[4172]: Stopped target Default.
Jul 12 16:06:43 laptop systemd[4172]: Reached target Shutdown.
Jul 12 16:06:44 laptop systemd[4172]: Starting Exit the Session...
Jul 12 16:06:44 laptop systemd[4172]: Stopped target Basic System.
Jul 12 16:06:44 laptop systemd[4172]: Stopped target Sockets.
Jul 12 16:06:44 laptop systemd[4172]: Stopped target Paths.
Jul 12 16:06:44 laptop systemd[4172]: Stopped target Timers.
Jul 12 16:06:44 laptop systemd[4172]: Received SIGRTMIN+24 from PID
10101 (kill).
Jul 12 16:06:44 laptop systemd[1]: Stopped User Manager for UID 1001.
Jul 12 16:06:44 laptop systemd[1]: Removed slice User Slice of hduser.

小智 6

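I had the problem too. It took me a while, but I found the solution here: https://unix.stackexchange.com/questions/293069/all-services-of-a-user-are-killed-when-running-multiple-services-under-this-user

Basically, some hadoop processes just stop, because why not. But when systemd sees a service process die, it seems to kill all of that user's processes.

The fix is to add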

[Login]
KillUserProcesses=no

to /etc/systemd/logind.conf and reboot.
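To double-check that the setting actually landed in the right section, here is a minimal Python sketch that reads logind.conf-style text and reports the KillUserProcesses value. The helper name `kill_user_processes` and the sample text are made up for illustration. The sketch only looks at the text it is given, so it does not account for drop-in files under /etc/systemd/logind.conf.d/ or for systemd's compiled-in default.

```python
import configparser


def kill_user_processes(conf_text, default=None):
    """Return the KillUserProcesses value from the [Login] section.

    Hypothetical helper for illustration only. Returns `default` when the
    key is missing, commented out, or placed under a different section.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(conf_text)
    # Section names are matched case-sensitively, so "[login]" is not "[Login]".
    if parser.has_option("Login", "KillUserProcesses"):
        return parser.get("Login", "KillUserProcesses").strip()
    return default


# Sample text standing in for /etc/systemd/logind.conf (an assumption,
# not a copy of any real file).
sample = """\
[Login]
#NAutoVTs=6
KillUserProcesses=no
"""

print(kill_user_processes(sample))  # prints: no
```

To check a real machine, pass in the contents of the file instead of the sample, for example `kill_user_processes(open("/etc/systemd/logind.conf").read())`. A result of `None` means the key is not set in that file.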

I had several versions of ubuntu available for debugging the problem, and the fix seems to apply only to ubuntu 16.04.