Given a container error status code, where can I find a more explicit error?

Axe*_*rja 2 docker mesos

I'm running tasks through Mesos on a stack of Docker containers.

Sometimes, some of the tasks fail.

Here are some of the relevant TaskStatus messages and reasons:

message: Container exited with status 1 - reason: REASON_COMMAND_EXECUTOR_FAILED
message: Container exited with status 42 - reason: REASON_COMMAND_EXECUTOR_FAILED
message: Container exited with status 137 - reason: REASON_COMMAND_EXECUTOR_FAILED

Is there a correspondence table that maps the container error status codes in TaskStatus messages to more explicit errors?

jan*_*isz 5

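A command task can fail for many reasons, and it sets an exit code accordingly. For example, Docker 1.10 sets the following exit status codes (from the documentation, via this answer):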

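The exit code from docker run gives information about why the container failed to run or why it exited. When docker run exits with a non-zero code, the exit codes follow the chroot standard, see below: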

125 if the error is with the Docker daemon itself

$ docker run --foo busybox; echo $?
# flag provided but not defined: --foo
# See 'docker run --help'.
# 125

126 if the contained command cannot be invoked

$ docker run busybox /etc; echo $?
# docker: Error response from daemon: Container command '/etc' could not be invoked.
# 126

127 if the contained command cannot be found

$ docker run busybox foo; echo $?
# docker: Error response from daemon: Container command 'foo' not found or does not exist.
# 127

Exit code of contained command otherwise

$ docker run busybox /bin/sh -c 'exit 3'; echo $?
# 3

Here you can find another set of exit code conventions, as used by shells:

| Code  |            Meaning             |         Example         |                                                   Comments                                                   |
|-------|--------------------------------|-------------------------|--------------------------------------------------------------------------------------------------------------|
| 1     | Catchall for general errors    | let "var1 = 1/0"        | Miscellaneous errors, such as "divide by zero" and other impermissible operations                            |
| 2     | Misuse of shell builtins       | empty_function() {}     | Missing keyword or command, or permission problem (and diff return code on a failed binary file comparison). |
| 126   | Command invoked cannot execute | /dev/null               | Permission problem or command is not an executable                                                           |
| 127   | "command not found"            | illegal_command         | Possible problem with $PATH or a typo                                                                        |
| 128   | Invalid argument to exit       | exit 3.14159            | exit takes only integer args in the range 0 - 255 (see first footnote)                                       |
| 128+n | Fatal error signal "n"         | kill -9 $PPID of script | $? returns 137 (128 + 9)                                                                                     |
| 130   | Script terminated by Control-C | Ctl-C                   | Control-C is fatal error signal 2, (130 = 128 + 2, see above)                                                |
| 255*  | Exit status out of range       | exit -1                 | exit takes only integer args in the range 0 - 255                                                            |
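
The two sets of conventions can be combined into a small lookup. The following is only a rough sketch: `explain_exit_code` is a hypothetical helper name, not part of Docker or Mesos, and the wording of each explanation is my own paraphrase of the rules quoted above. It treats 125, 126 and 127 as the Docker-specific cases, anything from 129 to 255 as 128+n for signal n, and everything else as a code chosen by the command itself.

```shell
#!/bin/sh
# Hypothetical helper (not a Docker or Mesos tool): map a container exit
# status to a likely explanation, following the conventions quoted above.
explain_exit_code() {
    code=$1
    # Reject anything that is not a non-negative integer.
    case "$code" in
        ''|*[!0-9]*) echo "not a valid exit status: $code"; return 2 ;;
    esac
    if [ "$code" -eq 0 ]; then
        echo "success"
    elif [ "$code" -eq 125 ]; then
        echo "docker run itself failed (Docker daemon error)"
    elif [ "$code" -eq 126 ]; then
        echo "contained command could not be invoked"
    elif [ "$code" -eq 127 ]; then
        echo "contained command not found"
    elif [ "$code" -gt 128 ] && [ "$code" -le 255 ]; then
        # 128+n convention: the process was terminated by signal n.
        echo "terminated by signal $((code - 128))"
    else
        echo "exit code $code set by the command itself"
    fi
}

# The three statuses from the question.
for status in 1 42 137; do
    printf '%s: %s\n' "$status" "$(explain_exit_code "$status")"
done
```

For 137 this reports signal 9, which is SIGKILL. For 1 and 42 it can only say that the command chose the code, so the meaning has to come from the application's own documentation or logs.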

Regarding your examples:

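If you need more information to interpret a status code, check the message field of the Mesos TaskStatus update; for example, Mesos puts information about OOM there. The same information can also be found in the Mesos logs. To debug why a command returned a non-zero code, inspect the files stored in the executor sandbox, especially stderr / stdout or any command-specific logs.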
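
As a minimal sketch of that last step, the function below prints the tail of the stderr and stdout files from a sandbox directory. `show_sandbox_logs` is a hypothetical helper, and the sandbox path is passed in as an argument because its location depends on your agent configuration; nothing here is a Mesos command.

```shell
#!/bin/sh
# Hypothetical helper: show the last lines of stderr and stdout found in an
# executor sandbox directory. The directory is supplied by the caller.
show_sandbox_logs() {
    sandbox_dir=$1
    lines=${2:-20}   # number of trailing lines to print, 20 by default
    for name in stderr stdout; do
        file="$sandbox_dir/$name"
        if [ -f "$file" ]; then
            echo "== $name (last $lines lines) =="
            tail -n "$lines" "$file"
        else
            echo "== $name: not found in $sandbox_dir =="
        fi
    done
}
```

Call it as `show_sandbox_logs /path/to/sandbox 50`, substituting the sandbox directory of the failed task.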