I run tasks on a stack that uses Docker containers through Mesos.
Occasionally, some of the tasks fail.
Here are some of the relevant TaskStatus messages and reasons:
message: Container exited with status 1 - reason: REASON_COMMAND_EXECUTOR_FAILED
message: Container exited with status 42 - reason: REASON_COMMAND_EXECUTOR_FAILED
message: Container exited with status 137 - reason: REASON_COMMAND_EXECUTOR_FAILED
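Is there a correspondence table that links the container exit status codes in TaskStatus messages to more explicit errors?
A command task can fail for many reasons, and it sets an exit code accordingly. For example, Docker 1.10 sets the following exit status codes (from the documentation and this answer):
The exit code from docker run gives information about why the container failed to run or why it exited. When docker run exits with a non-zero code, the exit codes follow the chroot standard, see below: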
125 if the error is with the Docker daemon itself:

    $ docker run --foo busybox; echo $?
    # flag provided but not defined: --foo
      See 'docker run --help'.
      125

126 if the contained command cannot be invoked:

    $ docker run busybox /etc; echo $?
    # docker: Error response from daemon: Container command '/etc' could not be invoked.
      126

127 if the contained command cannot be found:

    $ docker run busybox foo; echo $?
    # docker: Error response from daemon: Container command 'foo' not found or does not exist.
      127

Otherwise, the exit code of the contained command:

    $ docker run busybox /bin/sh -c 'exit 3'; echo $?
    # 3
Another set of exit code conventions can be found here:
| Code | Meaning | Example | Comments |
|-------|--------------------------------|-------------------------|--------------------------------------------------------------------------------------------------------------|
| 1 | Catchall for general errors | let "var1 = 1/0" | Miscellaneous errors, such as "divide by zero" and other impermissible operations |
| 2 | Misuse of shell builtins | empty_function() {} | Missing keyword or command, or permission problem (and diff return code on a failed binary file comparison). |
| 126 | Command invoked cannot execute | /dev/null | Permission problem or command is not an executable |
| 127 | "command not found" | illegal_command | Possible problem with $PATH or a typo |
| 128 | Invalid argument to exit | exit 3.14159 | exit takes only integer args in the range 0 - 255 (see first footnote) |
| 128+n | Fatal error signal "n" | kill -9 $PPID of script | $? returns 137 (128 + 9) |
| 130 | Script terminated by Control-C | Ctl-C | Control-C is fatal error signal 2, (130 = 128 + 2, see above) |
| 255* | Exit status out of range | exit -1 | exit takes only integer args in the range 0 - 255 |
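
The table and the Docker codes above can be folded into a small lookup. The following is a minimal sketch, not anything provided by Docker or Mesos: `describe_exit_status` is a hypothetical helper name, and the upper bound of 192 assumes signal numbers 1 to 64 as on Linux. Codes that appear in neither list are treated as application-specific.

```shell
# Hypothetical helper (not part of Docker or Mesos): map a container
# exit status to the conventional meaning listed above.
describe_exit_status() {
    code=$1
    case "$code" in
        0)   echo "success" ;;
        1)   echo "general error" ;;
        2)   echo "misuse of shell builtins" ;;
        125) echo "docker run itself failed" ;;
        126) echo "command cannot be invoked" ;;
        127) echo "command not found" ;;
        128) echo "invalid argument to exit" ;;
        *)
            # Assumption: signal numbers range from 1 to 64 (Linux).
            if [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
                echo "killed by signal $((code - 128))"
            else
                echo "application-specific exit code"
            fi
            ;;
    esac
}
```

For example, `describe_exit_status 137` prints `killed by signal 9`, while `describe_exit_status 42` prints `application-specific exit code`, since the meaning of 42 depends entirely on the program running in the container.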
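Based on your examples:

137: 128 + 9 = 137, where 9 is SIGKILL. The process was killed, possibly because the container ran out of memory (OOM).

1: may be caused by an invalid configuration, an internal application error, or invalid input.

If you need more information to interpret a status code, check the message field of the Mesos TaskStatus update; for example, Mesos puts information about OOM kills there. The same information can be found in the Mesos logs. To debug why a command returned a non-zero code, inspect the files stored in the executor sandbox, especially stderr / stdout or any command-specific logs.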
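
The 128 + n arithmetic can be checked locally without Docker or Mesos. This is a small sketch under the assumption that a POSIX `sh` is available: a child shell sends itself SIGKILL, and the parent then recovers the signal number from the exit status. The variable names are illustrative only.

```shell
# Start a child shell that kills itself with SIGKILL (signal 9).
code=0
sh -c 'kill -9 $$' || code=$?

# A process terminated by signal n is reported as 128 + n.
echo "exit status: $code"

# Subtract 128 to recover the signal number, then ask kill for its name.
signal_number=$((code - 128))
echo "signal: $(kill -l "$signal_number")"
```

On a typical Linux shell this reports exit status 137 and the signal name KILL, which matches the status in the third TaskStatus message. It only shows that the process received SIGKILL; whether the cause was an OOM kill still has to be confirmed from the TaskStatus message or the Mesos logs.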