Question by pno*_*nak (score 9) · Tags: kubernetes, google-kubernetes-engine, google-stackdriver
I'd like to set up detection for OOMKilled events, which look like this when I inspect the pod:
Name:           pnovotnak-manhole-123456789-82l2h
Namespace:      test
Node:           test-cluster-cja8smaK-oQSR/10.x.x.x
Start Time:     Fri, 03 Feb 2017 14:34:57 -0800
Labels:         pod-template-hash=123456789
                run=pnovotnak-manhole
Status:         Running
IP:             10.x.x.x
Controllers:    ReplicaSet/pnovotnak-manhole-123456789
Containers:
  pnovotnak-manhole:
    Container ID:       docker://...
    Image:              pnovotnak/it
    Image ID:           docker://sha256:...
    Port:
    Limits:
      cpu:      2
      memory:   3Gi
    Requests:
      cpu:      200m
      memory:   256Mi
    State:              Running
      Started:          Fri, 03 Feb 2017 14:41:12 -0800
    Last State:         Terminated
      Reason:           OOMKilled
      Exit Code:        137
      Started:          Fri, 03 Feb 2017 14:35:08 -0800
      Finished:         Fri, 03 Feb 2017 14:41:11 -0800
    Ready:              True
    Restart Count:      1
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tder (ro)
    Environment Variables:      <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  default-token-46euo:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-tder
QoS Class:      Burstable
Tolerations:    <none>
Events:
  FirstSeen  LastSeen  Count  From  SubObjectPath  Type  Reason  Message
  ---------  --------  -----  ----  -------------  --------  ------  -------
  11m  11m  1  {default-scheduler }  Normal  Scheduled  Successfully assigned pnovotnak-manhole-123456789-82l2h to test-cluster-cja8smaK-oQSR
  10m  10m  1  {kubelet test-cluster-cja8smaK-oQSR}  spec.containers{pnovotnak-manhole}  Normal  Created  Created container with docker id xxxxxxxxxxxx; Security:[seccomp=unconfined]
  10m  10m  1  {kubelet test-cluster-cja8smaK-oQSR}  spec.containers{pnovotnak-manhole}  Normal  Started  Started container with docker id xxxxxxxxxxxx
  11m  4m   2  {kubelet test-cluster-cja8smaK-oQSR}  spec.containers{pnovotnak-manhole}  Normal  Pulling  pulling image "pnovotnak/it"
  10m  4m   2  {kubelet test-cluster-cja8smaK-oQSR}  spec.containers{pnovotnak-manhole}  Normal  Pulled   Successfully pulled image "pnovotnak/it"
  4m   4m   1  {kubelet test-cluster-cja8smaK-oQSR}  spec.containers{pnovotnak-manhole}  Normal  Created  Created container with docker id yyyyyyyyyyyy; Security:[seccomp=unconfined]
  4m   4m   1  {kubelet test-cluster-cja8smaK-oQSR}  spec.containers{pnovotnak-manhole}  Normal  Started  Started container with docker id yyyyyyyyyyyy
All I get from the pod logs is:
{
  textPayload: "shutting down, got signal: Terminated\n"
  insertId: "aaaaaaaaaaaaaaaa"
  resource: {
    type: "container"
    labels: {
      pod_id: "pnovotnak-manhole-123456789-82l2h"
      ...
    }
  }
  timestamp: "2017-02-03T22:34:48Z"
  severity: "ERROR"
  labels: {
    container.googleapis.com/container_name: "POD"
    ...
  }
  logName: "projects/myproj/logs/POD"
}
And the kubelet logs:
{
  insertId: "bbbbbbbbbbbbbb"
  jsonPayload: {
    _BOOT_ID: "ffffffffffffffffffffffffffffffff"
    MESSAGE: "I0203 22:41:11.925928 1843 kubelet.go:1816] SyncLoop (PLEG): "pnovotnak-manhole-123456789-82l2h_test(a-uuid)", event: &pleg.PodLifecycleEvent{ID:"another-uuid", Type:"ContainerDied", Data:"..."}"
    ...
This doesn't seem like enough to uniquely identify it as an OOM event. Any other ideas?
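One reason the exit code alone is not conclusive: by the usual convention, an exit code above 128 means the process was terminated by signal (code - 128), so 137 is signal 9, SIGKILL. The kernel OOM killer sends SIGKILL, but so does anything else that kills the process with kill -9, so 137 by itself does not prove an OOM kill. A minimal Python sketch of that decoding, assuming that convention (the helper name signal_from_exit_code is hypothetical):

```python
import signal
from typing import Optional


def signal_from_exit_code(exit_code: int) -> Optional[signal.Signals]:
    """Return the signal implied by an exit code, or None.

    Assumes the shell convention that codes above 128 mean 128 + signal number.
    """
    if exit_code <= 128:
        return None  # normal exit (0) or an application-chosen error code
    try:
        return signal.Signals(exit_code - 128)
    except ValueError:
        return None  # not a signal number known on this platform


sig = signal_from_exit_code(137)  # "Exit Code: 137" from the describe output
print(sig.name if sig else "not killed by a signal")
```

Because SIGKILL is not specific to the OOM killer, the "Reason: OOMKilled" field in the describe output is the more specific indicator to look for.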
Answer by 小智 (score 6)
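Although the OOMKilled event isn't present in the logs, if you can detect that the pod was killed, you can use kubectl get pod -o go-template=... <pod-id> to determine the reason. Here is an example taken directly from the documentation: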
[13:59:01] $ ./cluster/kubectl.sh get pod -o go-template='{{range.status.containerStatuses}}{{"Container Name: "}}{{.name}}{{"\r\nLastState: "}}{{.lastState}}{{end}}' simmemleak-60xbc
Container Name: simmemleak
LastState: map[terminated:map[exitCode:137 reason:OOM Killed startedAt:2015-07-07T20:58:43Z finishedAt:2015-07-07T20:58:43Z containerID:docker://0e4095bba1feccdfe7ef9fb6ebffe972b4b14285d5acdec6f0d3ae8a22fad8b2]]
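The same lastState information can be read from the pod's JSON (for example, the output of kubectl get pod <pod-id> -o json), which is easier to process than go-template text. A minimal Python sketch, assuming the field layout status.containerStatuses[].state / lastState .terminated.reason; the function name and the sample pod (built from the describe output in the question) are illustrative only. Since the documentation example prints the reason as "OOM Killed" while the question shows "OOMKilled", the sketch accepts both:

```python
from typing import Any, Dict, List

# Reason strings treated as an OOM kill: the question's describe output shows
# "OOMKilled", while the documentation example prints "OOM Killed".
OOM_REASONS = {"OOMKilled", "OOM Killed"}


def oom_killed_containers(pod: Dict[str, Any]) -> List[Dict[str, Any]]:
    """List containers whose current or last termination was an OOM kill.

    `pod` is a Pod object decoded from JSON, e.g. the output of
    `kubectl get pod <pod-id> -o json`.
    """
    hits = []
    name = pod.get("metadata", {}).get("name")
    for cs in pod.get("status", {}).get("containerStatuses") or []:
        # "state" covers a container that is still terminated (not restarted);
        # "lastState" covers one that was killed and has since been restarted.
        for field in ("state", "lastState"):
            terminated = (cs.get(field) or {}).get("terminated") or {}
            if terminated.get("reason") in OOM_REASONS:
                hits.append({
                    "pod": name,
                    "container": cs.get("name"),
                    "field": field,
                    "exitCode": terminated.get("exitCode"),
                    "finishedAt": terminated.get("finishedAt"),
                })
    return hits


# Sample input shaped like the pod in the question (values from its describe output).
sample_pod = {
    "metadata": {"name": "pnovotnak-manhole-123456789-82l2h"},
    "status": {
        "containerStatuses": [{
            "name": "pnovotnak-manhole",
            "restartCount": 1,
            "state": {"running": {"startedAt": "2017-02-03T22:41:12Z"}},
            "lastState": {"terminated": {
                "reason": "OOMKilled",
                "exitCode": 137,
                "finishedAt": "2017-02-03T22:41:11Z",
            }},
        }]
    },
}

print(oom_killed_containers(sample_pod))
```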
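If you are doing this programmatically, a better alternative to relying on kubectl output is the Kubernetes REST API's GET /api/v1/pods method. The documentation also describes the ways to access the API.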
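To turn this into ongoing detection rather than a one-off check, the REST API's watch mode (GET /api/v1/pods?watch=true) streams one JSON event per line, each with a type (ADDED, MODIFIED, DELETED, ...) and the pod as its object. A minimal sketch, assuming kubectl proxy is running locally on its default port 8001 so the proxy takes care of authentication; the URL constant, the function names, and the rule "report when restartCount rises and lastState says OOMKilled" are illustrative assumptions, not part of the original answer:

```python
import json
import urllib.request
from typing import Any, Dict, Iterable, Iterator, Tuple

# Assumption: `kubectl proxy` is running locally on its default port 8001.
WATCH_URL = "http://127.0.0.1:8001/api/v1/pods?watch=true"

Key = Tuple[str, str, str]  # (namespace, pod name, container name)


def stream_events(url: str = WATCH_URL) -> Iterator[Dict[str, Any]]:
    """Yield watch events from the API server; each line is one JSON event."""
    with urllib.request.urlopen(url) as resp:
        for line in resp:
            if line.strip():
                yield json.loads(line)


def new_oom_kills(events: Iterable[Dict[str, Any]]) -> Iterator[Key]:
    """Yield a key whenever a container's restartCount rises and its
    lastState reports an OOMKilled termination.

    Containers are tracked only from the first event that mentions them,
    so OOM kills that happened before the watch started are not reported.
    """
    seen: Dict[Key, int] = {}
    for event in events:
        kind = event.get("type")
        if kind not in ("ADDED", "MODIFIED", "DELETED"):
            continue  # e.g. ERROR events carry a Status object, not a Pod
        pod = event.get("object") or {}
        meta = pod.get("metadata") or {}
        for cs in (pod.get("status") or {}).get("containerStatuses") or []:
            key = (meta.get("namespace"), meta.get("name"), cs.get("name"))
            if kind == "DELETED":
                seen.pop(key, None)  # stop tracking containers of deleted pods
                continue
            count = cs.get("restartCount", 0)
            previous = seen.get(key)
            seen[key] = count
            terminated = (cs.get("lastState") or {}).get("terminated") or {}
            if (previous is not None and count > previous
                    and terminated.get("reason") == "OOMKilled"):
                yield key


# Usage, with `kubectl proxy` running:
#   for namespace, pod, container in new_oom_kills(stream_events()):
#       print(f"OOMKilled: {namespace}/{pod} ({container})")
```

Keeping the event parsing separate from the HTTP stream makes the detection rule easy to test with recorded events. The API server closes watch connections after a timeout, so a long-running detector would also need to reconnect; that is left out of this sketch.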