History of Kubernetes pod termination events?

ger*_*lus 8 kubernetes

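Is there a way to view the history of a pod's termination states? For example, if you look at the output of the pod describe command, you see something like this: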

State:      Running
  Started:      Mon, 10 Jul 2017 13:09:20 +0300
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
  Started:      Thu, 06 Jul 2017 11:01:21 +0300
  Finished:     Mon, 10 Jul 2017 13:09:18 +0300
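For reference, the fields that describe prints under Last State are also available in the pod's JSON (kubectl get pod <pod_name> -o json) under status.containerStatuses[].lastState.terminated. Below is a minimal Python sketch, assuming that JSON has already been loaded into a dict. The function name last_terminations and the sample values (including the restart count) are hypothetical. The container status carries only one lastState plus a restartCount, so this reads the most recent termination, not a history.

```python
def last_terminations(pod):
    """Return one record per container that has a previous terminated state."""
    records = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated is None:
            continue  # this container has no recorded previous termination
        records.append({
            "container": status["name"],
            "restarts": status.get("restartCount", 0),
            "reason": terminated.get("reason"),
            "exit_code": terminated.get("exitCode"),
            "finished_at": terminated.get("finishedAt"),
        })
    return records


# Sample data shaped like `kubectl get pod <pod_name> -o json`.
# Values are illustrative, modelled on the describe output above.
sample_pod = {
    "metadata": {"name": "demo", "namespace": "demo"},
    "status": {
        "containerStatuses": [
            {
                "name": "demo",
                "restartCount": 1,  # assumed value
                "state": {"running": {"startedAt": "2017-07-10T10:09:20Z"}},
                "lastState": {
                    "terminated": {
                        "reason": "OOMKilled",
                        "exitCode": 137,
                        "startedAt": "2017-07-06T08:01:21Z",
                        "finishedAt": "2017-07-10T10:09:18Z",
                    }
                },
            }
        ]
    },
}

for record in last_terminations(sample_pod):
    print(record)
```

To use it against a real pod, the output of kubectl get pod <pod_name> -o json could be parsed with json.load and passed in instead of sample_pod.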

The pod events in the pod describe output do not show the same information:

   Events:
  FirstSeen LastSeen    Count   From                    SubObjectPath       Type        Reason  Message
  --------- --------    -----   ----                    -------------       --------    ------  -------
  10m       10m     1   kubelet, gke-dev-default-d8f2dbc5-mbkb  spec.containers{demo}   Normal      Pulled  Container image "eu.gcr.io/project/image:v1" already present on machine
  10m       10m     1   kubelet, gke-dev-default-d8f2dbc5-mbkb  spec.containers{demo}   Normal      Created Created container with id 1d857caae77bdc43f0bc90fe045ed5050f85436479073b0e6b46454500f4eb5a
  10m       10m     1   kubelet, gke-dev-default-d8f2dbc5-mbkb  spec.containers{demo}   Normal      Started Started container with id 1d857caae77bdc43f0bc90fe045ed5050f85436479073b0e6b46454500f4eb5a

If I run kubectl get events --all-namespaces I can see the event, but I cannot associate it with a specific pod:

  default   12m       12m       1         gke-dev-default-d8f2dbc5-mbkb   Node                Warning   OOMKilling   kernel-monitor, gke-dev-default-d8f2dbc5-mbkb   Memory cgroup out of memory: Kill process 1639 (java) score 2014 or sacrifice child
Killed process 1639 (java) total-vm:10828960kB, anon-rss:1013756kB, file-rss:22308kB

Even the event details returned by the API contain misleading information (for example, the namespace is default, although the pod is actually in the demo namespace):

    "metadata": {
        "name": "gke-dev-default-d8f2dbc5-mbkb.14cff03fe771b053",
        "namespace": "default",
        "selfLink": "/api/v1/namespaces/default/events/gke-dev-default-d8f2dbc5-mbkb.14cff03fe771b053",
        "uid": "d5d3230e-6557-11e7-a486-42010a8401d3",
        "resourceVersion": "5278875",
        "creationTimestamp": "2017-07-10T10:09:18Z"
    },
    "involvedObject": {
        "kind": "Node",
        "name": "gke-dev-default-d8f2dbc5-mbkb",
        "uid": "gke-dev-default-d8f2dbc5-mbkb"
    },
    "reason": "OOMKilling",
    "message": "Memory cgroup out of memory: Kill process 1639 (java) score 2014 or sacrifice child\nKilled process 1639 (java) total-vm:10828960kB, anon-rss:1013756kB, file-rss:22308kB",
    "source": {
        "component": "kernel-monitor",
        "host": "gke-dev-default-d8f2dbc5-mbkb"
    },
    "firstTimestamp": "2017-07-10T10:09:18Z",
    "lastTimestamp": "2017-07-10T10:09:18Z",
    "count": 1,
    "type": "Warning"

So, although I can see the last termination state through pod describe, what about the earlier ones?

Ahm*_*gle 6

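Eviction events are node events. That is why you do not see them in the pod events. If you run kubectl describe node <node_name> for the node the pod was running on, you can see the eviction events.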

To test this, run a deployment that will keep getting OOMKilled:

kubectl run memory-hog --image=gisleburt/my-memory-hog --replicas=2 --limits=memory=128m

Once the pods start running and dying, you can run kubectl get events or kubectl describe node <node_name>, and you will see events like these:

Events:
  FirstSeen LastSeen    Count   From                            SubObjectPath   Type        Reason      Message
  --------- --------    -----   ----                            -------------   --------    ------      -------
  2m        2m      1   kernel-monitor, gke-test-default-pool-649c88dd-818j         Warning     OOMKilling  Memory cgroup out of memory: Kill process 7345 (exe) score 50000 or sacrifice child
Killed process 7345 (exe) total-vm:6092kB, anon-rss:64kB, file-rss:112kB
  2m    2m  1   kernel-monitor, gke-test-default-pool-649c88dd-818j     Warning OOMKilling  Memory cgroup out of memory: Kill process 7409 (exe) score 51000 or sacrifice child
Killed process 7409 (exe) total-vm:6092kB, anon-rss:68kB, file-rss:112kB
  2m    2m  1   kernel-monitor, gke-test-default-pool-649c88dd-818j     Warning OOMKilling  Memory cgroup out of memory: Kill process 7495 (exe) score 50000 or sacrifice child
Killed process 7495 (exe) total-vm:6092kB, anon-rss:64kB, file-rss:112kB
  2m    2m  1   kernel-monitor, gke-test-default-pool-649c88dd-818j     Warning OOMKilling  Memory cgroup out of memory: Kill process 7561 (exe) score 49000 or sacrifice child
Killed process 7561 (exe) total-vm:6092kB, anon-rss:60kB, file-rss:112kB
  2m    2m  1   kernel-monitor, gke-test-default-pool-649c88dd-818j     Warning OOMKilling  Memory cgroup out of memory: Kill process 7638 (exe) score 494000 or sacrifice child
Killed process 7638 (exe) total-vm:7536kB, anon-rss:148kB, file-rss:1832kB
  2m    2m  1   kernel-monitor, gke-test-default-pool-649c88dd-818j     Warning OOMKilling  Memory cgroup out of memory: Kill process 7728 (exe) score 49000 or sacrifice child
Killed process 7728 (exe) total-vm:6092kB, anon-rss:60kB, file-rss:112kB
  2m    2m  1   kernel-monitor, gke-test-default-pool-649c88dd-818j     Warning OOMKilling  Memory cgroup out of memory: Kill process 7876 (exe) score 48000 or sacrifice child
Killed process 7876 (exe) total-vm:6092kB, anon-rss:60kB, file-rss:112kB
  2m    2m  1   kernel-monitor, gke-test-default-pool-649c88dd-818j     Warning OOMKilling  Memory cgroup out of memory: Kill process 8013 (exe) score 480000 or sacrifice child
Killed process 8013 (exe) total-vm:15732kB, anon-rss:152kB, file-rss:1768kB
  2m    2m  1   kernel-monitor, gke-test-default-pool-649c88dd-818j     Warning OOMKilling  Memory cgroup out of memory: Kill process 8140 (exe) score 1023000 or sacrifice child
Killed process 8140 (exe) total-vm:24184kB, anon-rss:448kB, file-rss:3704kB
  2m    25s 50  kernel-monitor, gke-test-default-pool-649c88dd-818j     Warning OOMKilling  (events with common reason combined)
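Because these events reference the node rather than a pod, any filtering has to go through involvedObject. The sketch below is a rough Python illustration, assuming the output of kubectl get events --all-namespaces -o json has been loaded into a dict with an items list. The function name node_oom_kills and the regular expression are hypothetical helpers, and the second sample event is invented only to show that pod events are skipped. The result names the killed process and the time, not the pod. Comparing that time by hand with the Finished time of the pod's last state is a heuristic, not something the event itself provides.

```python
import re

# Matches the second line of the kernel message, e.g. "Killed process 1639 (java)".
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def node_oom_kills(events, node_name):
    """Collect OOMKilling events whose involvedObject is the given node."""
    kills = []
    for event in events.get("items", []):
        involved = event.get("involvedObject", {})
        if involved.get("kind") != "Node" or involved.get("name") != node_name:
            continue
        if event.get("reason") != "OOMKilling":
            continue
        match = KILLED.search(event.get("message", ""))
        kills.append({
            "time": event.get("lastTimestamp"),
            # Combined events carry no process details, so these may be None.
            "pid": int(match.group(1)) if match else None,
            "process": match.group(2) if match else None,
        })
    return kills


# Sample data: the first item mirrors the event from the question,
# the second is an invented pod event that should be ignored.
sample_events = {
    "items": [
        {
            "involvedObject": {"kind": "Node", "name": "gke-dev-default-d8f2dbc5-mbkb"},
            "reason": "OOMKilling",
            "message": "Memory cgroup out of memory: Kill process 1639 (java) "
                       "score 2014 or sacrifice child\n"
                       "Killed process 1639 (java) total-vm:10828960kB, "
                       "anon-rss:1013756kB, file-rss:22308kB",
            "lastTimestamp": "2017-07-10T10:09:18Z",
        },
        {
            "involvedObject": {"kind": "Pod", "name": "demo", "namespace": "demo"},
            "reason": "Pulled",
            "message": "Container image already present on machine",
            "lastTimestamp": "2017-07-10T10:09:19Z",
        },
    ]
}

for kill in node_oom_kills(sample_events, "gke-dev-default-d8f2dbc5-mbkb"):
    print(kill)
```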

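  • Yes, as I said, I can see these events with kubectl get events --all-namespaces, but I cannot associate such an event with a pod, even though the pod itself reports OOMKilled as its termination reason (3 upvotes)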