TaintManagerEviction - possibly the reason my pods get a new IP a few times a day

gra*_*bag 5 kubernetes microk8s

microk8s kubectl describe pod mysql-deployment-756f9d8cdf-8kzdw

Note the 11-minute age of these events.

Events:
  Type    Reason          Age   From     Message
  ----    ------          ----  ----     -------
  Normal  SandboxChanged  11m   kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal  Pulled          11m   kubelet  Container image "mysql:5.7" already present on machine
  Normal  Created         11m   kubelet  Created container mysql-container
  Normal  Started         11m   kubelet  Started container mysql-container

microk8s kubectl get pods -o wide

Note the 41h age, and that the IP address changed about 11 minutes ago.

NAME                                        READY   STATUS    RESTARTS   AGE   IP             NODE                   NOMINATED NODE   READINESS GATES

mysql-deployment-756f9d8cdf-8kzdw           1/1     Running   3          41h   10.1.167.149   john-trx40-designare   <none>           <none>
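Since the pod's AGE keeps growing while its IP changes, it helps to record exactly when each change happens so it can be matched against the kubelet and controller-manager logs. Below is a minimal polling sketch, assuming the microk8s CLI is on PATH; pod_ips, changed_ips and snapshot are hypothetical helper names, and the one-minute interval is arbitrary. It reads "kubectl get pods -o json" rather than the table output, because the table layout varies between kubectl versions.

```python
import json
import subprocess
import time
from typing import Dict, Tuple


def pod_ips(pods_json: str) -> Dict[str, str]:
    """Map pod name -> current pod IP from `kubectl get pods -o json` output."""
    data = json.loads(pods_json)
    return {
        item["metadata"]["name"]: item.get("status", {}).get("podIP", "")
        for item in data.get("items", [])
    }


def changed_ips(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Pods present in both snapshots whose IP differs: name -> (old_ip, new_ip)."""
    return {
        name: (before[name], after[name])
        for name in before.keys() & after.keys()
        if before[name] != after[name]
    }


def snapshot() -> Dict[str, str]:
    # Assumes the microk8s CLI is installed; a plain `kubectl` works the same way.
    out = subprocess.run(
        ["microk8s", "kubectl", "get", "pods", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return pod_ips(out)


if __name__ == "__main__":
    previous = snapshot()
    while True:
        time.sleep(60)  # arbitrary polling interval
        current = snapshot()
        for name, (old, new) in changed_ips(previous, current).items():
            print(f"{time.strftime('%Y-%m-%dT%H:%M:%S')} {name}: {old} -> {new}", flush=True)
        previous = current
```

Each printed timestamp narrows the window you then need to search in the node's logs.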

microk8s kubectl logs mysql-deployment-756f9d8cdf-8kzdw

reports several of these:

2020-12-08T02:10:10.264100Z 32 [Note] Aborted connection 32 to db: 'jjg_script_db' user: 'root' host: '10.1.167.159' (Got an error reading communication packets)

Other pods report DNS lookup failures, then crash and get recreated.

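It feels like IP leases are running out, but I'd rather read the logs than guess. I say this because the SQL pod's age keeps increasing, so the pod itself is not being replaced. The exact same SQL image and data ran for months under docker-compose. The frequency is unrelated to traffic.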

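sudo microk8s inspect produces a large number of log files. I looked through each one, but with so much output I may have missed the key event, and I don't know where to look.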
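Rather than reading every file in the report, the unpacked report can be searched for the handful of event names that matter here. A minimal sketch, assuming the inspection-report tarball written by microk8s inspect has already been extracted into a directory; the keyword list and scan_report are my own hypothetical choices, not part of MicroK8s.

```python
import re
import sys
from pathlib import Path
from typing import List, Tuple

# Assumed keywords, taken from the events seen in this question.
DEFAULT_PATTERN = re.compile(r"SandboxChanged|NodeNotReady|TaintManager|taint|evict", re.IGNORECASE)


def scan_report(root: str, pattern: re.Pattern = DEFAULT_PATTERN) -> List[Tuple[str, int, str]]:
    """Return (relative_path, line_number, line) for every matching line under root."""
    hits = []
    base = Path(root)
    for path in sorted(base.rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps binary or oddly encoded files from aborting the scan
        with path.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if pattern.search(line):
                    hits.append((str(path.relative_to(base)), lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # usage: python3 scan_report.py <extracted-inspection-report-dir>
    for rel, lineno, line in scan_report(sys.argv[1]):
        print(f"{rel}:{lineno}: {line}")
```

The output is grep-style (file:line: text), so the per-service directories that light up point at where to look next.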

Where do I find logs showing the cause/trigger of SandboxChanged? If my guess is right and this is an IP lease problem, where can I find the IP allocation logs for microk8s Kubernetes?

My setup is an Ubuntu 18.04 host with almost nothing installed besides docker, docker-compose, git, Visual Studio Code and microk8s Kubernetes.

Everything recovers, but the interruptions are annoying, and not knowing where to look is driving me crazy.


Additional information requested by PjoterS

Ingress

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ingress-myservice-jjg
  annotations:
    # use the shared ingress-nginx
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/affinity-mode: "balanced" # "persistent"
    # added next 2 lines for secured https after I got a certificate
    certmanager.k8s.io/cluster-issuer: "letsencrypt-issuer"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  # added tls lines for secured https after I got a certificate
  tls:
    - hosts:
        - ancient-script.org
        - www.ancient-script.org
      secretName: ancient-script-org-crt-secret
  rules:
  - host: ancient-script.org
    http:
      paths:
      - path: /
        backend:
          serviceName: express-service
          servicePort: 3000
  - host: www.ancient-script.org
    http:
      paths:
      - path: /
        backend:
          serviceName: express-service
          servicePort: 3000

Deployment

apiVersion: v1
kind: Service
metadata:
  name: mysql-service
spec:
  ports:
  - protocol: TCP     # default is TCP
    port: 3306        # incoming port from within kubernetes
    targetPort: 3306  # default, port on the pod
  selector:
    app: mysql-pod
  clusterIP: None
---
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: mysql-deployment
spec:
  selector:
    matchLabels:
      app: mysql-pod
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql-pod
    spec:
      containers:
      - image: mysql:5.7
        name: mysql-container
        args: 
        - --sql-mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: 'todochange'
        - name: MYSQL_DATABASE
          value: ancient_script_db
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-persistent-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-persistent-storage
        persistentVolumeClaim:
          claimName: mysql-pv-claim

Logs from MySQL at the time of the IP change

2020-12-10T19:39:43.795268Z 44 [Note] Aborted connection 44 to db: 'ancient_script_db' user: 'root' host: '10.1.167.141' (Got an error reading communication packets)
2020-12-10T19:39:43.796587Z 43 [Note] Aborted connection 43 to db: 'ancient_script_db' user: 'root' host: '10.1.167.141' (Got an error reading communication packets)
2020-12-10T19:39:43.796761Z 41 [Note] Aborted connection 41 to db: 'ancient_script_db' user: 'root' host: '10.1.167.141' (Got an error reading communication packets)
2020-12-10T19:39:43.796831Z 38 [Note] Aborted connection 38 to db: 'ancient_script_db' user: 'root' host: '10.1.167.141' (Got an error reading communication packets)
2020-12-10T19:39:43.796889Z 42 [Note] Aborted connection 42 to db: 'ancient_script_db' user: 'root' host: '10.1.167.141' (Got an error reading communication packets)

From another occurrence (these happen about twice a day):

Note that when I try to get the logs of another pod that uses MySQL, microk8s returns an error; when microk8s finally returns the logs, they show DNS lookup failures.

john@john-trx40-designare:~/Documents/GitHub/help-me-transcribe$ k describe pod express-deployment-64947b66b9-84vzc
Name:         express-deployment-64947b66b9-84vzc
Namespace:    default
Priority:     0
Node:         john-trx40-designare/99.153.71.9
Start Time:   Fri, 11 Dec 2020 12:27:13 -0600
Labels:       app=express-pod
              pod-template-hash=64947b66b9
Annotations:  cni.projectcalico.org/podIP: 10.1.167.153/32
              cni.projectcalico.org/podIPs: 10.1.167.153/32
Status:       Running
IP:           10.1.167.153
IPs:
  IP:           10.1.167.153
Controlled By:  ReplicaSet/express-deployment-64947b66b9
Containers:
  express:
    Container ID:   containerd://e9d178313638d3a8985caef57ae6e45f2b37b5ae08032e0eeba01c30a12676ce
    Image:          localhost:32000/express-server:20201211d
    Image ID:       localhost:32000/express-server@sha256:95fcc8679727820cff428f657ea7c32811681c2782e3692fbad0041ffcd3d935
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Fri, 11 Dec 2020 12:27:15 -0600
    Ready:          True
    Restart Count:  0
    Environment:
      mySQL_connection_limit:                  200
      mySQL_host:                              mysql-service
      mySQL_port:                              3306
      mySQL_user:                              root
      ROOT_PATH_user_files:                    /user_files
      ROOT_PATH_crop_images:                   /crop_images
      ROOT_PATH_bulk_input_AI_transcriptions:  /temp/FEB_2020_AI_transcription.txt #is this used?
      ROOT_PATH_CURRICULUM:                    /temp/curriculum/  #is this used?
      ROOT_PATH_transcribed_words:             /transcription_db/crops_sets/transcribed_words/ #is this used?
      ROOT_PATH_train_test:                    /transcription_db/train_test #is this used?
      IMAGE_SERVICE_ADDRESS:                   image-service
      TRANSCRIBE_SERVICE_ADDRESS:              transcription-service
      LOGIN_COOKIE_NAME:                       ancient_script_signed_login_token
    Mounts:
      /crop_images from name-ci (rw)
      /user_files from name-uf (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qnqqd (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  name-uf:
    Type:          HostPath (bare host directory volume)
    Path:          /mnt/disk2/Documents/help_me_transcribe/production/pv/user_files
    HostPathType:  
  name-ci:
    Type:          HostPath (bare host directory volume)
    Path:          /mnt/disk2/Documents/help_me_transcribe/production/pv/crop_images
    HostPathType:  
  default-token-qnqqd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qnqqd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  28m   default-scheduler  Successfully assigned default/express-deployment-64947b66b9-84vzc to john-trx40-designare
  Normal  Pulling    28m   kubelet            Pulling image "localhost:32000/express-server:20201211d"
  Normal  Pulled     28m   kubelet            Successfully pulled image "localhost:32000/express-server:20201211d" in 446.977029ms
  Normal  Created    28m   kubelet            Created container express
  Normal  Started    28m   kubelet            Started container express
john@john-trx40-designare:~/Documents/GitHub/help-me-transcribe$ k logs express-deployment-64947b66b9-84vzc
Error from server (NotFound): the server could not find the requested resource ( pods/log express-deployment-64947b66b9-84vzc)

john@john-trx40-designare:~/Documents/GitHub/help-me-transcribe$ k logs express-deployment-64947b66b9-84vzc
process.env.mySQL_host = mysql-service
process.env.mySQL_port = 3306
process.env.mySQL_user = root
initializing socketApi.js
FOCUS
TODO, need to process all user files not just john_grabner
Scanning all files to make sure present in database ... this will take some time
Listening on port 3000 when launched native or from docker networks, maybe remapped to host in docker-compose
Unhandled Rejection at: Promise Promise {
  <rejected> { Error: getaddrinfo ENOTFOUND mysql-service mysql-service:3306
    at errnoException (dns.js:55:10)
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:97:26)
    --------------------
    at Protocol._enqueue (/app/src/help-me-transcribe-server/node_modules/mysql/lib/protocol/Protocol.js:145:48)
    at Protocol.handshake (/app/src/help-me-transcribe-server/node_modules/mysql/lib/protocol/Protocol.js:52:23)
    at PoolConnection.connect (/app/src/help-me-transcribe-server/node_modules/mysql/lib/Connection.js:130:18)
    at Pool.getConnection (/app/src/help-me-transcribe-server/node_modules/mysql/lib/Pool.js:48:16)
    at Promise (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:47:22)
    at new Promise (<anonymous>)
    at getConnection (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:42:16)
    at query (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:77:12)
    at query (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:74:16)
    at Object.query_one_or_null (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:101:12)
  code: 'ENOTFOUND',
  errno: 'ENOTFOUND',
  syscall: 'getaddrinfo',
  hostname: 'mysql-service',
  host: 'mysql-service',
  port: 3306,
  fatal: true } } reason: { Error: getaddrinfo ENOTFOUND mysql-service mysql-service:3306
    at errnoException (dns.js:55:10)
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:97:26)
    --------------------
    at Protocol._enqueue (/app/src/help-me-transcribe-server/node_modules/mysql/lib/protocol/Protocol.js:145:48)
    at Protocol.handshake (/app/src/help-me-transcribe-server/node_modules/mysql/lib/protocol/Protocol.js:52:23)
    at PoolConnection.connect (/app/src/help-me-transcribe-server/node_modules/mysql/lib/Connection.js:130:18)
    at Pool.getConnection (/app/src/help-me-transcribe-server/node_modules/mysql/lib/Pool.js:48:16)
    at Promise (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:47:22)
    at new Promise (<anonymous>)
    at getConnection (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:42:16)
    at query (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:77:12)
    at query (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:74:16)
    at Object.query_one_or_null (/app/src/help-me-transcribe-server/help-me-data-access/sql.js:101:12)
  code: 'ENOTFOUND',
  errno: 'ENOTFOUND',
  syscall: 'getaddrinfo',
  hostname: 'mysql-service',
  host: 'mysql-service',
  port: 3306,
  fatal: true }
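"getaddrinfo ENOTFOUND mysql-service" means the DNS lookup of the Service name failed, which is plausible if cluster DNS is briefly unavailable while the node restarts; the app then gives up with an unhandled rejection. A common mitigation is to retry name resolution with exponential backoff instead of failing on the first error. The sketch below is in Python rather than the app's Node.js, purely to illustrate the pattern; resolve_with_retry, the attempt count and the delays are assumptions, and the resolver and sleep functions are injectable so it can be tested without a network.

```python
import socket
import time
from typing import Callable

Resolver = Callable[[str, int], list]


def resolve_with_retry(host: str, port: int, attempts: int = 5, base_delay: float = 0.5,
                       resolver: Resolver = socket.getaddrinfo,
                       sleep: Callable[[float], None] = time.sleep) -> list:
    """Call resolver(host, port), retrying on socket.gaierror with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once `attempts` tries have failed.
    """
    for attempt in range(attempts):
        try:
            return resolver(host, port)
        except socket.gaierror:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be >= 1")


if __name__ == "__main__":
    # Inside the cluster this resolves the headless Service to the MySQL pod IP.
    print(resolve_with_retry("mysql-service", 3306))
```

With the defaults, five attempts span about 7.5 seconds of waiting; a window that rides out a node restart would need larger values.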
As requested, more information on the MySQL config:

# microk8s kubectl get pv --sort-by=.spec.capacity.storage --namespace=production
apiVersion: v1
kind: PersistentVolume
metadata:
  # namespace: production
  name: mysql-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/Disk2/Documents/help_me_transcribe/production/pv/mysql-pv-volume"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # namespace: production
  name: mysql-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi

# microk8s kubectl get services --namespace=production                          # List all services in the namespace
apiVersion: v1
kind: Service
metadata:
  # namespace: production
  name: mysql-service
spec:
  #type: xxxxxxx      # default, ClusterIP: Exposes the Service on a cluster-internal IP
                      # NodePort: Exposes the Service on each Node's IP at a static port (the NodePort, 30000-32767)
  ports:
  - protocol: TCP     # default is TCP
    port: 3306        # incoming port from within kubernetes
    targetPort: 3306  # default, port on the pod
    #nodePort: 33306
  selector:
    app: mysql-pod
  clusterIP: None
---
# microk8s kubectl get pods --namespace=production
# microk8s kubectl get pods -o wide --namespace=production                     # List all pods in the current namespace, with more details
                                                         # notice the IP address 10.1.167.xxx ... use this for "MySQL Workbench"
# microk8s kubectl describe pods my-pod --namespace=production
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  # namespace: production
  name: mysql-deployment
spec:
  selector:
    matchLabels:
      app: mysql-pod
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql-pod
    spec:
      containers:
      - image: mysql:5.7
        name: mysql-container
        args: 
        - --sql-mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
        env:
        - name: MYSQL_DATABASE
          value: ancient_script_db
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-persistent-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-persistent-storage
        # hostPath:
        #   path: "/Disk2/Documents/help_me_transcribe/production/pv/mysql-pv-volume"
        persistentVolumeClaim:
          claimName: mysql-pv-claim

Update, January 5, 2021

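Something in microk8s must be consuming the output of "microk8s kubectl get events", because it usually reports "No resources found in default namespace." Luckily, I just caught a set of events where the taintManager appears to be doing something.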
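One thing worth knowing: the API server keeps events only for a limited time (the kube-apiserver --event-ttl flag, one hour by default), which may be why get events is usually empty by the time you look. A small poller that saves the relevant events as they appear avoids having to catch them live. A sketch assuming the microk8s CLI is on PATH; interesting_events and the reason list are hypothetical choices based on the events in this question.

```python
import json
import subprocess
import time
from typing import List, Tuple

# Reasons seen in this question; extend as needed.
INTERESTING = frozenset({"NodeNotReady", "NodeReady", "TaintManagerEviction", "SandboxChanged", "Starting"})


def interesting_events(events_json: str, reasons=INTERESTING) -> List[Tuple[str, str]]:
    """Return (uid, one-line summary) for events whose reason is in `reasons`."""
    out = []
    for ev in json.loads(events_json).get("items", []):
        if ev.get("reason") not in reasons:
            continue
        obj = ev.get("involvedObject", {})
        stamp = ev.get("lastTimestamp") or ev.get("eventTime") or ""
        line = f'{stamp} {ev["reason"]} {obj.get("kind", "")}/{obj.get("name", "")}: {ev.get("message", "")}'
        out.append((ev.get("metadata", {}).get("uid", line), line))
    return out


if __name__ == "__main__":
    seen = set()
    while True:
        raw = subprocess.run(
            ["microk8s", "kubectl", "get", "events", "--all-namespaces", "-o", "json"],
            check=True, capture_output=True, text=True,
        ).stdout
        for uid, line in interesting_events(raw):
            if uid not in seen:  # print each event object once, even if its count grows
                seen.add(uid)
                print(line, flush=True)
        time.sleep(300)  # well inside the default one-hour event TTL
```

Redirecting its output to a file (for example with nohup) leaves a timeline that survives after the events themselves have expired.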

microk8s kubectl get events
LAST SEEN   TYPE      REASON                    OBJECT                                          MESSAGE
49m         Normal    Starting                  node/john-trx40-designare                       Starting kube-proxy.
49m         Normal    Starting                  node/john-trx40-designare                       Starting kubelet.
49m         Warning   InvalidDiskCapacity       node/john-trx40-designare                       invalid capacity 0 on image filesystem
49m         Normal    NodeHasSufficientMemory   node/john-trx40-designare                       Node john-trx40-designare status is now: NodeHasSufficientMemory
49m         Normal    NodeHasNoDiskPressure     node/john-trx40-designare                       Node john-trx40-designare status is now: NodeHasNoDiskPressure
49m         Normal    NodeHasSufficientPID      node/john-trx40-designare                       Node john-trx40-designare status is now: NodeHasSufficientPID
49m         Normal    NodeNotReady              node/john-trx40-designare                       Node john-trx40-designare status is now: NodeNotReady
49m         Normal    TaintManagerEviction      pod/image-deployment-78c4c9fd7f-5vllb           Cancelling deletion of Pod default/image-deployment-78c4c9fd7f-5vllb
49m         Normal    TaintManagerEviction      pod/mysql-deployment-756f9d8cdf-lbfrg           Cancelling deletion of Pod default/mysql-deployment-756f9d8cdf-lbfrg
49m         Normal    TaintManagerEviction      pod/express-deployment-6dbb578fbb-dmjqs         Cancelling deletion of Pod default/express-deployment-6dbb578fbb-dmjqs
49m         Normal    TaintManagerEviction      pod/transcription-deployment-84fddcdff8-7sd9d   Cancelling deletion of Pod default/transcription-deployment-84fddcdff8-7sd9d
47m         Normal    NodeAllocatableEnforced   node/john-trx40-designare                       Updated Node Allocatable limit across pods
47m         Normal    NodeReady                 node/john-trx40-designare                       Node john-trx40-designare status is now: NodeReady
47m         Normal    SandboxChanged            pod/transcription-deployment-84fddcdff8-7sd9d   Pod sandbox changed, it will be killed and re-created.
47m         Normal    SandboxChanged            pod/image-deployment-78c4c9fd7f-5vllb           Pod sandbox changed, it will be killed and re-created.
47m         Normal    Pulled                    pod/transcription-deployment-84fddcdff8-7sd9d   Container image "localhost:32000/py_transcribe_service:20201226d" already present on machine
47m         Normal    Created                   pod/transcription-deployment-84fddcdff8-7sd9d   Created container transcription-container
47m         Normal    Started                   pod/transcription-deployment-84fddcdff8-7sd9d   Started container transcription-container
47m         Normal    SandboxChanged            pod/express-deployment-6dbb578fbb-dmjqs         Pod sandbox changed, it will be killed and re-created.
47m         Normal    SandboxChanged            pod/mysql-deployment-756f9d8cdf-lbfrg           Pod sandbox changed, it will be killed and re-created.
47m         Normal    CREATE                    ingress/ingress-myservice-jjg                   Ingress default/ingress-myservice-jjg
47m         Normal    UPDATE                    ingress/ingress-myservice-jjg                   Ingress default/ingress-myservice-jjg
47m         Normal    Pulled                    pod/image-deployment-78c4c9fd7f-5vllb           Container image "localhost:32000/py_image_service:2020126a" already present on machine
47m         Normal    Created                   pod/image-deployment-78c4c9fd7f-5vllb           Created container image-container
47m         Normal    Pulled                    pod/mysql-deployment-756f9d8cdf-lbfrg           Container image "mysql:5.7" already present on machine
47m         Normal    Created                   pod/mysql-deployment-756f9d8cdf-lbfrg           Created container mysql-container
47m         Normal    Started                   pod/image-deployment-78c4c9fd7f-5vllb           Started container image-container
47m         Normal    Started                   pod/mysql-deployment-756f9d8cdf-lbfrg           Started container mysql-container
47m         Normal    Pulled                    pod/express-deployment-6dbb578fbb-dmjqs         Container image "localhost:32000/express-server:20201226b" already present on machine
47m         Normal    Created                   pod/express-deployment-6dbb578fbb-dmjqs         Created container express
47m         Normal    Started                   pod/express-deployment-6dbb578fbb-dmjqs         Started container express
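The Tolerations shown in the describe pod output earlier (node.kubernetes.io/not-ready:NoExecute op=Exists for 300s) explain these TaintManagerEviction events: when the node goes NotReady it gets a NoExecute taint, the taint manager schedules each pod for deletion once its toleration expires, and it cancels that deletion if the taint is lifted first. A deliberately simplified model of that rule for one pod and one taint; eviction_outcome is a hypothetical name, not a Kubernetes API.

```python
from datetime import datetime, timedelta
from typing import Optional


def eviction_outcome(tainted_at: datetime, untainted_at: Optional[datetime],
                     toleration_seconds: int = 300) -> str:
    """Simplified NoExecute taint handling for a single pod and taint.

    The pod tolerates the taint for toleration_seconds; if the taint is
    removed before that deadline, the scheduled deletion is cancelled.
    """
    deadline = tainted_at + timedelta(seconds=toleration_seconds)
    if untainted_at is not None and untainted_at < deadline:
        return "cancelled"
    return f"evicted at {deadline.isoformat()}"


if __name__ == "__main__":
    t0 = datetime(2021, 1, 5, 9, 0, 0)
    # NodeNotReady at 49m and NodeReady at 47m: about two minutes apart.
    print(eviction_outcome(t0, t0 + timedelta(minutes=2)))
```

Here the gap between NodeNotReady and NodeReady is roughly two minutes, well under 300 seconds, so the deletions are cancelled: the pods keep their names and ages, and only their sandboxes are recreated (SandboxChanged), which is consistent with the new pod IPs.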

Next update:

Inspection-report/snap.microk8s.daemon-controller-manager seems to contain a bunch of problems just before the node restarts and gets a new IP address.

Does anyone know what this means?

Jan 05 09:01:27 jo
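To see what the controller-manager logged in the minutes before a given moment, the journal lines can be filtered by timestamp. A sketch assuming journalctl's default short format with a zero-padded day and English month names (as in the line above); lines_before and the 10-minute window are hypothetical choices.

```python
import sys
from datetime import datetime, timedelta
from typing import Iterable, List

# journalctl's default "short" output starts with e.g. "Jan 05 09:01:27".
STAMP_LEN = len("Jan 05 09:01:27")


def _parse(stamp: str) -> datetime:
    # The short format has no year; prefix a fixed leap year (2000) so that
    # "Feb 29" parses. Comparisons are only valid within a single year.
    return datetime.strptime("2000 " + stamp, "%Y %b %d %H:%M:%S")


def lines_before(lines: Iterable[str], event: str, window_minutes: int = 10) -> List[str]:
    """Return the lines timestamped within window_minutes before (and at) event."""
    end = _parse(event)
    start = end - timedelta(minutes=window_minutes)
    kept = []
    for line in lines:
        try:
            stamp = _parse(line[:STAMP_LEN])
        except ValueError:
            continue  # continuation lines, headers, truncated lines
        if start <= stamp <= end:
            kept.append(line.rstrip("\n"))
    return kept


if __name__ == "__main__":
    # usage: python3 lines_before.py <journal.log> "Jan 05 09:01:27" [minutes]
    path, event = sys.argv[1], sys.argv[2]
    minutes = int(sys.argv[3]) if len(sys.argv) > 3 else 10
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in lines_before(fh, event, minutes):
            print(line)
```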

gra*_*bag 0

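https://github.com/ubuntu/microk8s/issues/2241 provides the solution. Edit the kube-apiserver arguments file (2695 is the MicroK8s snap revision on my machine):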

/var/snap/microk8s/2695/args$ nano kube-apiserver

Add the line:

--bind-address=0.0.0.0
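The same edit as a repeatable script, for example to apply it on another machine. A sketch assuming the args file holds one --flag=value per line and that /var/snap/microk8s/current points at the active snap revision (2695 above); it must run as root, and set_flag is a hypothetical helper.

```python
from pathlib import Path

# Assumption: "current" is the snap's symlink to the active revision.
ARGS_FILE = Path("/var/snap/microk8s/current/args/kube-apiserver")


def set_flag(text: str, flag: str, value: str) -> str:
    """Return args-file text containing exactly one `flag=value` line.

    Any existing lines for the same flag ("--flag", "--flag=x" or "--flag x")
    are dropped, and a single new line is appended at the end.
    """
    kept = [
        line for line in text.splitlines()
        if line.strip() != flag
        and not line.strip().startswith((flag + "=", flag + " "))
    ]
    kept.append(f"{flag}={value}")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    original = ARGS_FILE.read_text()
    ARGS_FILE.with_name(ARGS_FILE.name + ".bak").write_text(original)  # keep a backup
    ARGS_FILE.write_text(set_flag(original, "--bind-address", "0.0.0.0"))
    print(f"Updated {ARGS_FILE}; restart MicroK8s (microk8s stop, then microk8s start) to apply.")
```

Running it twice leaves the file unchanged the second time, so it is safe to rerun.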