Kubernetes Helm Elasticstack CrashLoopBackOff 日志中存在 JavaErrors

I. *_*Shm 2 elasticsearch kubernetes

我正在尝试将 ELK 堆栈部署到我正在开发的 kubernetes 集群中。我似乎按照教程中描述的方式做了所有事情,但是,pod 不断因 Java 错误而失败(见下文)。我将描述从安装集群到出现错误的整个过程。

第1步:安装集群

# Apply sysctl params without reboot

cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# Setup required sysctl params, these persist across reboots.
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

sudo sysctl --system
#update and install apt https stuff
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release
# add docker repo for containerd and install it
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
 echo \
  "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y containerd.io

# copy config
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1 // somewhat redundant
net.bridge.bridge-nf-call-iptables = 1 // somewhat redundant
EOF
sudo sysctl --system

#install kubernetes binaries
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

#disable swap and comment swap in fstab
sudo swapoff -v /dev/mapper/main-swap
sudo nano /etc/fstab

#init cluster
sudo kubeadm init --pod-network-cidr=192.168.0.0/16
#make user to kubectl admin
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

#install calico
kubectl apply -f 
kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
kubectl create -f https://docs.projectcalico.org/manifests/custom-resources.yaml
    
#untaint master node that pods can run on it
kubectl taint nodes --all node-role.kubernetes.io/master-

#install helm
curl https://baltocdn.com/helm/signing.asc | sudo apt-key add -
sudo apt-get install apt-transport-https --yes
echo "deb https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm
Run Code Online (Sandbox Code Playgroud)

步骤2:安装ECK(https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-install-helm.html)和elasticsearch(https://github.com/elastic/helm ) -charts/blob/master/elasticsearch/README.md#installing )

# add helm repo
helm repo add elastic https://helm.elastic.co
helm repo update
# install eck
####  ommited as suggested in comment section!!!! helm install elastic-operator elastic/eck-operator -n elastic-system --create-namespace
helm install elasticsearch elastic/elasticsearch
Run Code Online (Sandbox Code Playgroud)

第 3 步:添加持久卷

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: elk-data1
  labels:
    type: local
spec:
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data1"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: elk-data2
  labels:
    type: local
spec:
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data2"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: elk-data3
  labels:
    type: local
spec:
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data3"
Run Code Online (Sandbox Code Playgroud)

应用它

sudo mkdir /mnt/data1
sudo mkdir /mnt/data2
sudo mkdir /mnt/data3
kubectl apply -f storage.yaml
Run Code Online (Sandbox Code Playgroud)

现在 Pod(或至少一个)可以运行了。但我不断收到 STATUS CrashLoopBackOff 日志中的 java 错误。

kubectl get pv,pvc,pods

NAME                         CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                 STORAGECLASS   REASON   AGE
persistentvolume/elk-data1   30Gi       RWO            Retain           Bound    default/elasticsearch-master-elasticsearch-master-1                           140m
persistentvolume/elk-data2   30Gi       RWO            Retain           Bound    default/elasticsearch-master-elasticsearch-master-2                           140m
persistentvolume/elk-data3   30Gi       RWO            Retain           Bound    default/elasticsearch-master-elasticsearch-master-0                           140m

NAME                                                                STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/elasticsearch-master-elasticsearch-master-0   Bound    elk-data3   30Gi       RWO                           141m
persistentvolumeclaim/elasticsearch-master-elasticsearch-master-1   Bound    elk-data1   30Gi       RWO                           141m
persistentvolumeclaim/elasticsearch-master-elasticsearch-master-2   Bound    elk-data2   30Gi       RWO                           141m

NAME                         READY   STATUS             RESTARTS   AGE
pod/elasticsearch-master-0   0/1     CrashLoopBackOff   32         141m
pod/elasticsearch-master-1   0/1     Pending            0          141m
pod/elasticsearch-master-2   0/1     Pending            0          141m
Run Code Online (Sandbox Code Playgroud)

日志和错误:

kubectl logs pod/elasticsearch-master-2


Exception in thread "main" java.lang.InternalError: java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:65)
        at java.base/jdk.internal.platform.Container.metrics(Container.java:43)
        at jdk.management/com.sun.management.internal.OperatingSystemImpl.<init>(OperatingSystemImpl.java:48)
        at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(PlatformMBeanProviderImpl.java:279)
        at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$3.nameToMBeanMap(PlatformMBeanProviderImpl.java:198)
        at java.management/java.lang.management.ManagementFactory.lambda$getPlatformMBeanServer$0(ManagementFactory.java:487)
        at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
        at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1766)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
        at java.management/java.lang.management.ManagementFactory.getPlatformMBeanServer(ManagementFactory.java:488)
        at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:140)
        at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:558)
        at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:263)
        at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:207)
        at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:220)
        at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:197)
        at org.elasticsearch.common.logging.LogConfigurator.configureStatusLogger(LogConfigurator.java:248)
        at org.elasticsearch.common.logging.LogConfigurator.configureWithoutConfig(LogConfigurator.java:95)
        at org.elasticsearch.cli.CommandLoggingConfigurator.configureLoggingWithoutConfig(CommandLoggingConfigurator.java:29)
        at org.elasticsearch.cli.Command.main(Command.java:76)
        at org.elasticsearch.common.settings.KeyStoreCli.main(KeyStoreCli.java:32)
Caused by: java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:567)
        at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:61)
        ... 26 more
Caused by: java.lang.ExceptionInInitializerError
        at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:107)
        at java.base/jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:167)
        ... 31 more
Caused by: java.lang.NullPointerException
        at java.base/java.util.Objects.requireNonNull(Objects.java:208)
        at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:260)
        at java.base/java.nio.file.Path.of(Path.java:147)
        at java.base/java.nio.file.Paths.get(Paths.java:69)
        at java.base/jdk.internal.platform.CgroupUtil.lambda$readStringValue$1(CgroupUtil.java:66)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:554)
        at java.base/jdk.internal.platform.CgroupUtil.readStringValue(CgroupUtil.java:68)
        at java.base/jdk.internal.platform.CgroupSubsystemController.getStringValue(CgroupSubsystemController.java:65)
        at java.base/jdk.internal.platform.CgroupSubsystemController.getLongValue(CgroupSubsystemController.java:124)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getLongValue(CgroupV1Subsystem.java:272)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getHierarchical(CgroupV1Subsystem.java:218)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.setPath(CgroupV1Subsystem.java:201)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.setSubSystemControllerPath(CgroupV1Subsystem.java:173)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.lambda$initSubSystem$5(CgroupV1Subsystem.java:113)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.initSubSystem(CgroupV1Subsystem.java:113)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.<clinit>(CgroupV1Subsystem.java:47)
        ... 33 more
Exception in thread "main" java.lang.InternalError: java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:65)
        at java.base/jdk.internal.platform.Container.metrics(Container.java:43)
        at jdk.management/com.sun.management.internal.OperatingSystemImpl.<init>(OperatingSystemImpl.java:48)
        at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(PlatformMBeanProviderImpl.java:279)
        at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$3.nameToMBeanMap(PlatformMBeanProviderImpl.java:198)
        at java.management/sun.management.spi.PlatformMBeanProvider$PlatformComponent.getMBeans(PlatformMBeanProvider.java:195)
        at java.management/java.lang.management.ManagementFactory.getPlatformMXBean(ManagementFactory.java:686)
        at java.management/java.lang.management.ManagementFactory.getOperatingSystemMXBean(ManagementFactory.java:388)
        at org.elasticsearch.tools.launchers.DefaultSystemMemoryInfo.<init>(DefaultSystemMemoryInfo.java:28)
        at org.elasticsearch.tools.launchers.JvmOptionsParser.jvmOptions(JvmOptionsParser.java:125)
        at org.elasticsearch.tools.launchers.JvmOptionsParser.main(JvmOptionsParser.java:86)
Caused by: java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:567)
        at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:61)
        ... 10 more
Caused by: java.lang.ExceptionInInitializerError
        at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:107)
        at java.base/jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:167)
        ... 15 more
Caused by: java.lang.NullPointerException
        at java.base/java.util.Objects.requireNonNull(Objects.java:208)
        at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:260)
        at java.base/java.nio.file.Path.of(Path.java:147)
        at java.base/java.nio.file.Paths.get(Paths.java:69)
        at java.base/jdk.internal.platform.CgroupUtil.lambda$readStringValue$1(CgroupUtil.java:66)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:554)
        at java.base/jdk.internal.platform.CgroupUtil.readStringValue(CgroupUtil.java:68)
        at java.base/jdk.internal.platform.CgroupSubsystemController.getStringValue(CgroupSubsystemController.java:65)
        at java.base/jdk.internal.platform.CgroupSubsystemController.getLongValue(CgroupSubsystemController.java:124)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getLongValue(CgroupV1Subsystem.java:272)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getHierarchical(CgroupV1Subsystem.java:218)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.setPath(CgroupV1Subsystem.java:201)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.setSubSystemControllerPath(CgroupV1Subsystem.java:173)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.lambda$initSubSystem$5(CgroupV1Subsystem.java:113)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
        at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
        at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.initSubSystem(CgroupV1Subsystem.java:113)
        at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.<clinit>(CgroupV1Subsystem.java:47)
        ... 17 more
Run Code Online (Sandbox Code Playgroud)

来自 helm 图表的 value.yaml

---
clusterName: "elasticsearch"
nodeGroup: "master"

# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: ""

# Elasticsearch roles that will be applied to this nodeGroup
# These will be set as environment variables. E.g. node.master=true
roles:
  master: "true"
  ingest: "true"
  data: "true"
  remote_cluster_client: "true"
  ml: "true"

replicas: 3
minimumMasterNodes: 2

esMajorVersion: ""

# Allows you to add any config files in /usr/share/elasticsearch/config/
# such as elasticsearch.yml and log4j2.properties
esConfig: {}
#  elasticsearch.yml: |
#    key:
#      nestedkey: value
#  log4j2.properties: |
#    key = value

# Extra environment variables to append to this nodeGroup
# This will be appended to the current 'env:' key. You can use any of the kubernetes env
# syntax here
extraEnvs: []
#  - name: MY_ENVIRONMENT_VAR
#    value: the_value_goes_here

# Allows you to load environment variables from kubernetes secret or config map
envFrom: []
# - secretRef:
#     name: env-secret
# - configMapRef:
#     name: config-map

# A list of secrets and their paths to mount inside the pod
# This is useful for mounting certificates for security and for mounting
# the X-Pack license
secretMounts: []
#  - name: elastic-certificates
#    secretName: elastic-certificates
#    path: /usr/share/elasticsearch/config/certs
#    defaultMode: 0755

hostAliases: []
#- ip: "127.0.0.1"
#  hostnames:
#  - "foo.local"
#  - "bar.local"

image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.12.1"
imagePullPolicy: "IfNotPresent"

podAnnotations: {}
  # iam.amazonaws.com/role: es-cluster

# additionals labels
labels: {}

esJavaOpts: "-Xmx1g -Xms1g"

resources:
  requests:
    cpu: "1000m"
    memory: "2Gi"
  limits:
    cpu: "1000m"
    memory: "2Gi"

initResources: {}
  # limits:
  #   cpu: "25m"
  #   # memory: "128Mi"
  # requests:
  #   cpu: "25m"
  #   memory: "128Mi"

sidecarResources: {}
  # limits:
  #   cpu: "25m"
  #   # memory: "128Mi"
  # requests:
  #   cpu: "25m"
  #   memory: "128Mi"

networkHost: "0.0.0.0"

volumeClaimTemplate:
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 30Gi

rbac:
  create: false
  serviceAccountAnnotations: {}
  serviceAccountName: ""

podSecurityPolicy:
  create: false
  name: ""
  spec:
    privileged: true
    fsGroup:
      rule: RunAsAny
    runAsUser:
      rule: RunAsAny
    seLinux:
      rule: RunAsAny
    supplementalGroups:
      rule: RunAsAny
    volumes:
      - secret
      - configMap
      - persistentVolumeCla

Mar*_*ann 5

您遇到的问题不是与 Elasticsearch 相关的问题。这是您所使用的 containerd 版本的 cgroup 配置导致的问题。我还没有解开具体细节,但 Elasticsearch 日志中的异常与 JDK 在尝试检索所需的 cgroup 信息时失败有关。

我遇到了同样的问题,并通过在安装 Kubernetes 之前执行以下步骤来解决它,安装更高版本的 Containerd 并将其配置为将 cgroups 与 systemd 一起使用:

  1. 添加官方 Docker 存储库的 GPG 密钥。
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
Run Code Online (Sandbox Code Playgroud)
  1. 将 Docker 存储库添加到 APT 源。
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"
Run Code Online (Sandbox Code Playgroud)
  1. 安装最新的containerd.io 软件包而不是Ubuntu 中的containerd 软件包。
apt-get -y install containerd.io
Run Code Online (Sandbox Code Playgroud)
  1. 生成默认的containerd配置。
containerd config default > /etc/containerd/config.toml
Run Code Online (Sandbox Code Playgroud)
  1. 配置containerd以使用systemd来管理cgroup。
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
     runtime_type = "io.containerd.runc.v2"
     runtime_engine = ""
     runtime_root = ""
     privileged_without_host_devices = false
     base_runtime_spec = ""
     [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
      SystemdCgroup = true
Run Code Online (Sandbox Code Playgroud)
  1. 重启containerd服务。
systemctl restart containerd
Run Code Online (Sandbox Code Playgroud)