I. *_*Shm 2 elasticsearch kubernetes
我正在尝试将 ELK 堆栈部署到我正在开发的 kubernetes 集群中。我似乎按照教程中描述的方式做了所有事情,但是,pod 不断因 Java 错误而失败(见下文)。我将描述从安装集群到出现错误的整个过程。
第1步:安装集群
# Apply sysctl params without reboot
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# Setup required sysctl params, these persist across reboots.
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sudo sysctl --system
#update and install apt https stuff
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release
# add docker repo for containerd and install it
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y containerd.io
# copy config
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1 // somewhat redundant
net.bridge.bridge-nf-call-iptables = 1 // somewhat redundant
EOF
sudo sysctl --system
#install kubernetes binaries
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
#disable swap and comment swap in fstab
sudo swapoff -v /dev/mapper/main-swap
sudo nano /etc/fstab
#init cluster
sudo kubeadm init --pod-network-cidr=192.168.0.0/16
#make user to kubectl admin
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
#install calico
kubectl apply -f
kubectl create -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
kubectl create -f https://docs.projectcalico.org/manifests/custom-resources.yaml
#untaint master node that pods can run on it
kubectl taint nodes --all node-role.kubernetes.io/master-
#install helm
curl https://baltocdn.com/helm/signing.asc | sudo apt-key add -
sudo apt-get install apt-transport-https --yes
echo "deb https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm
Run Code Online (Sandbox Code Playgroud)
步骤2:安装ECK(https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-install-helm.html)和elasticsearch(https://github.com/elastic/helm ) -charts/blob/master/elasticsearch/README.md#installing )
# add helm repo
helm repo add elastic https://helm.elastic.co
helm repo update
# install eck
#### ommited as suggested in comment section!!!! helm install elastic-operator elastic/eck-operator -n elastic-system --create-namespace
helm install elasticsearch elastic/elasticsearch
Run Code Online (Sandbox Code Playgroud)
第 3 步:添加持久卷
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: elk-data1
labels:
type: local
spec:
capacity:
storage: 30Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/mnt/data1"
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: elk-data2
labels:
type: local
spec:
capacity:
storage: 30Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/mnt/data2"
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: elk-data3
labels:
type: local
spec:
capacity:
storage: 30Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/mnt/data3"
Run Code Online (Sandbox Code Playgroud)
应用它
sudo mkdir /mnt/data1
sudo mkdir /mnt/data2
sudo mkdir /mnt/data3
kubectl apply -f storage.yaml
Run Code Online (Sandbox Code Playgroud)
现在 Pod(或至少一个)可以运行了。但我不断收到 STATUS CrashLoopBackOff 日志中的 java 错误。
kubectl get pv,pvc,pods
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/elk-data1 30Gi RWO Retain Bound default/elasticsearch-master-elasticsearch-master-1 140m
persistentvolume/elk-data2 30Gi RWO Retain Bound default/elasticsearch-master-elasticsearch-master-2 140m
persistentvolume/elk-data3 30Gi RWO Retain Bound default/elasticsearch-master-elasticsearch-master-0 140m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/elasticsearch-master-elasticsearch-master-0 Bound elk-data3 30Gi RWO 141m
persistentvolumeclaim/elasticsearch-master-elasticsearch-master-1 Bound elk-data1 30Gi RWO 141m
persistentvolumeclaim/elasticsearch-master-elasticsearch-master-2 Bound elk-data2 30Gi RWO 141m
NAME READY STATUS RESTARTS AGE
pod/elasticsearch-master-0 0/1 CrashLoopBackOff 32 141m
pod/elasticsearch-master-1 0/1 Pending 0 141m
pod/elasticsearch-master-2 0/1 Pending 0 141m
Run Code Online (Sandbox Code Playgroud)
日志和错误:
kubectl logs pod/elasticsearch-master-2
Exception in thread "main" java.lang.InternalError: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:65)
at java.base/jdk.internal.platform.Container.metrics(Container.java:43)
at jdk.management/com.sun.management.internal.OperatingSystemImpl.<init>(OperatingSystemImpl.java:48)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(PlatformMBeanProviderImpl.java:279)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$3.nameToMBeanMap(PlatformMBeanProviderImpl.java:198)
at java.management/java.lang.management.ManagementFactory.lambda$getPlatformMBeanServer$0(ManagementFactory.java:487)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1766)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.management/java.lang.management.ManagementFactory.getPlatformMBeanServer(ManagementFactory.java:488)
at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:140)
at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:558)
at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:263)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:207)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:220)
at org.apache.logging.log4j.core.config.Configurator.initialize(Configurator.java:197)
at org.elasticsearch.common.logging.LogConfigurator.configureStatusLogger(LogConfigurator.java:248)
at org.elasticsearch.common.logging.LogConfigurator.configureWithoutConfig(LogConfigurator.java:95)
at org.elasticsearch.cli.CommandLoggingConfigurator.configureLoggingWithoutConfig(CommandLoggingConfigurator.java:29)
at org.elasticsearch.cli.Command.main(Command.java:76)
at org.elasticsearch.common.settings.KeyStoreCli.main(KeyStoreCli.java:32)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:567)
at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:61)
... 26 more
Caused by: java.lang.ExceptionInInitializerError
at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:107)
at java.base/jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:167)
... 31 more
Caused by: java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Objects.java:208)
at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:260)
at java.base/java.nio.file.Path.of(Path.java:147)
at java.base/java.nio.file.Paths.get(Paths.java:69)
at java.base/jdk.internal.platform.CgroupUtil.lambda$readStringValue$1(CgroupUtil.java:66)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:554)
at java.base/jdk.internal.platform.CgroupUtil.readStringValue(CgroupUtil.java:68)
at java.base/jdk.internal.platform.CgroupSubsystemController.getStringValue(CgroupSubsystemController.java:65)
at java.base/jdk.internal.platform.CgroupSubsystemController.getLongValue(CgroupSubsystemController.java:124)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getLongValue(CgroupV1Subsystem.java:272)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getHierarchical(CgroupV1Subsystem.java:218)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.setPath(CgroupV1Subsystem.java:201)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.setSubSystemControllerPath(CgroupV1Subsystem.java:173)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.lambda$initSubSystem$5(CgroupV1Subsystem.java:113)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.initSubSystem(CgroupV1Subsystem.java:113)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.<clinit>(CgroupV1Subsystem.java:47)
... 33 more
Exception in thread "main" java.lang.InternalError: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:65)
at java.base/jdk.internal.platform.Container.metrics(Container.java:43)
at jdk.management/com.sun.management.internal.OperatingSystemImpl.<init>(OperatingSystemImpl.java:48)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(PlatformMBeanProviderImpl.java:279)
at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$3.nameToMBeanMap(PlatformMBeanProviderImpl.java:198)
at java.management/sun.management.spi.PlatformMBeanProvider$PlatformComponent.getMBeans(PlatformMBeanProvider.java:195)
at java.management/java.lang.management.ManagementFactory.getPlatformMXBean(ManagementFactory.java:686)
at java.management/java.lang.management.ManagementFactory.getOperatingSystemMXBean(ManagementFactory.java:388)
at org.elasticsearch.tools.launchers.DefaultSystemMemoryInfo.<init>(DefaultSystemMemoryInfo.java:28)
at org.elasticsearch.tools.launchers.JvmOptionsParser.jvmOptions(JvmOptionsParser.java:125)
at org.elasticsearch.tools.launchers.JvmOptionsParser.main(JvmOptionsParser.java:86)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:567)
at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:61)
... 10 more
Caused by: java.lang.ExceptionInInitializerError
at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:107)
at java.base/jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:167)
... 15 more
Caused by: java.lang.NullPointerException
at java.base/java.util.Objects.requireNonNull(Objects.java:208)
at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:260)
at java.base/java.nio.file.Path.of(Path.java:147)
at java.base/java.nio.file.Paths.get(Paths.java:69)
at java.base/jdk.internal.platform.CgroupUtil.lambda$readStringValue$1(CgroupUtil.java:66)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:554)
at java.base/jdk.internal.platform.CgroupUtil.readStringValue(CgroupUtil.java:68)
at java.base/jdk.internal.platform.CgroupSubsystemController.getStringValue(CgroupSubsystemController.java:65)
at java.base/jdk.internal.platform.CgroupSubsystemController.getLongValue(CgroupSubsystemController.java:124)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getLongValue(CgroupV1Subsystem.java:272)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.getHierarchical(CgroupV1Subsystem.java:218)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.setPath(CgroupV1Subsystem.java:201)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.setSubSystemControllerPath(CgroupV1Subsystem.java:173)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.lambda$initSubSystem$5(CgroupV1Subsystem.java:113)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.initSubSystem(CgroupV1Subsystem.java:113)
at java.base/jdk.internal.platform.cgroupv1.CgroupV1Subsystem.<clinit>(CgroupV1Subsystem.java:47)
... 17 more
Run Code Online (Sandbox Code Playgroud)
来自 helm 图表的 value.yaml
---
clusterName: "elasticsearch"
nodeGroup: "master"
# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: ""
# Elasticsearch roles that will be applied to this nodeGroup
# These will be set as environment variables. E.g. node.master=true
roles:
master: "true"
ingest: "true"
data: "true"
remote_cluster_client: "true"
ml: "true"
replicas: 3
minimumMasterNodes: 2
esMajorVersion: ""
# Allows you to add any config files in /usr/share/elasticsearch/config/
# such as elasticsearch.yml and log4j2.properties
esConfig: {}
# elasticsearch.yml: |
# key:
# nestedkey: value
# log4j2.properties: |
# key = value
# Extra environment variables to append to this nodeGroup
# This will be appended to the current 'env:' key. You can use any of the kubernetes env
# syntax here
extraEnvs: []
# - name: MY_ENVIRONMENT_VAR
# value: the_value_goes_here
# Allows you to load environment variables from kubernetes secret or config map
envFrom: []
# - secretRef:
# name: env-secret
# - configMapRef:
# name: config-map
# A list of secrets and their paths to mount inside the pod
# This is useful for mounting certificates for security and for mounting
# the X-Pack license
secretMounts: []
# - name: elastic-certificates
# secretName: elastic-certificates
# path: /usr/share/elasticsearch/config/certs
# defaultMode: 0755
hostAliases: []
#- ip: "127.0.0.1"
# hostnames:
# - "foo.local"
# - "bar.local"
image: "docker.elastic.co/elasticsearch/elasticsearch"
imageTag: "7.12.1"
imagePullPolicy: "IfNotPresent"
podAnnotations: {}
# iam.amazonaws.com/role: es-cluster
# additionals labels
labels: {}
esJavaOpts: "-Xmx1g -Xms1g"
resources:
requests:
cpu: "1000m"
memory: "2Gi"
limits:
cpu: "1000m"
memory: "2Gi"
initResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
sidecarResources: {}
# limits:
# cpu: "25m"
# # memory: "128Mi"
# requests:
# cpu: "25m"
# memory: "128Mi"
networkHost: "0.0.0.0"
volumeClaimTemplate:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 30Gi
rbac:
create: false
serviceAccountAnnotations: {}
serviceAccountName: ""
podSecurityPolicy:
create: false
name: ""
spec:
privileged: true
fsGroup:
rule: RunAsAny
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- secret
- configMap
- persistentVolumeCla
您遇到的问题不是与 Elasticsearch 相关的问题。这是您所使用的 containerd 版本的 cgroup 配置导致的问题。我还没有解开具体细节,但 Elasticsearch 日志中的异常与 JDK 在尝试检索所需的 cgroup 信息时失败有关。
我遇到了同样的问题,并通过在安装 Kubernetes 之前执行以下步骤来解决它,安装更高版本的 Containerd 并将其配置为将 cgroups 与 systemd 一起使用:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
Run Code Online (Sandbox Code Playgroud)
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"
Run Code Online (Sandbox Code Playgroud)
apt-get -y install containerd.io
Run Code Online (Sandbox Code Playgroud)
containerd config default > /etc/containerd/config.toml
Run Code Online (Sandbox Code Playgroud)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
runtime_engine = ""
runtime_root = ""
privileged_without_host_devices = false
base_runtime_spec = ""
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
Run Code Online (Sandbox Code Playgroud)
systemctl restart containerd
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
1821 次 |
| 最近记录: |