How to fire alerts in Kubernetes using Prometheus Alertmanager

Question

Jib*_*eeb 5 kubernetes prometheus prometheus-alertmanager

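I have set up kube-prometheus in my cluster (https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus). It ships with some default alerts such as "CoreDNSdown". How do I create my own alert?

Could anyone give me a sample alert that sends an email to my Gmail account?

I followed the question "Alert when docker container pod is in Error or CrashLoopBackOff kubernetes", but I could not get it to work.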

Answer 1

Pra*_*dha 5

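To send alerts to your Gmail account, you need to put the Alertmanager configuration in a file, for example alertmanager.yaml. The heredoc below is expanded by the shell, so $GMAIL_ACCOUNT and $GMAIL_AUTH_TOKEN must be set in your environment before you run it: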

cat <<EOF > alertmanager.yaml
route:
  group_by: [alertname]
  # Send all notifications to me.
  receiver: email-me

receivers:
- name: email-me
  email_configs:
  - to: $GMAIL_ACCOUNT
    from: $GMAIL_ACCOUNT
    smarthost: smtp.gmail.com:587
    auth_username: "$GMAIL_ACCOUNT"
    auth_identity: "$GMAIL_ACCOUNT"
    auth_password: "$GMAIL_AUTH_TOKEN"
EOF

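Since you are using kube-prometheus, you already have a secret named alertmanager-main that holds the default Alertmanager configuration. You need to recreate the alertmanager-main secret with the new configuration (kubectl create fails if the old secret still exists, so delete it first) using the following command: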

kubectl delete secret alertmanager-main -n monitoring
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring

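Alertmanager is now set up to send an email whenever it receives an alert from Prometheus.

Next you need an alert for that email to be sent about. You can set up the DeadMansSwitch alert, which fires unconditionally and is used to check that your alerting pipeline works: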
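If the emails arrive too often or too late, the route can also carry timing settings. The following is only a sketch of the route section from the configuration above with the standard Alertmanager timing fields added; the durations are illustrative values, not recommendations from the original answer.

```yaml
route:
  group_by: [alertname]
  # How long to wait before sending the first notification for a new group.
  group_wait: 30s
  # How long to wait before notifying about new alerts added to an existing group.
  group_interval: 5m
  # How long to wait before re-sending a notification that is still firing.
  repeat_interval: 4h
  receiver: email-me
```

An always-firing alert such as a dead man's switch will be re-sent every repeat_interval, so that value decides how often the test email shows up.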

groups:
- name: meta
  rules:
    - alert: DeadMansSwitch
      expr: vector(1)
      labels:
        severity: critical
      annotations:
        description: This is a DeadMansSwitch meant to ensure that the entire Alerting
          pipeline is functional.
        summary: Alerting DeadMansSwitch

After that, the DeadMansSwitch alert will fire and an email should arrive in your mailbox.

Reference link:
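For the pod-state case mentioned in the question, a rule in the same format might look like the sketch below. It assumes kube-state-metrics is being scraped (kube-prometheus normally deploys it) and that your version exposes kube_pod_container_status_waiting_reason with a reason label; the group name, alert name, and the 5m duration are made up for illustration.

```yaml
groups:
- name: pod-state            # hypothetical group name
  rules:
    - alert: PodCrashLooping # hypothetical alert name
      # The gauge is 1 while a container is waiting for the given reason.
      expr: kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1
      # Only fire if the condition has held for 5 minutes (illustrative value).
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"
        description: "Container {{ $labels.container }} has been in CrashLoopBackOff for more than 5 minutes."
```

Before relying on it, it is worth running the expression in the Prometheus expression browser to confirm the metric and labels exist in your setup.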

https://coreos.com/tectonic/docs/latest/tectonic-prometheus-operator/user-guides/configuring-prometheus-alertmanager.html

Edit:

The DeadMansSwitch alert has to be in a config map that your Prometheus is reading. Here is the relevant snippet from my Prometheus resource:

"spec": {
        "alerting": {
            "alertmanagers": [
                {
                    "name": "alertmanager-main",
                    "namespace": "monitoring",
                    "port": "web"
                }
            ]
        },
        "baseImage": "quay.io/prometheus/prometheus",
        "replicas": 2,
        "resources": {
            "requests": {
                "memory": "400Mi"
            }
        },
        "ruleSelector": {
            "matchLabels": {
                "prometheus": "prafull",
                "role": "alert-rules"
            }
        },

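The snippet above is from my prometheus.json file. It names the Alertmanager to use, and its ruleSelector picks up rules based on the prometheus and role labels. So my rules config map looks like this: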

kind: ConfigMap
apiVersion: v1
metadata:
  name: prometheus-rules
  namespace: monitoring
  labels:
    role: alert-rules
    prometheus: prafull
data:
  alert-rules.yaml: |+
   groups:
   - name: alerting_rules
     rules:
       - alert: LoadAverage15m
         expr: node_load15 >= 0.50
         labels:
           severity: major
         annotations:
           summary: "Instance {{ $labels.instance }} - high load average"
           description: "{{ $labels.instance  }} (measured by {{ $labels.job }}) has high load average ({{ $value }}) over 15 minutes."

Put the DeadMansSwitch rule into the config map above in place of the example rule.
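Newer prometheus-operator releases load rules from PrometheusRule custom resources rather than from config maps. If your cluster runs such a release, the same alert could be expressed roughly as follows. This is a sketch under that assumption: the resource name is made up, and the labels simply copy the ruleSelector from my Prometheus resource, so adjust them to whatever your own ruleSelector matches.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: meta-alerts          # hypothetical name
  namespace: monitoring
  labels:
    # These must match the ruleSelector of your Prometheus resource.
    prometheus: prafull
    role: alert-rules
spec:
  groups:
  - name: meta
    rules:
    - alert: DeadMansSwitch
      # vector(1) always returns a sample, so the alert fires continuously.
      expr: vector(1)
      labels:
        severity: critical
      annotations:
        summary: Alerting DeadMansSwitch
        description: This is a DeadMansSwitch meant to ensure that the entire Alerting pipeline is functional.
```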
