Where can I find which rules are failing in Prometheus?

Kok*_*zzu 5 kubernetes prometheus prometheus-alertmanager

I am getting this alert:

Alert: PrometheusRuleFailures - critical
Description: Prometheus monitoring/prometheus-prometheus-kube-prometheus-prometheus-0 has failed to evaluate 30 rules in the last 5m.
Details:
  • alertname: PrometheusRuleFailures
  • container: prometheus
  • endpoint: web
  • instance: 10.244.0.159:9090
  • job: prometheus-kube-prometheus-prometheus
  • namespace: monitoring
  • pod: prometheus-prometheus-kube-prometheus-prometheus-0
  • prometheus: monitoring/prometheus-kube-prometheus-prometheus
  • rule_group: /etc/prometheus/rules/prometheus-prometheus-kube-prometheus-prometheus-rulefiles-0/monitoring-prometheus-kube-prometheus-kubelet.rules.yaml;kubelet.rules
  • service: prometheus-kube-prometheus-prometheus
  • severity: critical

But when I fetch the logs from the Pod, they show no related errors (only warnings and info messages):

level=warn ts=2021-05-04T13:36:57.986Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.5, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.5\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:37:02.027Z caller=manager.go:601 component="rule manager" group=kubernetes-system-kubelet msg="Evaluating rule failed" rule="alert: KubeletPodStartUpLatencyHigh\nexpr: histogram_quantile(0.99, sum by(instance, le) (rate(kubelet_pod_worker_duration_seconds_bucket{job=\"kubelet\",metrics_path=\"/metrics\"}[5m])))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"}\n > 60\nfor: 15m\nlabels:\n severity: warning\nannotations:\n description: Kubelet Pod startup 99th percentile latency is {{ $value }} seconds\n on node {{ $labels.node }}.\n runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletpodstartuplatencyhigh\n summary: Kubelet Pod startup latency is too high.\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:37:27.985Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.99, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.99\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:37:27.986Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.9, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.9\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:37:27.986Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.5, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.5\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:37:32.026Z caller=manager.go:601 component="rule manager" group=kubernetes-system-kubelet msg="Evaluating rule failed" rule="alert: KubeletPodStartUpLatencyHigh\nexpr: histogram_quantile(0.99, sum by(instance, le) (rate(kubelet_pod_worker_duration_seconds_bucket{job=\"kubelet\",metrics_path=\"/metrics\"}[5m])))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"}\n > 60\nfor: 15m\nlabels:\n severity: warning\nannotations:\n description: Kubelet Pod startup 99th percentile latency is {{ $value }} seconds\n on node {{ $labels.node }}.\n runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletpodstartuplatencyhigh\n summary: Kubelet Pod startup latency is too high.\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:37:57.985Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.99, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.99\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:37:57.986Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.9, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.9\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:37:57.987Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.5, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.5\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:38:02.028Z caller=manager.go:601 component="rule manager" group=kubernetes-system-kubelet msg="Evaluating rule failed" rule="alert: KubeletPodStartUpLatencyHigh\nexpr: histogram_quantile(0.99, sum by(instance, le) (rate(kubelet_pod_worker_duration_seconds_bucket{job=\"kubelet\",metrics_path=\"/metrics\"}[5m])))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"}\n > 60\nfor: 15m\nlabels:\n severity: warning\nannotations:\n description: Kubelet Pod startup 99th percentile latency is {{ $value }} seconds\n on node {{ $labels.node }}.\n runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletpodstartuplatencyhigh\n summary: Kubelet Pod startup latency is too high.\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:38:27.985Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.99, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.99\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:38:27.986Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.9, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.9\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"
level=warn ts=2021-05-04T13:38:27.987Z caller=manager.go:601 component="rule manager" group=kubelet.rules msg="Evaluating rule failed" rule="record: node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile\nexpr: histogram_quantile(0.5, sum by(instance, le) (rate(kubelet_pleg_relist_duration_seconds_bucket[5m]))\n * on(instance) group_left(node) kubelet_node_name{job=\"kubelet\",metrics_path=\"/metrics\"})\nlabels:\n quantile: \"0.5\"\n" err="found duplicate series for the match group {instance=\"209.151.158.125:10250\"} on the right hand-side of the operation: [{__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-operator-kubelet\"}, {__name__=\"kubelet_node_name\", endpoint=\"https-metrics\", instance=\"209.151.158.125:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"cyza-node6\", service=\"prometheus-kube-prometheus-kubelet\"}];many-to-many matching not allowed: matching labels must be unique on one side"

Where can I find out which rules (these 30) are failing?
(I am using prometheus-kube-stack)


Roh*_*lik 4

The best place to start is the Rules page of the Prometheus UI (:9090/rules).
It shows the error for each failing rule.
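
If you would rather script this than click through the UI, the same information should be available from the Prometheus HTTP API at `/api/v1/rules`, where each rule carries a `health` field and, on failure, a `lastError` field. Below is a minimal sketch, not a definitive tool: the helper names `fetch_rules` and `failing_rules` are hypothetical, and the `http://localhost:9090` address assumes you have port-forwarded the Prometheus service yourself.

```python
import json
import urllib.request


def fetch_rules(base_url: str) -> dict:
    """Fetch the rule listing from the Prometheus HTTP API (GET /api/v1/rules)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/rules", timeout=10) as resp:
        return json.load(resp)


def failing_rules(payload: dict) -> list[tuple[str, str, str]]:
    """Return (group name, rule name, last error) for every rule whose health is "err"."""
    failures = []
    for group in payload.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            if rule.get("health") == "err":
                failures.append(
                    (group.get("name", ""), rule.get("name", ""), rule.get("lastError", ""))
                )
    return failures


if __name__ == "__main__":
    # Assumption: Prometheus is reachable locally, e.g. via `kubectl port-forward`.
    for group, name, err in failing_rules(fetch_rules("http://localhost:9090")):
        print(f"{group}\t{name}\t{err}")
```

Note that the number in the alert most likely counts failed evaluations over the last 5 minutes rather than distinct rules, so a handful of rules failing on every evaluation can add up to 30.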