How do I send an alert for every error in my logs using Promtail / Loki - AlertManager?

Question

EnT*_*nTm 11 prometheus prometheus-alertmanager grafana-loki

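I am using Promtail + Loki to collect my logs, but I can't figure out how to alert on every error in my log files. I am also using Prometheus, Alertmanager, and Grafana. I have seen that some people have managed to do this, but none of them explained the details. To be clear, I am not looking for alerts that stay in the FIRING state or Grafana dashboards with an "Alerting" status. All I need is to know every time an error appears in one of my logs. If it can't be done exactly that way, the next best solution is to scrape every X seconds and then alert with something like "6 new error messages".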

Answer 1

Mag*_*Max -1

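I had the same question.

After some investigation, I found that AlertManager only receives alerts and routes them. If you have a service that can turn Loki searches into calls to the AlertManager API, you're done. And you probably already have two of them.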

I found this thread: https://github.com/grafana/loki/issues/1753

It contains this video: https://www.youtube.com/watch?v=GdgX46KwKqo
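
Besides the two options below, recent Loki versions include a ruler component that can also play the role of "a service that turns Loki searches into AlertManager calls". This is not part of the original answer; the following is a minimal sketch of a rule file for it, assuming the ruler is enabled and pointed at your Alertmanager. The label `job="myapp"`, the `"error"` filter string, the group name, and the alert name are all hypothetical placeholders.

```yaml
# Sketch of a Loki ruler rule file (Prometheus-compatible rule format).
# Assumes log streams labelled job="myapp" (hypothetical) and that an
# error line contains the literal text "error".
groups:
  - name: log-errors            # hypothetical group name
    rules:
      - alert: NewErrorLogLines # hypothetical alert name
        # Count matching lines per job over the last minute; fire if any exist.
        expr: sum by (job) (count_over_time({job="myapp"} |= "error" [1m])) > 0
        for: 0m                 # fire on the first evaluation that matches
        labels:
          severity: warning
        annotations:
          summary: "{{ $value }} new error lines in the last minute"
```

This gives the "6 new error messages" style of notification from the question rather than one alert per line, because the rule is evaluated on an interval and reports a count.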

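Option 1: Use Grafana

They show how to create an alert from a search in Grafana. If you just add an alert notification channel of type "Prometheus Alertmanager", you've got it.

So Grafana will fire the alert and Prometheus AlertManager will manage it.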

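Option 2: Use Promtail

There is another way: add a Promtail pipeline_stage that creates a Prometheus metric from your search, and manage it like any other metric: just add a Prometheus alert and manage it from AlertManager.

You can read the example in the link above: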

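pipeline_stages:
  - match:
      # Only lines from app="promtail" that contain "panic" enter the nested stages.
      selector: '{app="promtail"} |= "panic"'
      stages:
        # Nested under match so that only the matching lines are counted.
        - metrics:
            panic_total:
              type: Counter
              description: "total number of panic"
              config:
                match_all: true
                action: inc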

You then manage alerts on the Prometheus metric as usual.
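
As a rough illustration of that last step, here is a sketch of a Prometheus alerting rule for the counter defined above. It assumes Prometheus scrapes Promtail's metrics endpoint and that Promtail exposes the counter under its default `promtail_custom_` prefix, i.e. as `promtail_custom_panic_total`. The group name, alert name, severity label, and one-minute window are hypothetical choices.

```yaml
# Sketch of a Prometheus rule file; load it via rule_files in prometheus.yml.
groups:
  - name: promtail-log-metrics   # hypothetical group name
    rules:
      - alert: PanicLogged       # hypothetical alert name
        # Fire when the counter grew during the last minute.
        expr: sum(increase(promtail_custom_panic_total[1m])) > 0
        labels:
          severity: critical
        annotations:
          # increase() extrapolates, so the value is approximate.
          summary: 'about {{ printf "%.0f" $value }} new panic lines in the last minute'
```

Note that `increase()` needs at least two samples in the window, so the window should be several times the scrape interval, and the very first matching line after the counter is created may not register as an increase.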

Archived:	5 years, 7 months ago
Views:	3910
Last activity:	4 years, 9 months ago