Tag: prometheus

What is the maximum scrape_interval in Prometheus?

I use Prometheus to measure business metrics such as:

# HELP items_waiting_total Total number of items in a queue
# TYPE items_waiting_total gauge
items_waiting_total 149


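I want to keep this data for a very long time (5 years of retention), and I don't need a high scrape frequency, so I set scrape_interval: "900s".

When I view the graph in Prometheus at 60s resolution, it shows flapping, although the underlying value is not actually flapping.

The question is: what is the maximum (recommended) scrape_interval in Prometheus?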
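
For context on why a 900s interval can look like flapping: when Prometheus evaluates a selector at a given time, it only looks back a limited window for the most recent sample (5 minutes by default, adjustable with the `--query.lookback-delta` flag). The Python sketch below simulates that rule under simplified assumptions; `value_at` and the sample timestamps are hypothetical and are not Prometheus code.

```python
LOOKBACK_DELTA = 300  # seconds; Prometheus' default lookback window (5m)


def value_at(samples, t, lookback=LOOKBACK_DELTA):
    """Return the newest sample value inside the lookback window ending at t, else None."""
    candidates = [(ts, v) for ts, v in samples if t - lookback < ts <= t]
    return max(candidates)[1] if candidates else None


# One sample every 900s, as with scrape_interval: "900s" (values are made up).
samples = [(0, 149), (900, 150), (1800, 148)]

# Evaluate at a 60s step, like a graph drawn at 60s resolution.
points = {t: value_at(samples, t) for t in range(0, 1801, 60)}
visible = [t for t, v in points.items() if v is not None]
gaps = [t for t, v in points.items() if v is None]

print(f"{len(visible)} steps have a value, {len(gaps)} steps are empty")
```

In this simulation most 60s steps fall more than 5 minutes after the latest sample and come back empty, which a graph renders as a line that repeatedly appears and disappears. Keeping the scrape interval comfortably below the lookback window avoids those gaps.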

prometheus

oli*_*adj


10
votes

1
answer

1857
views

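Generate a range vector from the return values of a Prometheus query

I have a metric varnish_main_client_req of type counter, and I want to set up an alert that fires if the request rate drops or rises by a certain amount within a given time (e.g. "Number of requests deviated in the last 2 minutes!").

Using the deriv() function should work much better than comparing relative values, but it can only be used with gauges. Is it possible to convert an ever-increasing metric (a counter) into a rated metric (a gauge)?

Query: deriv(rate(varnish_main_client_req[2m])[5m])

Expectation: Prometheus calculates the rate of client requests over the last 2 minutes and returns the derivative of the resulting values over the last 5 minutes.

Actual result:

"error": "parse error at char 48: range specification must be preceded by a metric selector, but follows a *promql.Call instead"

Recording rules might be an option, but they feel like a cheap workaround for something that should work with a query: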

my_gauge_metric = rate(some_counter_metric[2m])
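
For reference, Prometheus 2.7 and later accept a subquery here, written with a colon inside the range: deriv(rate(varnish_main_client_req[2m])[5m:]). The Python sketch below only illustrates the two-step idea the question describes (a rate over a window, then a slope over those rates). It is a simplification: `simple_rate` ignores counter resets and extrapolation, and the counter values are hypothetical.

```python
def simple_rate(samples, t, window):
    """Per-second increase over [t - window, t]; simplified, no counter-reset handling."""
    pts = [(ts, v) for ts, v in samples if t - window <= ts <= t]
    if len(pts) < 2:
        return None
    (t0, v0), (t1, v1) = pts[0], pts[-1]
    return (v1 - v0) / (t1 - t0)


def least_squares_slope(points):
    """Slope of the best-fit line through (t, v) points, the idea behind deriv()."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    return num / den


# Hypothetical counter scraped every 30s whose request rate grows steadily.
counter = [(t, t * t / 100) for t in range(0, 601, 30)]

# Inner step: the 2m rate, evaluated every 30s across the last 5 minutes.
rates = [(t, simple_rate(counter, t, 120)) for t in range(300, 601, 30)]

# Outer step: the trend (derivative) of those rate values.
rate_trend = least_squares_slope(rates)
print(rate_trend)
```

A positive `rate_trend` means the request rate is rising and a negative one means it is falling, which is the quantity the alert would compare against a threshold.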

prometheus

Pau*_*oss


10
votes

2
answers

6724
views

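Different scrape URL for each target

Each instance of my application has a different URL. How can I configure prometheus.yml so that it takes the path of a target along with the hostname?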

scrape_configs:
  - job_name:       'example-random'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:8090','localhost:8080']
        labels:
          group: 'dummy'


prometheus

poo*_*abh


10
votes

4
answers

10k
views

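How to automatically test Prometheus alerts?

We are about to set up Prometheus for monitoring and alerting on our cloud services, including a continuous integration and deployment pipeline for the Prometheus service and for configuration such as alerting rules and thresholds. For that I am thinking about 3 categories that I want to write automated tests for:

Basic syntax checks for configuration during deployment (we already do this with promtool and amtool)
Tests for alert rules (what leads to alerts) during deployment
Tests for alert routing (who gets alerted about what) during deployment
Recurring checks that the alerting system is working properly in production

The most important part for me right now is testing the alert rules (category 1), but I have found no tooling to do that. I could imagine setting up a Prometheus instance during deployment, feeding it some metric samples (though I am unsure how I would do that with Prometheus' pull architecture) and then running queries against it.

The only thing I have found so far is a blog post about monitoring the Prometheus Alertmanager chain as a whole, which relates to the third category.

Has anyone done something like this, or is there anything I missed?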

continuous-integration alert integration-testing automated-tests prometheus

bsi*_*ngr

2017-10-06

10
votes

1
answer

2938
views

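Alert messages not showing in Slack using Prometheus and Alertmanager

I am trying to get Slack notifications for alerts detected by Prometheus by using Alertmanager.

Here is the alert.rules file, which works fine: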

groups:
- name: Instances
  rules:
  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    # Prometheus templates apply here in the annotation and label fields of the alert.
    annotations:
      description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
      summary: 'Instance {{ $labels.instance }} down'


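It successfully shows an instance as down.

The problem is in my alertmanager.yml: it does not send notifications to Slack. I have also set up the Slack webhook successfully, and I even tested that the hook works when I created it with the service Slack provides.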

alertmanager.yml

groups: …


alert monitoring docker-compose prometheus

Tar*_*ung

2018-02-07

10
votes

1
answer

1704
views

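Monitoring CPU utilization using Prometheus

I am trying to monitor the CPU utilization of the machine on which Prometheus is installed and running. I have a metric process_cpu_seconds_total, and I can compute the irate or rate of this metric, but I am not sure how to derive a CPU utilization percentage from it. Is there any way I can use this process_cpu_seconds_total metric to find the CPU utilization of the machine where Prometheus runs?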
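
As background, process_cpu_seconds_total is a counter of CPU seconds consumed by the exporting process itself, so its per-second rate is the number of cores that process is using. It does not describe the whole machine. In PromQL this is usually written as rate(process_cpu_seconds_total[1m]) * 100. The Python sketch below shows the same arithmetic on two hypothetical samples; `cpu_percent` is an illustrative helper, and counter resets are not handled.

```python
def cpu_percent(t0, cpu0, t1, cpu1):
    """Average CPU usage between two counter samples, as a percentage of one core."""
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (cpu1 - cpu0) / (t1 - t0) * 100.0


# Hypothetical samples: (timestamp in seconds, process_cpu_seconds_total value).
# The process consumed 3 CPU seconds during a 60 second window.
usage = cpu_percent(0, 120.0, 60, 123.0)
print(f"{usage:.1f}% of one core")
```

A result above 100 is possible for a multi-threaded process, since the percentage is relative to a single core.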

cpu-usage prometheus promql

Arn*_*ose

2021-07-16

10
votes

2
answers

20k
views

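Alert when a docker container pod is in Error or CrashLoopBackOff in Kubernetes

I have set up a Kubernetes cluster on AWS, where I am trying to monitor multiple pods using cAdvisor + Prometheus + Alertmanager. What I want to do is send an email alert (with the service/container name) if a container or pod goes down, is stuck in the Error or CrashLoopBackOff state, or is in any state other than Running.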

docker kubernetes prometheus prometheus-alertmanager

shi*_*455


10
votes

1
answer

10k
views

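Micrometer - Prometheus Gauge displays NaN

I am trying to generate Prometheus metrics using Micrometer.io with Spring Boot 2.0.0.RELEASE.

When I try to expose the size of a List as a Gauge, it keeps displaying NaN. The documentation says:

It is your responsibility to hold a strong reference to the state object that you are measuring with a Gauge.

I have tried a few different approaches, but I could not solve the problem. Here is some of my trial code.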

import io.micrometer.core.instrument.*;
import io.swagger.backend.model.Product;
import io.swagger.backend.service.ProductService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.*;

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

@RestController
@RequestMapping("metrics")
public class ExampleController {

    private AtomicInteger atomicInteger = new AtomicInteger();

    private ProductService productService;
    private final Gauge productGauge;

    @Autowired
    public ExampleController(ProductService productService,
                           MeterRegistry registry) {

        this.productService = productService;

        createGauge("product_gauge", productService.getProducts(), registry);
    }

    private void createGauge(String metricName, List<Product> products,
                                    MeterRegistry registry) {

        List<Product> products = productService.getProducts();

        // …


java spring-boot spring-boot-actuator prometheus micrometer

Cem*_*nal


10
votes

2
answers

6914
views

Prometheus query to count unique label values

I want to count the number of unique label values, something like:

select count (distinct a) from hello_info


For example, if my metric 'hello_info' has labels a and b, I want to count the unique values of a. Here a takes the values "1", "2" and "3", so the count would be 3.

hello_info(a="1", b="ddd")
hello_info(a="2", b="eee")
hello_info(a="1", b="fff")
hello_info(a="3", b="ggg")
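
To make the expected result concrete, the Python sketch below counts the distinct values of label a across the four series listed above. It only models the desired outcome; `count_distinct` is a hypothetical helper, not a Prometheus function. In PromQL, a commonly used form for this is count(count by (a) (hello_info)), where the inner aggregation yields one series per value of a and the outer one counts them.

```python
def count_distinct(series, label):
    """Count the distinct values of one label across a list of label sets."""
    return len({labels[label] for labels in series if label in labels})


# The four hello_info series from the question, as label sets.
series = [
    {"a": "1", "b": "ddd"},
    {"a": "2", "b": "eee"},
    {"a": "1", "b": "fff"},
    {"a": "3", "b": "ggg"},
]

distinct_a = count_distinct(series, "a")
print(distinct_a)
```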


grafana prometheus

emp*_*188

2018-08-17

10
votes

3
answers

10k
views

Usage of Prometheus scrape_timeout

In my Prometheus configuration I have a job with the following specification:

- job_name: name_of_my_job
  scrape_interval: 5m
  scrape_timeout: 30s
  metrics_path: /metrics
  scheme: http


The script that creates the metrics takes 3 minutes to finish, but I don't see the metrics in Prometheus. What does the scrape_timeout variable do?

configuration devops prometheus

Jus*_*lma

2020-04-02

10
votes

1
answer

20k
views

Tag statistics

prometheus ×10

alert ×2

automated-tests ×1

configuration ×1

continuous-integration ×1

cpu-usage ×1

devops ×1

docker ×1

docker-compose ×1

grafana ×1

integration-testing ×1

java ×1

kubernetes ×1

micrometer ×1

monitoring ×1

prometheus-alertmanager ×1

promql ×1

spring-boot ×1

spring-boot-actuator ×1
