Prometheus service discovery with docker-compose

Sha*_*ulu 5 monitoring metrics docker docker-compose prometheus

I have the following docker-compose file:

version: '3.4'

services:
    serviceA:
        image: <image>
        command: <command>
        labels:
           servicename: "service-A"
        ports:
         - "8080:8080"

    serviceB:
        image: <image>
        command: <command>
        labels:
           servicename: "service-B"
        ports:
         - "8081:8081"

    prometheus:
        image: prom/prometheus:v2.32.1
        container_name: prometheus
        volumes:
          - ./prometheus:/etc/prometheus
          - prometheus_data:/prometheus
        command:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.path=/prometheus'
          - '--web.console.libraries=/etc/prometheus/console_libraries'
          - '--web.console.templates=/etc/prometheus/consoles'
          - '--storage.tsdb.retention.time=200h'
          - '--web.enable-lifecycle'
        restart: unless-stopped
        expose:
          - 9090

        labels:
          org.label-schema.group: "monitoring"

volumes:
    prometheus_data: {}

The docker-compose file also includes a Prometheus instance with the following configuration:

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.


scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090', 'serviceA:8080', 'serviceB:8081']

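ServiceA and ServiceB both expose Prometheus metrics (each on its own port).

With a single instance of each service everything works fine, but when I scale the services and run more than one instance, Prometheus starts mixing up the collected metrics and the data gets corrupted.

I looked for a docker-compose service discovery mechanism to solve this but didn't find anything suitable. How can I solve this?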

ane*_*yte 10

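The solution to this problem is to use actual service discovery instead of static targets. With a static target such as serviceA:8080, the name can resolve to a different replica from one scrape to the next, which is most likely why samples from several containers end up mixed in the same series. With service discovery, Prometheus scrapes every replica as a separate target on each iteration.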

With plain docker-compose (that is, not Swarm), you can use DNS service discovery (dns_sd_config) to get all the IPs that belong to a service:

# docker-compose.yml
version: "3"
services:
  prometheus:
    image: prom/prometheus

  test-service:  # <- this
    image: nginx
    deploy:
      replicas: 3
---
# prometheus.yml
scrape_configs:
  - job_name: test
    dns_sd_configs:
      - names:
          - test-service  # goes here
        type: A
        port: 80

This is the easiest way to get up and running.
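Applied to the compose file from the question, the same idea might look like the sketch below. It assumes that the service names serviceA and serviceB resolve through Docker's embedded DNS from inside the Prometheus container, and that metrics are served on container ports 8080 and 8081 as in the question. The job names are made up for illustration.

```yaml
# prometheus.yml (sketch; job names are illustrative, not from the question)
scrape_configs:
  # Prometheus scraping itself, as before.
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  # One target per replica of serviceA, discovered through DNS A records.
  - job_name: service-a
    dns_sd_configs:
      - names:
          - serviceA   # compose service name
        type: A
        port: 8080     # container port where serviceA serves metrics

  # Same pattern for serviceB.
  - job_name: service-b
    dns_sd_configs:
      - names:
          - serviceB
        type: A
        port: 8081
```

Each replica should then show up as its own target, with an instance label built from the container IP and the configured port. Note that the fixed host port mappings in the question (for example "8080:8080") would have to be dropped or changed before scaling, since two replicas cannot bind the same host port. Prometheus reaches the containers over the compose network and does not need published ports. Replicas can be started with, for example, docker-compose up -d --scale serviceA=3.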

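Next, there is the dedicated Docker service discovery: docker_sd_config. In addition to the target list, it gives you more data in labels (e.g. container name, image version, etc.), but it also requires a connection to the Docker daemon to get that data. In my opinion this is overkill for a development environment, but it might be essential in production. Here is an example configuration, boldly copy-pasted from https://github.com/prometheus/prometheus/blob/release-2.33/documentation/examples/prometheus-docker.yml: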

# A example scrape configuration for running Prometheus with Docker.

scrape_configs:
  # Make Prometheus scrape itself for metrics.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # Create a job for Docker daemon.
  #
  # This example requires Docker daemon to be configured to expose
  # Prometheus metrics, as documented here:
  # https://docs.docker.com/config/daemon/prometheus/
  - job_name: "docker"
    static_configs:
      - targets: ["localhost:9323"]

  # Create a job for Docker Swarm containers.
  #
  # This example works with cadvisor running using:
  # docker run --detach --name cadvisor -l prometheus-job=cadvisor
  #     --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock,ro
  #     --mount type=bind,src=/,dst=/rootfs,ro
  #     --mount type=bind,src=/var/run,dst=/var/run
  #     --mount type=bind,src=/sys,dst=/sys,ro
  #     --mount type=bind,src=/var/lib/docker,dst=/var/lib/docker,ro
  #     google/cadvisor -docker_only
  - job_name: "docker-containers"
    docker_sd_configs:
      - host: unix:///var/run/docker.sock # You can also use http/https to connect to the Docker daemon.
    relabel_configs:
      # Only keep containers that have a `prometheus-job` label.
      - source_labels: [__meta_docker_container_label_prometheus_job]
        regex: .+
        action: keep
      # Use the task labels that are prefixed by `prometheus-`.
      - regex: __meta_docker_container_label_prometheus_(.+)
        action: labelmap
        replacement: $1
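On the compose side, a configuration like this needs two things: Prometheus must be able to reach the Docker socket, and the containers to be scraped must carry a prometheus-job label. A rough sketch, reusing the service names from the question, could look like the following. The user: root line and the label value are assumptions for illustration, not part of the official example.

```yaml
# docker-compose.yml (sketch for use with docker_sd_configs)
services:
  prometheus:
    image: prom/prometheus:v2.32.1
    # Assumption: running as root is one simple way to let Prometheus read
    # the socket; the image's default user usually lacks permission.
    user: root
    volumes:
      - ./prometheus:/etc/prometheus
      # Read-only access to the Docker daemon for service discovery.
      - /var/run/docker.sock:/var/run/docker.sock:ro

  serviceA:
    image: <image>
    command: <command>
    labels:
      # Matched by the `keep` rule above; the labelmap rule then turns it
      # into job="service-a" on the scraped series.
      prometheus-job: "service-a"
```

Docker service discovery creates targets per container network and exposed port, so additional relabeling may be needed to select only the port that serves metrics.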

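Finally, there is dockerswarm_sd_config, which, as the name suggests, is for use with Docker Swarm. It is the most complex of the three, which is why there is a comprehensive official setup guide. Like docker_sd_config, it provides additional information about containers in labels, and even more of it (for example, it can tell you which node a container is running on). An example configuration is available here: https://github.com/prometheus/prometheus/blob/release-2.33/documentation/examples/prometheus-dockerswarm.yml, but you should really read the docs to understand it and adjust it to your own setup.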
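For orientation, a trimmed-down scrape job in the spirit of that example might look like the sketch below. It assumes that Prometheus runs on a Swarm manager node with the Docker socket mounted, and that services opt in through a prometheus-job service label. Both are assumptions to adapt, and the job name is illustrative.

```yaml
# prometheus.yml (sketch for Docker Swarm)
scrape_configs:
  - job_name: swarm-tasks
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock
        role: tasks   # one target per task, i.e. per running replica
    relabel_configs:
      # Only keep tasks that are supposed to be running.
      - source_labels: [__meta_dockerswarm_task_desired_state]
        regex: running
        action: keep
      # Only keep services that carry a `prometheus-job` label.
      - source_labels: [__meta_dockerswarm_service_label_prometheus_job]
        regex: .+
        action: keep
      # Use that service label as the Prometheus job label.
      - source_labels: [__meta_dockerswarm_service_label_prometheus_job]
        target_label: job
      # Record which Swarm node the task runs on.
      - source_labels: [__meta_dockerswarm_node_hostname]
        target_label: node
```

The last rule illustrates the extra node information mentioned above: every series gets a node label holding the hostname of the Swarm node that runs the task.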