caz*_*a19 3 kubernetes prometheus
Question: I have a Prometheus instance running outside my Kubernetes cluster, and I want it to scrape metrics from that remote cluster.
I took the configuration examples from the Prometheus GitHub repository and modified them a little. This is my working configuration.
- job_name: 'kubernetes-apiservers'
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    api_server: http://cluster-manager.dev.example.net:8080
    bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
    tls_config:
      insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;http

- job_name: 'kubernetes-nodes'
  scheme: http
  kubernetes_sd_configs:
  - role: node
    api_server: http://cluster-manager.dev.example.net:8080
    bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
    tls_config:
      insecure_skip_verify: true
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)

- job_name: 'kubernetes-service-endpoints'
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    api_server: http://cluster-manager.dev.example.net:8080
    bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
    tls_config:
      insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (http?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: (.+)(?::\d+);(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name

- job_name: 'kubernetes-services'
  scheme: http
  metrics_path: /probe
  params:
    module: [http_2xx]
  kubernetes_sd_configs:
  - role: service
    api_server: http://cluster-manager.dev.example.net:8080
    bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
    tls_config:
      insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
    action: keep
    regex: true
  - source_labels: [__address__]
    target_label: __param_target
  - target_label: __address__
    replacement: blackbox
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_service_namespace]
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    target_label: kubernetes_name

- job_name: 'kubernetes-pods'
  scheme: http
  kubernetes_sd_configs:
  - role: pod
    api_server: http://cluster-manager.dev.example.net:8080
    bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
    tls_config:
      insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: (.+):(?:\d+);(\d+)
    replacement: ${1}:${2}
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
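For reference, the prometheus_io_* relabel rules above key off the conventional prometheus.io/* annotations on Services and Pods. A hypothetical Pod metadata fragment that these rules would pick up (the annotation names are the usual convention, not something Kubernetes itself requires):

metadata:
  name: my-app-pod                    # hypothetical pod
  annotations:
    prometheus.io/scrape: "true"      # matched by the ..._annotation_prometheus_io_scrape keep rule
    prometheus.io/path: "/metrics"    # copied into __metrics_path__
    prometheus.io/port: "8080"        # substituted into __address__ by the port relabel rule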
I don't use TLS to connect to the API, so I want to disable it.
When I curl the /metrics URL from the Prometheus host, it prints the metrics.
In the end I did connect to the cluster, but... the jobs never complete, so Prometheus does not expose the relabelled metrics.
This is what I see in the console.
Target state:
I also checked the Prometheus debug output. It looks like it fetches all the necessary information and the requests complete successfully.
time="2017-01-25T06:58:04Z" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="&config.TargetGroup{Targets:[]model.LabelSet{model.LabelSet{\"__meta_kubernetes_pod_container_port_protocol\":\"UDP\", \"__address__\":\"10.32.0.2:10053\", \"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10053\", \"__meta_kubernetes_pod_container_port_name\":\"dns-local\"}, model.LabelSet{\"__address__\":\"10.32.0.2:10053\", \"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10053\", \"__meta_kubernetes_pod_container_port_name\":\"dns-tcp-local\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\"}, model.LabelSet{\"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10055\", \"__meta_kubernetes_pod_container_port_name\":\"metrics\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"10.32.0.2:10055\"}, model.LabelSet{\"__address__\":\"10.32.0.2:53\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq\", \"__meta_kubernetes_pod_container_port_number\":\"53\", \"__meta_kubernetes_pod_container_port_name\":\"dns\", \"__meta_kubernetes_pod_container_port_protocol\":\"UDP\"}, model.LabelSet{\"__address__\":\"10.32.0.2:53\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq\", \"__meta_kubernetes_pod_container_port_number\":\"53\", \"__meta_kubernetes_pod_container_port_name\":\"dns-tcp\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\"}, model.LabelSet{\"__meta_kubernetes_pod_container_port_number\":\"10054\", \"__meta_kubernetes_pod_container_port_name\":\"metrics\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"10.32.0.2:10054\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq-metrics\"}, model.LabelSet{\"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"10.32.0.2:8080\", \"__meta_kubernetes_pod_container_name\":\"healthz\", \"__meta_kubernetes_pod_container_port_number\":\"8080\", \"__meta_kubernetes_pod_container_port_name\":\"\"}}, Labels:model.LabelSet{\"__meta_kubernetes_pod_ready\":\"true\", \"__meta_kubernetes_pod_annotation_kubernetes_io_created_by\":\"{\\\"kind\\\":\\\"SerializedReference\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"reference\\\":{\\\"kind\\\":\\\"ReplicaSet\\\",\\\"namespace\\\":\\\"kube-system\\\",\\\"name\\\":\\\"kube-dns-2924299975\\\",\\\"uid\\\":\\\"fa808d95-d7d9-11e6-9ac9-02dfdae1a1e9\\\",\\\"apiVersion\\\":\\\"extensions\\\",\\\"resourceVersion\\\":\\\"89\\\"}}\\n\", \"__meta_kubernetes_pod_annotation_scheduler_alpha_kubernetes_io_affinity\":\"{\\\"nodeAffinity\\\":{\\\"requiredDuringSchedulingIgnoredDuringExecution\\\":{\\\"nodeSelectorTerms\\\":[{\\\"matchExpressions\\\":[{\\\"key\\\":\\\"beta.kubernetes.io/arch\\\",\\\"operator\\\":\\\"In\\\",\\\"values\\\":[\\\"amd64\\\"]}]}]}}}\", \"__meta_kubernetes_pod_name\":\"kube-dns-2924299975-dksg5\", \"__meta_kubernetes_pod_ip\":\"10.32.0.2\", \"__meta_kubernetes_pod_label_k8s_app\":\"kube-dns\", \"__meta_kubernetes_pod_label_pod_template_hash\":\"2924299975\", \"__meta_kubernetes_pod_label_tier\":\"node\", \"__meta_kubernetes_pod_annotation_scheduler_alpha_kubernetes_io_tolerations\":\"[{\\\"key\\\":\\\"dedicated\\\",\\\"value\\\":\\\"master\\\",\\\"effect\\\":\\\"NoSchedule\\\"}]\", \"__meta_kubernetes_namespace\":\"kube-system\", \"__meta_kubernetes_pod_node_name\":\"cluster-manager.dev.example.net\", \"__meta_kubernetes_pod_label_component\":\"kube-dns\", 
\"__meta_kubernetes_pod_label_kubernetes_io_cluster_service\":\"true\", \"__meta_kubernetes_pod_host_ip\":\"54.194.166.39\", \"__meta_kubernetes_pod_label_name\":\"kube-dns\"}, Source:\"pod/kube-system/kube-dns-2924299975-dksg5\"}"
time="2017-01-25T06:58:04Z" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="&config.TargetGroup{Targets:[]model.LabelSet{model.LabelSet{\"__address__\":\"10.43.0.0\", \"__meta_kubernetes_pod_container_name\":\"bot\"}}, Labels:model.LabelSet{\"__meta_kubernetes_pod_host_ip\":\"172.17.101.25\", \"__meta_kubernetes_pod_label_app\":\"bot\", \"__meta_kubernetes_namespace\":\"default\", \"__meta_kubernetes_pod_name\":\"bot-272181271-pnzsz\", \"__meta_kubernetes_pod_ip\":\"10.43.0.0\", \"__meta_kubernetes_pod_node_name\":\"ip-172-17-101-25\", \"__meta_kubernetes_pod_annotation_kubernetes_io_created_by\":\"{\\\"kind\\\":\\\"SerializedReference\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"reference\\\":{\\\"kind\\\":\\\"ReplicaSet\\\",\\\"namespace\\\":\\\"default\\\",\\\"name\\\":\\\"bot-272181271\\\",\\\"uid\\\":\\\"c297b3c2-e15d-11e6-a28a-02dfdae1a1e9\\\",\\\"apiVersion\\\":\\\"extensions\\\",\\\"resourceVersion\\\":\\\"1465127\\\"}}\\n\", \"__meta_kubernetes_pod_ready\":\"true\", \"__meta_kubernetes_pod_label_pod_template_hash\":\"272181271\", \"__meta_kubernetes_pod_label_version\":\"v0.1\"}, Source:\"pod/default/bot-272181271-pnzsz\"}"
So Prometheus picks up the updates, but... it never turns them into targets and metrics. I've been racking my brain over why this happens, so if you can spot where I went wrong, please help.
小智 7
If you want to monitor a Kubernetes cluster from an external Prometheus server, I would suggest setting up a Prometheus federation topology:
This is a scalable solution. You can keep adding K8s cluster monitors until you reach the capacity of the central Prometheus; after that, you add another central Prometheus instance to monitor the rest.
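In that topology a Prometheus instance runs inside each cluster, and the external ("central") Prometheus scrapes its /federate endpoint instead of talking to the Kubernetes API directly. A minimal sketch of such a federation job; the in-cluster Prometheus address and the match[] selector are assumptions to adapt to your environment:

- job_name: 'federate-dev-cluster'
  honor_labels: true
  metrics_path: /federate
  params:
    'match[]':
    - '{job=~"kubernetes-.*"}'                # pull every series produced by the in-cluster k8s jobs
  static_configs:
  - targets:
    - 'prometheus.dev.example.net:9090'       # assumed address of the in-cluster Prometheus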
In the end I found that setting up Kubernetes cluster monitoring from outside the cluster is not straightforward. The reason is that the Kubernetes architecture assumes all of the infrastructure lives inside one local network, so every workaround ends up being messy. On top of that, I ran into trouble trying to debug why none of the configured jobs for the Kubernetes roles (node, pod, service and endpoints) even showed up on the targets status page. I may be thinking about it wrong, but I couldn't find a way to debug this in Prometheus.
My solution for monitoring a Kubernetes cluster from the outside is kube-api-exporter: a very simple Python script that gathers all the metrics about DaemonSets, Deployments and Pods and exposes a URL to fetch them from. I'd recommend it to anyone stuck on this kind of integration; a scrape job for it is sketched below.
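Since the exporter just serves the collected metrics over HTTP, the external Prometheus can scrape it like any other static target. A sketch, where the target host and port are assumptions about where kube-api-exporter is exposed:

- job_name: 'kube-api-exporter'
  scheme: http
  static_configs:
  - targets:
    - 'cluster-manager.dev.example.net:9768'  # assumed host:port where kube-api-exporter is listening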
I also started pulling metrics from etcd, which is nice because etcd exposes Prometheus-style metrics out of the box.
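etcd serves its metrics under /metrics on the client port, so a plain static scrape job is enough. A sketch assuming a single etcd member reachable on the default client port 2379:

- job_name: 'etcd'
  scheme: http
  metrics_path: /metrics
  static_configs:
  - targets:
    - 'cluster-manager.dev.example.net:2379'  # assumed etcd client address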
P.S. Thanks to FuzzyAmi for the help.