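Launching Prometheus with Nomad
In the previous two posts, I enabled TLS on the HTTP APIs of Consul and Nomad.
This time, I want to run Prometheus on Nomad and collect metrics from both Consul and Nomad.
Overview
Prometheus is submitted as a Nomad job.
The Prometheus configuration file is deliberately kept out of the Docker image. Instead, it is registered in Consul KV and placed inside the container when the Nomad job is submitted.
Prometheus will collect Consul and Nomad metrics, but because both were switched to TLS earlier, it has to request the metrics over HTTPS. We will look at how to write the configuration for that.
As an aside, Nomad jobs should be designed to be as stateless as possible (holding no state and persisting nothing), but for Prometheus metrics we do want some form of persistence. I plan to deal with that in a later post.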
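Let's get started.
Prometheus configuration file
First, we create a Prometheus configuration file that collects Consul and Nomad metrics.
Because Prometheus collects data with a pull model, the endpoints to monitor have to be configured in Prometheus.
This time we use Consul's service discovery to obtain the IP addresses and ports on which Consul and Nomad are running.
"consul_sd_configs" is the setting that retrieves the IP addresses and ports of the monitored endpoints from Consul.
To reach Consul over HTTPS, we specify "scheme: https" and list the certificates and key under "tls_config".
Here, the certificates and keys are placed in the /opt/cert directory of the Docker container.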
global:
  scrape_interval: 5s
  evaluation_interval: 5s

scrape_configs:
  - job_name: 'consul_metrics'
    scheme: https
    tls_config:
      ca_file: /opt/cert/consul-agent-ca.pem
      cert_file: /opt/cert/dc1-cli-consul-0.pem
      key_file: /opt/cert/dc1-cli-consul-0-key.pem
      insecure_skip_verify: true
    consul_sd_configs:
      - server: 'consul.service.consul:8501'
        services: ['consul']
        scheme: https
        tls_config:
          ca_file: /opt/cert/consul-agent-ca.pem
          cert_file: /opt/cert/dc1-cli-consul-0.pem
          key_file: /opt/cert/dc1-cli-consul-0-key.pem
          insecure_skip_verify: false
    relabel_configs:
      - source_labels: [__address__]
        replacement: ${1}:8501
        regex: ([^:]+):(\d+)
        target_label: __address__
      - source_labels: [__meta_consul_node]
        target_label: node_name
    scrape_interval: 5s
    metrics_path: /v1/agent/metrics
    params:
      format: ['prometheus']

  - job_name: 'nomad_metrics'
    scheme: https
    tls_config:
      ca_file: /opt/cert/nomad-ca.pem
      cert_file: /opt/cert/nomad-cli.pem
      key_file: /opt/cert/nomad-cli-key.pem
      insecure_skip_verify: true
    consul_sd_configs:
      - server: 'consul.service.consul:8501'
        services: ['nomad-client', 'nomad']
        scheme: https
        tls_config:
          ca_file: /opt/cert/consul-agent-ca.pem
          cert_file: /opt/cert/dc1-cli-consul-0.pem
          key_file: /opt/cert/dc1-cli-consul-0-key.pem
          insecure_skip_verify: false
    relabel_configs:
      - source_labels: ['__meta_consul_tags']
        regex: '(.*)http(.*)'
        action: keep
      - source_labels: [__meta_consul_node]
        target_label: node_name
    scrape_interval: 5s
    metrics_path: /v1/metrics
    params:
      format: ['prometheus']
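To make the interplay of consul_sd_configs and relabel_configs in the consul_metrics job more concrete, here is a rough Python sketch. It is not how Prometheus is implemented, only an approximation of the effect. The entry shape (Node, Address, ServiceAddress, ServicePort), the sample values, and the function names are assumptions made for illustration. The "consul" service is typically registered with the server RPC port rather than the HTTPS API port, which is why the relabel rule swaps in 8501.

```python
import re

# Same pattern as the relabel rule in prometheus.yml. Prometheus anchors
# relabel regexes at both ends, which re.fullmatch emulates here.
ADDRESS_RE = re.compile(r"([^:]+):(\d+)")
HTTPS_API_PORT = 8501


def discovered_address(entry):
    """Build host:port roughly the way service discovery does (simplified)."""
    # Fall back to the node address when the service has no address of its own.
    host = entry.get("ServiceAddress") or entry["Address"]
    return f"{host}:{entry['ServicePort']}"


def relabel_address(address):
    """Apply the '${1}:8501' replacement; non-matching input is left unchanged."""
    match = ADDRESS_RE.fullmatch(address)
    if match is None:
        return address
    return f"{match.group(1)}:{HTTPS_API_PORT}"


def scrape_target(entry):
    """Return the labels that matter for scraping after both relabel rules."""
    return {
        "__address__": relabel_address(discovered_address(entry)),
        "node_name": entry["Node"],  # copied from __meta_consul_node
    }


if __name__ == "__main__":
    # Invented sample entry; real values come from the Consul catalog.
    sample = {
        "Node": "consul-server-1",
        "Address": "192.0.2.10",
        "ServiceAddress": "",
        "ServicePort": 8300,
    }
    print(scrape_target(sample))
```

The nomad_metrics job takes a different route: instead of rewriting the port, it keeps only the services whose tags contain "http", so the discovered port is already the HTTP API port.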
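For the Prometheus Docker container to reach Consul through the DNS name "consul.service.consul", the DNS configuration has to be adjusted.
In my environment, dnsmasq on the host running the Nomad client forwards lookups to Consul DNS, and the Docker container reaches the host's DNS through host networking.
See the official Nomad documentation for details.
The Nomad IP address returned by Consul does not match the CN or SAN in the Nomad server certificate, so server verification is skipped to avoid TLS errors ("insecure_skip_verify: true" in nomad_metrics). The same applies to "insecure_skip_verify: true" in consul_metrics. Note that this setting turns off verification of the server certificate as a whole, not only the hostname check.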
Nomad configuration file
Add the settings that expose metrics to the Nomad configuration file. See the documentation for details.
...
telemetry {
  publish_allocation_metrics = true
  publish_node_metrics       = true
  prometheus_metrics         = true
}
...
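Once telemetry is enabled, the endpoint that the nomad_metrics job scrapes can be checked by hand before involving Prometheus. The following is a minimal Python sketch under a few assumptions: the helper names are made up, the certificate files are the same ones used elsewhere in this post, and the Nomad address you pass in uses a hostname that matches the server certificate (otherwise the TLS handshake will fail, which is the same mismatch discussed above).

```python
import ssl
import urllib.parse
import urllib.request


def metrics_url(base_addr):
    """Return the URL scraped by the nomad_metrics job for a given agent."""
    query = urllib.parse.urlencode({"format": "prometheus"})
    return f"{base_addr.rstrip('/')}/v1/metrics?{query}"


def metric_names(exposition_text):
    """Collect the distinct metric names from Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the first '{' (labels) or space (value).
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def fetch_metrics(base_addr, ca_file, cert_file, key_file):
    """Fetch the metrics over mutual TLS; needs a reachable Nomad agent."""
    context = ssl.create_default_context(cafile=ca_file)
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    with urllib.request.urlopen(metrics_url(base_addr), context=context) as resp:
        return resp.read().decode("utf-8")
```

For example, passing the Nomad address together with ./nomad-ca.pem, ./nomad-cli.pem, and ./nomad-cli-key.pem to fetch_metrics and feeding the result to metric_names should give a non-empty set if prometheus_metrics is active.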
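Nomad job file
Next, I create the Nomad job file for Prometheus.
The template blocks below read prometheus.yml, the certificates, and the keys from Consul KV and write them into the local directory used by the Nomad task.
The mounts setting then bind-mounts those local files and directories into the container.
Distributing the certificates and keys this way is not very tidy; packing them into a zip file and uploading it to a web server or an S3 bucket might be cleaner.
The update, migrate, and resources settings are provisional and should be adjusted to your operational design and environment.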
job "prometheus" {
  datacenters = ["dc1"]
  type        = "service"

  update {
    max_parallel      = 1
    min_healthy_time  = "10s"
    healthy_deadline  = "3m"
    progress_deadline = "10m"
    auto_revert       = false
    canary            = 0
  }

  migrate {
    max_parallel     = 1
    health_check     = "checks"
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }

  group "prometheus" {
    count = 1

    restart {
      attempts = 2
      interval = "30m"
      delay    = "15s"
      mode     = "fail"
    }

    network {
      mode = "host"
      port "prometheus" {
        static = 9090
      }
    }

    service {
      name = "prometheus"
      port = "prometheus"
      tags = ["opt"]

      check {
        type     = "http"
        protocol = "http"
        path     = "/-/healthy"
        interval = "30s"
        timeout  = "2s"
      }
    }

    task "prometheus" {
      driver = "docker"

      config {
        logging {
          type = "journald"
        }
        image        = "prom/prometheus:v2.24.1"
        ports        = ["prometheus"]
        network_mode = "host"
        args = [
          "--config.file=/srv/prometheus/prometheus.yml",
          "--storage.tsdb.path=/prometheus",
          "--web.enable-lifecycle"
        ]
        mounts = [
          {
            type   = "bind"
            source = "local/prometheus.yml"
            target = "/srv/prometheus/prometheus.yml"
          },
          {
            type   = "bind"
            source = "local/cert"
            target = "/opt/cert"
          }
        ]
      }

      env {
      }

      template {
        change_mode = "restart"
        destination = "local/prometheus.yml"
        data        = "{{ key \"prometheus/config/prometheus.yml\" }}"
      }

      template {
        change_mode = "noop"
        destination = "local/cert/consul-agent-ca.pem"
        data        = "{{ key \"cert/consul/consul-agent-ca.pem\" }}"
      }

      template {
        change_mode = "noop"
        destination = "local/cert/dc1-cli-consul-0.pem"
        data        = "{{ key \"cert/consul/dc1-cli-consul-0.pem\" }}"
      }

      template {
        change_mode = "noop"
        destination = "local/cert/dc1-cli-consul-0-key.pem"
        data        = "{{ key \"cert/consul/dc1-cli-consul-0-key.pem\" }}"
      }

      template {
        change_mode = "noop"
        destination = "local/cert/nomad-ca.pem"
        data        = "{{ key \"cert/nomad/nomad-ca.pem\" }}"
      }

      template {
        change_mode = "noop"
        destination = "local/cert/nomad-cli.pem"
        data        = "{{ key \"cert/nomad/nomad-cli.pem\" }}"
      }

      template {
        change_mode = "noop"
        destination = "local/cert/nomad-cli-key.pem"
        data        = "{{ key \"cert/nomad/nomad-cli-key.pem\" }}"
      }

      resources {
        cpu    = 500  # MHz
        memory = 1024 # MB
      }
    }
  }
}
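Because each certificate passes through three locations (Consul KV key, task-local template destination, path inside the container), it can help to see the correspondence spelled out. The Python sketch below only restates the naming convention visible in the job file; the function name file_flow is hypothetical and nothing here talks to Consul or Nomad.

```python
import posixpath

# Consul KV keys read by the template blocks in the job file.
CERT_KEYS = [
    "cert/consul/consul-agent-ca.pem",
    "cert/consul/dc1-cli-consul-0.pem",
    "cert/consul/dc1-cli-consul-0-key.pem",
    "cert/nomad/nomad-ca.pem",
    "cert/nomad/nomad-cli.pem",
    "cert/nomad/nomad-cli-key.pem",
]


def file_flow(kv_key):
    """Trace one KV key to its template destination and in-container path."""
    name = posixpath.basename(kv_key)
    return {
        "kv_key": kv_key,
        # Written by the template block into the task's local directory.
        "template_destination": f"local/cert/{name}",
        # Visible to Prometheus because local/cert is mounted at /opt/cert.
        "container_path": f"/opt/cert/{name}",
    }


if __name__ == "__main__":
    for key in CERT_KEYS:
        flow = file_flow(key)
        print(f"{flow['kv_key']} -> {flow['template_destination']} -> {flow['container_path']}")
```

The container paths produced this way are the ones that prometheus.yml refers to under tls_config, so a mismatch in any of the three names shows up as a missing-file error when Prometheus starts.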
Registering the Consul KV entries and the Nomad job
Now let's upload the local files to Consul KV and submit the job to Nomad.
Prepare the following files, adjusting the domain names and paths to your environment.
env.sh
export CONSUL_HTTP_ADDR=https://consul.mydomain.tk:8501
export CONSUL_CACERT=./consul-agent-ca.pem
export CONSUL_CLIENT_CERT=./dc1-cli-consul-0.pem
export CONSUL_CLIENT_KEY=./dc1-cli-consul-0-key.pem
export NOMAD_ADDR=https://nomad.mydomain.tk:4646
export NOMAD_CACERT=./nomad-ca.pem
export NOMAD_CLIENT_CERT=./nomad-cli.pem
export NOMAD_CLIENT_KEY=./nomad-cli-key.pem
register_setting.sh
#!/bin/bash
source ./env.sh
consul kv put prometheus/config/prometheus.yml @./prometheus.yml
consul kv put cert/consul/consul-agent-ca.pem @./consul-agent-ca.pem
consul kv put cert/consul/dc1-cli-consul-0.pem @./dc1-cli-consul-0.pem
consul kv put cert/consul/dc1-cli-consul-0-key.pem @./dc1-cli-consul-0-key.pem
consul kv put cert/nomad/nomad-ca.pem @./nomad-ca.pem
consul kv put cert/nomad/nomad-cli.pem @./nomad-cli.pem
consul kv put cert/nomad/nomad-cli-key.pem @./nomad-cli-key.pem
register_job.sh
#!/bin/bash
source ./env.sh
nomad job run prometheus.nomad
Run the shell scripts to register the settings and submit the job.
$ chmod +x *.sh
$ ./register_setting.sh
Success! Data written to: prometheus/config/prometheus.yml
Success! Data written to: cert/consul/consul-agent-ca.pem
Success! Data written to: cert/consul/dc1-cli-consul-0.pem
Success! Data written to: cert/consul/dc1-cli-consul-0-key.pem
Success! Data written to: cert/nomad/nomad-ca.pem
Success! Data written to: cert/nomad/nomad-cli.pem
Success! Data written to: cert/nomad/nomad-cli-key.pem
$
$ ./register_job.sh
==> Monitoring evaluation "8e6e815c"
    Evaluation triggered by job "prometheus"
    Evaluation within deployment: "679353d4"
    Allocation "50ef43a1" created: node "ef1cba89", group "prometheus"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "8e6e815c" finished with status "complete"
Confirm in the Consul or Nomad web UI that the Prometheus job has started.
You can also check the job status with the Nomad CLI.
$ source ./env.sh
$ nomad job status prometheus
ID            = prometheus
Name          = prometheus
Submit Date   = 2021-03-31T02:26:49Z
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
prometheus  0       0         1        0       0         0

Latest Deployment
ID          = 679353d4
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
prometheus  1        1       1        0          2021-03-31T02:39:09Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
50ef43a1  ef1cba89  prometheus  0        run      running  2m29s ago  10s ago
The status is now running, and the Healthy count is 1.
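A running job does not yet prove that the scrapes themselves succeed. One way to check is to ask Prometheus for its target list through its HTTP API. The Python sketch below assumes Prometheus is reachable at http://localhost:9090 (the static port from the job file; the host is a guess for your environment), and the function names are made up for illustration.

```python
import json
import urllib.request


def summarize_targets(payload):
    """Count active targets per (job, health) from an /api/v1/targets response."""
    summary = {}
    for target in payload["data"]["activeTargets"]:
        key = (target["labels"]["job"], target["health"])
        summary[key] = summary.get(key, 0) + 1
    return summary


def fetch_targets(prometheus_url="http://localhost:9090"):
    """Query the Prometheus HTTP API; the default address is an assumption."""
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/targets") as resp:
        return json.load(resp)
```

Calling summarize_targets(fetch_targets()) should list consul_metrics and nomad_metrics with health "up" once the TLS and DNS settings are correct; a "down" entry usually points at a certificate or name resolution problem.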
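Summary
This time I launched Prometheus as a Nomad job.
I also confirmed the configuration that obtains the monitored endpoints from Consul's service discovery.
With this setup, the metrics collected by Prometheus are stored in the location given by the "--storage.tsdb.path" startup option in the Nomad job, that is, the /prometheus directory inside the Docker container.
As a result, the metrics are lost together with the Docker container when the Nomad job stops, so next time the configuration should be adjusted to persist them properly.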
<References>
- https://learn.hashicorp.com/tutorials/nomad/prometheus-metrics
- https://www.nomadproject.io/docs/configuration/telemetry
- https://prometheus.io/docs/prometheus/latest/configuration/configuration/