Running Prometheus on Nomad

In the previous two posts I enabled TLS for the Consul and Nomad HTTP APIs.
This time I want to run Prometheus on Nomad and collect metrics from both Consul and Nomad.

Overview

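Prometheus is submitted as a Nomad job.
The Prometheus configuration file is deliberately not baked into the Docker image. Instead, it is registered in Consul KV and placed into the container when the Nomad job is submitted.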

Prometheus will collect metrics from Consul and Nomad, but since both were switched to TLS earlier, Prometheus has to request the metrics over HTTPS. Let's look at how to write the configuration for that.

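Incidentally, Nomad jobs should be designed to be as stateless as possible (holding no state and persisting nothing), but for Prometheus metrics data some form of persistence is worth considering. I plan to deal with that in a later post.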

Let's get started.

Prometheus configuration file

First, create a Prometheus configuration file that collects Consul and Nomad metrics.

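Prometheus collects data with a pull model, so the endpoints to monitor have to be configured in Prometheus.
Here I use Consul's service discovery to obtain the IP addresses and ports where Consul and Nomad are running.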

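"consul_sd_configs" is the setting that obtains the IP addresses and ports of the monitored endpoints from Consul.
To reach Consul over HTTPS, I set "scheme: https" and specify the certificates and key in "tls_config".
The certificates and keys are placed in the /opt/cert directory inside the Docker container.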

global:
  scrape_interval:     5s
  evaluation_interval: 5s
scrape_configs:
  - job_name: 'consul_metrics'
    scheme: https
    tls_config:
      ca_file: /opt/cert/consul-agent-ca.pem
      cert_file: /opt/cert/dc1-cli-consul-0.pem
      key_file:  /opt/cert/dc1-cli-consul-0-key.pem
      insecure_skip_verify: true
    consul_sd_configs:
    - server: 'consul.service.consul:8501'
      services: ['consul']
      scheme: https
      tls_config:
        ca_file: /opt/cert/consul-agent-ca.pem
        cert_file: /opt/cert/dc1-cli-consul-0.pem
        key_file:  /opt/cert/dc1-cli-consul-0-key.pem
        insecure_skip_verify: false
    relabel_configs:
    - source_labels: [__address__]
      replacement: ${1}:8501
      regex: ([^:]+):(\d+)
      target_label: __address__ 
    - source_labels: [__meta_consul_node]
      target_label: node_name
    scrape_interval: 5s
    metrics_path: /v1/agent/metrics
    params:
      format: ['prometheus']

  - job_name: 'nomad_metrics'
    scheme: https
    tls_config:
      ca_file: /opt/cert/nomad-ca.pem
      cert_file: /opt/cert/nomad-cli.pem
      key_file:  /opt/cert/nomad-cli-key.pem
      insecure_skip_verify: true
    consul_sd_configs:
    - server: 'consul.service.consul:8501'
      services: ['nomad-client', 'nomad']
      scheme: https
      tls_config:
        ca_file: /opt/cert/consul-agent-ca.pem
        cert_file: /opt/cert/dc1-cli-consul-0.pem
        key_file:  /opt/cert/dc1-cli-consul-0-key.pem
        insecure_skip_verify: false
    relabel_configs:
    - source_labels: ['__meta_consul_tags']
      regex: '(.*)http(.*)'
      action: keep
    - source_labels: [__meta_consul_node]
      target_label: node_name
    scrape_interval: 5s
    metrics_path: /v1/metrics
    params:
      format: ['prometheus']
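
The two relabel rules in this configuration can be tried out on plain strings. What follows is a rough shell sketch, not how Prometheus implements relabeling: rewrite_port and keep_http are hypothetical helper names, and the sample address and tag strings are assumptions. Prometheus anchors relabel regexes at both ends and joins the Consul tags into one comma-delimited string, which the sketch imitates with sed and grep.

```shell
#!/bin/bash
# Imitates: regex ([^:]+):(\d+), replacement ${1}:8501, target __address__.
# Input that does not match is left unchanged, as with Prometheus "replace".
rewrite_port() {
  printf '%s\n' "$1" | sed -E 's/^([^:]+):([0-9]+)$/\1:8501/'
}

# Imitates: regex (.*)http(.*), action keep, applied to the joined tag string.
# Returns 0 (keep the target) when "http" appears anywhere in the tags.
keep_http() {
  printf '%s\n' "$1" | grep -Eq '^(.*)http(.*)$'
}

# Assumed sample values: an address on port 8300 and two tag strings.
rewrite_port "10.0.0.5:8300"          # prints 10.0.0.5:8501
keep_http ",http," && echo "kept"
keep_http ",rpc,"  || echo "dropped"
```

In other words, the first rule keeps the discovered host but forces the HTTPS API port, and the second keeps only the Nomad service entries tagged http.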

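For the Prometheus Docker container to reach Consul through the DNS name "consul.service.consul", the DNS configuration has to be adjusted.
In my environment, dnsmasq on the hosts running the Nomad clients forwards to Consul DNS, and the Docker container reaches the host's DNS through host networking.
See the official Nomad documentation.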
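
For reference, forwarding the .consul domain from dnsmasq to the local Consul agent usually comes down to a single line of dnsmasq configuration. The sketch below writes that line. It assumes Consul's DNS interface listens on 127.0.0.1 port 8600 (the default); the function name write_consul_forwarder, the file name 10-consul, and the target directory are illustrative choices, not necessarily the setup used in this article.

```shell
#!/bin/bash
# Writes a dnsmasq drop-in file that forwards *.consul lookups to Consul DNS.
# $1: dnsmasq configuration directory (assumed default: /etc/dnsmasq.d)
write_consul_forwarder() {
  local dir="${1:-/etc/dnsmasq.d}"
  # Assumption: Consul DNS is reachable on 127.0.0.1:8600.
  printf 'server=/consul/127.0.0.1#8600\n' > "${dir}/10-consul"
}

# Example: write into a scratch directory and show the result.
tmpdir="$(mktemp -d)"
write_consul_forwarder "${tmpdir}"
cat "${tmpdir}/10-consul"
```

After placing the file in the real dnsmasq directory and restarting dnsmasq, running dig consul.service.consul on the host should return the Consul server addresses.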

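The Nomad IP addresses returned by Consul do not match the CN or SAN in the Nomad server certificate, so certificate verification of the scrape targets is skipped to avoid TLS errors (insecure_skip_verify: true under nomad_metrics). The same applies to insecure_skip_verify: true under consul_metrics.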

Nomad configuration file

Add the settings that expose metrics to the Nomad configuration file. See the documentation for details.

...
telemetry {
  publish_allocation_metrics = true
  publish_node_metrics       = true
  prometheus_metrics         = true
}
...
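
Once telemetry is enabled, the endpoint can be checked by hand before Prometheus gets involved. This is a minimal sketch, assuming the NOMAD_ADDR, NOMAD_CACERT, NOMAD_CLIENT_CERT and NOMAD_CLIENT_KEY environment variables are set for a TLS-enabled Nomad; nomad_metrics_url and fetch_nomad_metrics are hypothetical helper names.

```shell
#!/bin/bash
# Builds the metrics URL for a Nomad agent.
# format=prometheus asks for the Prometheus text format, matching the
# "params" entry in the scrape configuration.
# $1: base address (falls back to NOMAD_ADDR, then to an assumed local default)
nomad_metrics_url() {
  local addr="${1:-${NOMAD_ADDR:-https://127.0.0.1:4646}}"
  printf '%s/v1/metrics?format=prometheus\n' "${addr%/}"
}

# Fetches the metrics with the client certificate (requires network access).
fetch_nomad_metrics() {
  curl --silent --show-error --fail \
    --cacert "${NOMAD_CACERT}" \
    --cert "${NOMAD_CLIENT_CERT}" \
    --key "${NOMAD_CLIENT_KEY}" \
    "$(nomad_metrics_url "$@")"
}

# Only prints the URL here; call fetch_nomad_metrics to query a real agent.
nomad_metrics_url "https://nomad.mydomain.tk:4646"
```

If fetch_nomad_metrics returns lines starting with nomad_, the telemetry block is working and any remaining problems are on the Prometheus side.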

Nomad job file

Next, I create the Nomad job file for Prometheus.

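The template blocks below read prometheus.yml, the certificates, and the keys from Consul KV and write them to the local directory used by the Nomad job.
The mounts setting then bind-mounts that local directory into the container.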

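Placing the certificates and keys one file at a time like this is not very tidy. Bundling them into a zip file and uploading it to a web server or an S3 bucket might be cleaner.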

The update, migrate, and resources settings are placeholders and need to be adjusted to your operational design and environment.

job "prometheus" {
  datacenters = ["dc1"]
  type = "service"
  update {
    max_parallel = 1
    min_healthy_time = "10s"
    healthy_deadline = "3m"
    progress_deadline = "10m"
    auto_revert = false
    canary = 0
  }
  migrate {
    max_parallel = 1
    health_check = "checks"
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }
  group "prometheus" {
    count = 1
    restart {
      attempts = 2
      interval = "30m"
      delay = "15s"
      mode = "fail"
    }
    network { 
      mode = "host"
      port "prometheus" {
        static = 9090
      }
    }
    service {
      name = "prometheus"
      port = "prometheus"
      tags = ["opt"]
      check {
        type     = "http"
        protocol = "http"
        path     = "/-/healthy"
        interval = "30s"
        timeout  = "2s"
      }
    }
    task "prometheus" {
      driver = "docker"
      config {
        logging {
          type = "journald"
        }
        image = "prom/prometheus:v2.24.1"
        ports = ["prometheus"]
        network_mode = "host"
        args = [
          "--config.file=/srv/prometheus/prometheus.yml",
          "--storage.tsdb.path=/prometheus",
          "--web.enable-lifecycle"
        ]
        mounts = [
          {
            type = "bind"
            source = "local/prometheus.yml"
            target = "/srv/prometheus/prometheus.yml"
          },
          {
            type = "bind"
            source = "local/cert"
            target = "/opt/cert"
          }
        ]
      }
      env {
      }
      template {
        change_mode = "restart"
        destination = "local/prometheus.yml"
        data = "{{ key \"prometheus/config/prometheus.yml\" }}"
      }
      template {
        change_mode = "noop"
        destination = "local/cert/consul-agent-ca.pem"
        data = "{{ key \"cert/consul/consul-agent-ca.pem\" }}"
      }
      template {
        change_mode = "noop"
        destination = "local/cert/dc1-cli-consul-0.pem"
        data = "{{ key \"cert/consul/dc1-cli-consul-0.pem\" }}"
      }
      template {
        change_mode = "noop"
        destination = "local/cert/dc1-cli-consul-0-key.pem"
        data = "{{ key \"cert/consul/dc1-cli-consul-0-key.pem\" }}"
      }
      template {
        change_mode = "noop"
        destination = "local/cert/nomad-ca.pem"
        data = "{{ key \"cert/nomad/nomad-ca.pem\" }}"
      }
      template {
        change_mode = "noop"
        destination = "local/cert/nomad-cli.pem"
        data = "{{ key \"cert/nomad/nomad-cli.pem\" }}"
      }
      template {
        change_mode = "noop"
        destination = "local/cert/nomad-cli-key.pem"
        data = "{{ key \"cert/nomad/nomad-cli-key.pem\" }}"
      }
      resources {
        cpu    = 500  # MHz
        memory = 1024 # MB
      }
    }
  }
}

Registering the Consul KV entries and submitting the Nomad job

Now let's upload the local files to Consul KV and submit the job to Nomad.

Prepare the following files, adjusting the domain names and paths to your environment.

env.sh

export CONSUL_HTTP_ADDR=https://consul.mydomain.tk:8501
export CONSUL_CACERT=./consul-agent-ca.pem
export CONSUL_CLIENT_CERT=./dc1-cli-consul-0.pem
export CONSUL_CLIENT_KEY=./dc1-cli-consul-0-key.pem

export NOMAD_ADDR=https://nomad.mydomain.tk:4646
export NOMAD_CACERT=./nomad-ca.pem
export NOMAD_CLIENT_CERT=./nomad-cli.pem
export NOMAD_CLIENT_KEY=./nomad-cli-key.pem

register_setting.sh

#!/bin/bash
source ./env.sh

consul kv put prometheus/config/prometheus.yml @./prometheus.yml
consul kv put cert/consul/consul-agent-ca.pem @./consul-agent-ca.pem
consul kv put cert/consul/dc1-cli-consul-0.pem @./dc1-cli-consul-0.pem
consul kv put cert/consul/dc1-cli-consul-0-key.pem @./dc1-cli-consul-0-key.pem
consul kv put cert/nomad/nomad-ca.pem @./nomad-ca.pem
consul kv put cert/nomad/nomad-cli.pem @./nomad-cli.pem
consul kv put cert/nomad/nomad-cli-key.pem @./nomad-cli-key.pem

register_job.sh

#!/bin/bash
source ./env.sh

nomad job run prometheus.nomad
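
The seven consul kv put lines above follow one pattern, so the registration script could also be written as a loop. This is a sketch of that variant, not the script used in this article: kv_put, kv_put_all, and the DRY_RUN switch are my own hypothetical additions, and it assumes the same file names and KV paths as above.

```shell
#!/bin/bash
# Writes one local file to a Consul KV path.
# With DRY_RUN=1 the command is printed instead of executed.
kv_put() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "consul kv put $1 @$2"
  else
    consul kv put "$1" "@$2"
  fi
}

# Registers prometheus.yml plus the Consul and Nomad certificates and keys.
kv_put_all() {
  kv_put prometheus/config/prometheus.yml ./prometheus.yml
  for f in consul-agent-ca.pem dc1-cli-consul-0.pem dc1-cli-consul-0-key.pem; do
    kv_put "cert/consul/${f}" "./${f}"
  done
  for f in nomad-ca.pem nomad-cli.pem nomad-cli-key.pem; do
    kv_put "cert/nomad/${f}" "./${f}"
  done
}

# Preview only. To really write to Consul, source env.sh and leave DRY_RUN unset.
DRY_RUN=1
kv_put_all
```

The dry run prints the same seven commands as the original script, which makes it easy to compare the two before switching.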

Run the shell scripts to register everything.

$ chmod +x *.sh
$ ./register_setting.sh
Success! Data written to: prometheus/config/prometheus.yml
Success! Data written to: cert/consul/consul-agent-ca.pem
Success! Data written to: cert/consul/dc1-cli-consul-0.pem
Success! Data written to: cert/consul/dc1-cli-consul-0-key.pem
Success! Data written to: cert/nomad/nomad-ca.pem
Success! Data written to: cert/nomad/nomad-cli.pem
Success! Data written to: cert/nomad/nomad-cli-key.pem
$
$ ./register_job.sh
==> Monitoring evaluation "8e6e815c"
    Evaluation triggered by job "prometheus"
    Evaluation within deployment: "679353d4"
    Allocation "50ef43a1" created: node "ef1cba89", group "prometheus"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "8e6e815c" finished with status "complete"

Confirm in the Consul or Nomad web UI that the Prometheus job has started.

You can also check the job status with the Nomad CLI.

$ source ./env.sh
$ nomad job status prometheus
ID            = prometheus
Name          = prometheus
Submit Date   = 2021-03-31T02:26:49Z
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
prometheus  0       0         1        0       0         0

Latest Deployment
ID          = 679353d4
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
prometheus  1        1       1        0          2021-03-31T02:39:09Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
50ef43a1  ef1cba89  prometheus  0        run      running  2m29s ago  10s ago

The status is now running, and Healthy is 1.
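
When this check is scripted, the job status can be picked out of the CLI output. The helper below is a sketch with a hypothetical name, job_status. It relies only on the "Status = ..." line format shown in the output above and takes the first such line, which is the job status rather than the deployment status.

```shell
#!/bin/bash
# Reads `nomad job status <job>` output on stdin and prints the job status.
# The first "Status" line belongs to the job; the later one is the deployment's.
job_status() {
  awk -F'=' '$1 ~ /^Status[ \t]*$/ {
    gsub(/^[ \t]+|[ \t]+$/, "", $2)
    print $2
    exit
  }'
}

# Sample input: an excerpt of the output shown above.
job_status <<'EOF'
ID            = prometheus
Status        = running
Periodic      = false

Latest Deployment
ID          = 679353d4
Status      = successful
EOF
```

Against a live cluster the same helper would be used as nomad job status prometheus | job_status.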

Summary

This time I started Prometheus as a Nomad job.
I also went through the settings for obtaining the scrape targets from Consul's service discovery.

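With this setup, the metrics collected by Prometheus are stored in the location given by the "--storage.tsdb.path" startup option in the Nomad job, that is, the /prometheus directory inside the Docker container.
As a result, when the Nomad job stops, the metrics are lost along with the Docker container, so next time I will adjust the configuration so that they are persisted properly.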

<References>

    • https://learn.hashicorp.com/tutorials/nomad/prometheus-metrics
    • https://www.nomadproject.io/docs/configuration/telemetry
    • https://prometheus.io/docs/prometheus/latest/configuration/configuration/