Getting Started with Prometheus #2: From Starting Monitoring to Alertmanager

2 years ago

清, 扬

2 minutes

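Following on from the previous post, I want to keep writing about Prometheus.
Last time we covered everything from installing Prometheus to starting it. This time we look at how to actually use it, and at how to add node-exporter and Alertmanager.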

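Running Prometheus

Expression browser

Now let's try the expression browser. Type "up" into the query box and click the Execute button. The result should show up{instance="localhost:9090", job="prometheus"} with a value of 1. This means the only target being scraped so far, the Prometheus server itself, is up.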
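The same "up" query can also be sent to the Prometheus HTTP API instead of the expression browser. The following is a minimal Python sketch, assuming the server from this post is listening on localhost:9090. The helper names (query_url, instant_query, target_states) and the sample payload, including its timestamp, are made up for illustration; only the /api/v1/query endpoint and the response shape come from Prometheus itself.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url: str, expr: str) -> str:
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def instant_query(base_url: str, expr: str) -> dict:
    """Run an instant query against a live Prometheus server (not called below)."""
    with urllib.request.urlopen(query_url(base_url, expr), timeout=5) as resp:
        return json.load(resp)


def target_states(payload: dict) -> dict:
    """Map (job, instance) to True/False from an instant-vector response for `up`."""
    states = {}
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        _timestamp, value = sample["value"]  # the sample value arrives as a string
        states[(labels["job"], labels["instance"])] = float(value) == 1.0
    return states


# Illustrative payload in the shape the API returns; the timestamp is invented.
SAMPLE_UP = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "up",
                    "instance": "localhost:9090",
                    "job": "prometheus",
                },
                "value": [1668437840.355, "1"],
            }
        ],
    },
}

print(query_url("http://localhost:9090", "up"))
print(target_states(SAMPLE_UP))
```

With a running server, target_states(instant_query("http://localhost:9090", "up")) should report the same single target that the expression browser shows.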

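Metrics

You can also fetch the raw metrics by visiting http://localhost:9090/metrics. This works because Prometheus is itself instrumented and exposes its own metrics in the Prometheus format.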
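To get a feel for what the /metrics page contains, here is a rough Python sketch that reads the text exposition format into a dictionary. It is a simplified reader, not a full parser: it assumes one sample per line and ignores escaping rules inside label values. The function name parse_metrics and the sample values are illustrative only.

```python
def parse_metrics(text: str) -> dict:
    """Return {series: value} for each sample line in exposition-format text."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        if "}" in line:
            # Label values may contain spaces, so split right after the closing brace.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:]
        else:
            series, _, rest = line.partition(" ")
        value = rest.split()[0]  # drop the optional trailing timestamp
        samples[series] = float(value)
    return samples


# Illustrative excerpt; the numbers are invented.
SAMPLE_METRICS = """\
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 42
# TYPE prometheus_http_requests_total counter
prometheus_http_requests_total{code="200",handler="/metrics"} 12
"""

for series, value in parse_metrics(SAMPLE_METRICS).items():
    print(series, value)
```

Each non-comment line is one time series: a metric name, an optional set of labels in braces, and the current value.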

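Node Exporter

Node Exporter exposes kernel- and machine-level metrics for Linux and other Unix systems. It provides standard metrics such as CPU, memory, disk space, disk I/O, and network bandwidth, along with a large number of additional kernel metrics. You can download Node Exporter from here. As with the Prometheus download, just extract the archive. This time no configuration changes are needed, so you can run the binary right away.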

$ tar -xzf node_exporter-*.linux-amd64.tar.gz
$ cd node_exporter-*.linux-amd64/
$ ./node_exporter 
ts=2022-11-14T14:57:20.355Z caller=node_exporter.go:182 level=info msg="Starting node_exporter" version="(version=1.4.0, branch=HEAD, revision=7da1321761b3b8dfc9e496e1a60e6a476fec6018)"
ts=2022-11-14T14:57:20.355Z caller=node_exporter.go:183 level=info msg="Build context" build_context="(go=go1.19.1, user=root@83d90983e89c, date=20220926-12:32:56)"

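Next, add the following scrape configuration to the prometheus.yml file in the Prometheus directory, then restart Prometheus (or have it reload its configuration) so that it starts scraping Node Exporter, which listens on port 9100 by default.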

global:
    scrape_interval: 10s
scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets:
          - localhost:9090
    - job_name: node
      static_configs:
      - targets:
          - localhost:9100
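Once Prometheus is running with this configuration, both jobs should appear as healthy targets. One way to check this outside the web UI is the /api/v1/targets endpoint. The Python sketch below only shows how such a response could be summarized. The function name target_health and the sample payload are illustrative, and a real response carries more fields per target than shown here.

```python
def target_health(payload: dict) -> dict:
    """Map each scrape URL to (job, health) from a /api/v1/targets response."""
    return {
        target["scrapeUrl"]: (target["labels"]["job"], target["health"])
        for target in payload["data"]["activeTargets"]
    }


# Illustrative payload trimmed to the fields used above.
SAMPLE_TARGETS = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"instance": "localhost:9090", "job": "prometheus"},
                "scrapeUrl": "http://localhost:9090/metrics",
                "health": "up",
            },
            {
                "labels": {"instance": "localhost:9100", "job": "node"},
                "scrapeUrl": "http://localhost:9100/metrics",
                "health": "up",
            },
        ],
        "droppedTargets": [],
    },
}

for url, (job, health) in target_health(SAMPLE_TARGETS).items():
    print(f"{job:<12}{url:<34}{health}")
```

If the node job reports anything other than "up", the usual first check is whether node_exporter is still running on port 9100.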

Alertmanager

Finally, let's set up alerting. As before, download Alertmanager from here, extract it, and run it.

$ tar -xzf alertmanager-*.linux-amd64.tar.gz
$ cd alertmanager-*.linux-amd64/
$ ./alertmanager
ts=2022-11-11T13:51:13.367Z caller=main.go:231 level=info msg="Starting Alertmanager" version="(version=0.24.0, branch=HEAD, revision=f484b17fa3c583ed1b2c8bbcec20ba1db2aa5f11)"
ts=2022-11-11T13:51:13.367Z caller=main.go:232 level=info build_context="(go=go1.17.8, user=root@265f14f5c6fc, date=20220325-09:31:33)"
ts=2022-11-11T13:51:13.386Z caller=cluster.go:185 level=info component=cluster msg="setting advertise address explicitly" addr=192.168.5.15 port=9094
ts=2022-11-11T13:51:13.389Z caller=cluster.go:680 level=info component=cluster msg="Waiting for gossip to settle..." interval=2s

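At this point, however, nothing defines which conditions should trigger which alerts. That goes into two files: prometheus.yml, which points Prometheus at Alertmanager and at the rule file, and a new file named rules.yml, which holds the alerting rules. The two files follow in that order.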

global:
    scrape_interval: 10s
    evaluation_interval: 10s
rule_files:
    - rules.yml
alerting:
    alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets:
          - localhost:9090
    - job_name: node
      static_configs:
      - targets:
          - localhost:9100

# rules.yml
groups:
    - name: example
      rules:
      # Fire when a scrape target has been unreachable for one minute.
      - alert: InstanceDown
        expr: up == 0
        for: 1m
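The "for: 1m" clause means the alert does not fire the moment "up == 0" becomes true. It first goes into a pending state and only fires once the condition has held for a full minute of evaluations. The Python sketch below is a simplified model of that behaviour, assuming the 10-second evaluation_interval from prometheus.yml. It is not Prometheus's actual implementation, and the function name alert_states is made up.

```python
def alert_states(up_samples, interval_s=10, for_s=60):
    """Return the alert state at each evaluation for the rule `up == 0`."""
    states = []
    active_since = None  # time at which the condition first became true
    for i, up in enumerate(up_samples):
        now = i * interval_s
        if up != 0:
            # Condition no longer holds: the alert resets entirely.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        # Pending until the condition has held for the full `for` duration.
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# One healthy evaluation, seven failed ones, then recovery.
timeline = [1, 0, 0, 0, 0, 0, 0, 0, 1]
for second, state in zip(range(0, 90, 10), alert_states(timeline)):
    print(f"t={second:>2}s  {state}")
```

In this model a target that recovers before the minute is up never reaches the firing state, which is what keeps short blips from paging anyone.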

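Summary

This time I walked through the whole flow, from running Prometheus to using Alertmanager. Alertmanager can reportedly send alert notifications by email once it is configured to do so, and I would like to try that in the future.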
