Prometheus Alert Configuration

3 years ago

宇, 华

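Environment

Prometheus runs in a Docker container.
Cloud environment: Azure
Docker host: CentOS 7.3
Docker container (Prometheus server): CentOS 7.3

Monitored targets:
Docker host: CentOS 7.3
Docker container: CentOS 7.3 (assumed to be a web server running Apache)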

Prerequisites

・Make sure the Prometheus server is already installed.
See: Installing Prometheus on CentOS 7.3 and Docker.

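Alertmanager installation procedure

１．Copy the Alertmanager URL

Download Alertmanager from the official Prometheus website.

In this case, choose the following:
Operating system: Linux
Architecture: amd64

Find Alertmanager on the download page and copy its link address.

２．Download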

<Prometheus server>
# cd /usr/local/src
# wget https://github.com/prometheus/alertmanager/releases/download/v0.5.1/alertmanager-0.5.1.linux-amd64.tar.gz
# tar xfvz alertmanager-0.5.1.linux-amd64.tar.gz
# cd alertmanager-0.5.1.linux-amd64/
# cp -p alertmanager /usr/bin/.

３．Set up the configuration file

<Prometheus server>
# cd /etc/prometheus
# wget https://raw.githubusercontent.com/alerta/prometheus-config/master/alertmanager.yml
(default state)
# cat /etc/prometheus/alertmanager.yml
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'                  
  smtp_from: 'alertmanager@example.org'           

route:
  receiver: "alerta"
  group_by: ['alertname']
  group_wait:      30s
  group_interval:  5m
  repeat_interval: 2h

receivers:
- name: "alerta"
  webhook_configs:
  - url: 'http://localhost:8080/webhooks/prometheus'
    send_resolved: true

４．Add the Alertmanager autostart configuration

<Prometheus server>
# vi /etc/default/alertmanager
OPTIONS="-config.file /etc/prometheus/alertmanager.yml"

# vi /usr/lib/systemd/system/alertmanager.service

[Unit]
Description=Prometheus alertmanager Service
After=syslog.target network.target

[Service]
Type=simple
EnvironmentFile=-/etc/default/alertmanager
ExecStart=/usr/bin/alertmanager $OPTIONS
PrivateTmp=true

[Install]
WantedBy=multi-user.target


# systemctl enable alertmanager.service
Created symlink from /etc/systemd/system/multi-user.target.wants/alertmanager.service to /usr/lib/systemd/system/alertmanager.service.
# systemctl start alertmanager
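
As a quick sanity check after starting the service, a short script can confirm that something is accepting connections on Alertmanager's port. This is only a minimal sketch, assuming Alertmanager runs on the same host and listens on its default port 9093; `is_listening` is a helper name made up for this illustration.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# Assumption: Alertmanager runs locally on its default port 9093.
if __name__ == "__main__":
    print("alertmanager reachable:", is_listening("localhost", 9093))
```

A successful connection only shows that the port is open; `systemctl status alertmanager` remains the place to look for startup errors.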

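５．Prepare for alert configuration in advance (mail settings)

This setup uses SendGrid to send mail, so SendGrid account information is required.
Entering the SendGrid account information enables mail delivery.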

６．Configure alerts

Edit the configuration file downloaded in step ３.
This time we configure email alerts by changing the values from their defaults.

<Prometheus server>
# cat alertmanager.yml
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'smtp.sendgrid.net:25'    # SendGrid SMTP endpoint
  smtp_from: '****************@******'      # email address registered with SendGrid
  smtp_auth_username: '****@azure.com'      # UserName issued by SendGrid
  smtp_auth_password: '*******'             # password set in SendGrid (storing it in plain text is not ideal)
  smtp_auth_secret: '*********'             # API key issued by SendGrid

route:
  receiver: "mail"
  group_by: ['alertname', 'instance', 'severity']   # group alerts that share the same alert name, instance, and severity
  group_wait: 30s                                   # wait 30s before the first notification so alerts arriving in that window are sent together
  group_interval: 10m                               # wait 10m before notifying about new alerts added to an already-notified group
  repeat_interval: 1h                               # re-send the notification for a still-firing alert after 1h

#  receiver: "slack-notifications"
#  group_by: ['alertname', 'instance']

receivers:
 - name: 'mail'
   email_configs:
   - to: '*****@********,####@######'      # alert recipient addresses (comma-separated when there are several)
                                           # cc: I tried but could not get it to work
                                           # to split recipients, add another "- to:" entry in the same way

inhibit_rules:
 - source_match:
     severity: 'critical'                  # while an alert with severity critical is firing,
   target_match:                           # do not notify for warning alerts with the same alert name.
     severity: 'warning'
   equal: ['alertname']
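
To make the effect of `group_by` and the inhibit rule more concrete, here is a small Python model of both. It is a simplified sketch and not how Alertmanager is implemented: alerts are plain dictionaries of labels, and the function names, instance names, and sample alerts are made up for illustration.

```python
from collections import defaultdict

# Same labels as group_by in the route above.
GROUP_BY = ("alertname", "instance", "severity")


def group_alerts(alerts, group_by=GROUP_BY):
    """Bucket alerts by the values of the group_by labels.

    Each bucket corresponds to one notification group.
    """
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


def inhibit(alerts):
    """Drop warning alerts whose alertname matches a firing critical alert.

    Only alertname is compared, mirroring equal: ['alertname'].
    """
    critical_names = {
        a.get("alertname") for a in alerts if a.get("severity") == "critical"
    }
    return [
        a for a in alerts
        if not (a.get("severity") == "warning"
                and a.get("alertname") in critical_names)
    ]


# Hypothetical sample alerts.
alerts = [
    {"alertname": "instance_down", "instance": "web01:9100", "severity": "critical"},
    {"alertname": "instance_down", "instance": "web02:9100", "severity": "warning"},
    {"alertname": "node_high_loadaverage", "instance": "web01:9100", "severity": "warning"},
]

remaining = inhibit(alerts)
groups = group_alerts(remaining)
for key, members in groups.items():
    print(key, len(members))
```

In this model the warning on web02 is suppressed by the critical alert on web01, because the rule compares only `alertname`. If suppression should apply per host, adding `instance` to `equal` is one option to consider.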

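７．Define rules

Decide on and define the rules you need. THRESHOLD_CPU, THRESHOLD_MEM, and THRESHOLD_FS in the rules below appear to be placeholders; replace them with numeric thresholds before use.

<Prometheus server>
# cat /etc/prometheus/alert.rules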
ALERT instance_down
  IF up == 0
  FOR 2m
  LABELS { severity = "critical" }
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} down",
    description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 2 minutes.",
  }

ALERT cpu_threshold_exceeded
  IF (100 * (1 - avg by(instance)(irate(node_cpu{job='node',mode='idle'}[5m])))) > THRESHOLD_CPU
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} CPU usage is dangerously high",
    description = "This device's cpu usage has exceeded the threshold with a value of {{ $value }}.",
  }

ALERT mem_threshold_exceeded
  IF (node_memory_MemFree{job='node'} + node_memory_Cached{job='node'} + node_memory_Buffers{job='node'})/1000000 < THRESHOLD_MEM
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} memory usage is dangerously high",
    description = "This device's memory usage has exceeded the threshold with a value of {{ $value }}.",
  }

ALERT filesystem_threshold_exceeded
  IF node_filesystem_avail{job='node',mountpoint='/'} / node_filesystem_size{job='node'} * 100 < THRESHOLD_FS
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} filesystem usage is dangerously high",
    description = "This device's filesystem usage has exceeded the threshold with a value of {{ $value }}.",
  }

ALERT node_high_loadaverage
  IF node_load1 > 2
  FOR 10s
  LABELS { severity = "warning" }
  ANNOTATIONS {
    summary = "High load average on {{$labels.instance}}",
    description = "{{$labels.instance}} has had a 1-minute load average above 2 for more than 10s (current value: {{$value}})"
  }
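
The threshold expressions are easier to tune once you see what number each one produces. The following Python sketch reproduces the arithmetic of the CPU, memory, and filesystem rules on made-up sample values. The function names, the sample inputs, and the three threshold values are all assumptions for illustration, not values from the original setup.

```python
def cpu_usage_percent(idle_rates):
    """100 * (1 - average idle rate).

    idle_rates holds the per-core idle fraction (0..1), i.e. what
    irate(node_cpu{mode='idle'}[5m]) yields for each core.
    """
    return 100 * (1 - sum(idle_rates) / len(idle_rates))


def available_memory_mb(mem_free, cached, buffers):
    """(MemFree + Cached + Buffers) / 1000000, with inputs in bytes."""
    return (mem_free + cached + buffers) / 1_000_000


def filesystem_avail_percent(avail_bytes, size_bytes):
    """avail / size * 100 for one mount point."""
    return avail_bytes / size_bytes * 100


# Hypothetical thresholds standing in for THRESHOLD_CPU / _MEM / _FS.
THRESHOLD_CPU = 70    # percent used
THRESHOLD_MEM = 1000  # MB available
THRESHOLD_FS = 30     # percent available

cpu = cpu_usage_percent([0.125, 0.375])                          # two cores
mem = available_memory_mb(200_000_000, 250_000_000, 50_000_000)
fs = filesystem_avail_percent(10_000_000_000, 40_000_000_000)

print("cpu alert fires:", cpu > THRESHOLD_CPU)
print("mem alert fires:", mem < THRESHOLD_MEM)
print("fs alert fires:", fs < THRESHOLD_FS)
```

Note the direction of each comparison: the CPU rule fires when usage rises above its threshold, while the memory and filesystem rules fire when the available amount falls below theirs.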

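８．Integrate with Prometheus

Point Prometheus at Alertmanager.
Append the following to the end of /etc/prometheus/prometheus.yml. If prometheus.yml does not already load the rules file, it also needs a rule_files entry listing /etc/prometheus/alert.rules.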

alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets: ['<hostname>.japaneast.cloudapp.azure.com:9093']

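９．Final checks

Carefully confirm that the contents of the configuration files are correct. Note that promtool validates Prometheus configuration, so the second command may report errors for alertmanager.yml, which uses Alertmanager's own schema. The rules file can be checked separately with promtool check-rules.

<Prometheus server>
# promtool check-config /etc/prometheus/prometheus.yml
# promtool check-config /etc/prometheus/alertmanager.yml

Restart Alertmanager and Prometheus to finish.

<Prometheus server>
# systemctl restart alertmanager
# systemctl restart prometheus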

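10. Verify operation

Try stopping one of the monitored servers.
You should receive an email. With the instance_down rule (FOR 2m) and a group_wait of 30s, expect it a few minutes after the server stops.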
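
If stopping a server is inconvenient, another way to exercise the mail path is to push a test alert straight to Alertmanager. The sketch below assumes the v1 HTTP API (`/api/v1/alerts`) that matches the 0.5.1 release used here; later releases provide a v2 API, so check the version you run. The function names and the instance name in the usage note are made up for illustration.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(instance: str) -> dict:
    """Build one alert in the shape accepted by the v1 alerts endpoint."""
    return {
        "labels": {
            "alertname": "instance_down",
            "instance": instance,
            "severity": "critical",
        },
        "annotations": {"summary": f"Instance {instance} down (test alert)"},
        # RFC 3339 timestamp in UTC.
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }


def post_alerts(base_url: str, alerts: list) -> int:
    """POST alerts to <base_url>/api/v1/alerts and return the HTTP status."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

On the Prometheus server, calling `post_alerts("http://localhost:9093", [build_test_alert("web01:9100")])` should produce a mail once group_wait has elapsed, provided the SMTP settings are correct.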

Reference sites

Tech-Sketch
Steps for building a Prometheus environment
Sending email from Azure with SendGrid