Looking into Prometheus
This article is the Day 24 entry in the KSCARROT Advent Calendar 2020.
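Introduction
I was asked to help with a Prometheus installation,
so I did some research and wrote down these notes.
※ I relied on a lot of searching, so many thanks to everyone whose material I referred to.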
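What is Prometheus?
In short, it is a pull-based resource monitoring tool. The Prometheus server periodically fetches resource metrics from each monitored server, graphs them, and can trigger alert notifications when a preset threshold is exceeded. The graphs in Prometheus itself are not very easy to read, so it is usually paired with a visualization tool called Grafana for viewing. ※ Connecting the two is very simple.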
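Exporters
Depending on what you want to monitor, you need to install a tool (an endpoint that Prometheus accesses) that outputs the matching data (metrics). * It has to be installed on every server you want to monitor.
In addition, to send alert notifications when a threshold is exceeded, you also need to install Alertmanager.
(https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz)
The following are representative exporters for each kind of monitoring.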
- Performance monitoring
  node_exporter
  https://github.com/prometheus/node_exporter/releases/download/v1.0.0/node_exporter-1.0.0.linux-amd64.tar.gz
- Log monitoring
  grok_exporter
  https://github.com/fstab/grok_exporter/releases/download/v1.0.0.RC3/grok_exporter-1.0.0.RC3.linux-amd64.zip
- Liveness monitoring
  blackbox_exporter
  https://github.com/prometheus/blackbox_exporter/releases/download/v0.18.0/blackbox_exporter-0.18.0.linux-amd64.tar.gz
- Process monitoring
  process-exporter
  https://github.com/ncabatoff/process-exporter/releases/download/v0.7.2/process-exporter_0.7.2_linux_amd64.rpm
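The download URLs for the components hosted under the prometheus GitHub organization (alertmanager, node_exporter, blackbox_exporter) all follow the same pattern, so the URL for another version can be derived instead of copied by hand. The following is a minimal sketch; `release_url` is a hypothetical helper name, and it assumes the linux-amd64 tarball naming seen in the URLs above. It does not apply to grok_exporter or process-exporter, which live in other repositories and use different file names.

```shell
# Hypothetical helper: build the release tarball URL for a project in the
# prometheus GitHub organization.
# $1 = project name (e.g. node_exporter), $2 = version without the leading "v"
release_url() {
  project=$1
  version=$2
  printf 'https://github.com/prometheus/%s/releases/download/v%s/%s-%s.linux-amd64.tar.gz\n' \
    "$project" "$version" "$project" "$version"
}

# Example usage (downloads the same node_exporter tarball as this article):
# wget "$(release_url node_exporter 1.0.0)"
```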
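Overview
Quickly building a performance-monitoring environment
These are the steps I tried on CentOS 8.
Installing Alertmanager
Install it with the following steps.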
wget https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz
tar -xzvf alertmanager-0.21.0.linux-amd64.tar.gz
mv alertmanager-0.21.0.linux-amd64 /etc/alertmanager
cd /etc/alertmanager
Replace the contents of alertmanager.yml with the following
※ Sends alert notifications to the Slack channel alert-test
------------------
global:
  slack_api_url: 'https://hooks.slack.com/services/XXXXXXXX/XXXXXXXX/xxxxxxxxxxxxxxxxxxxxxxxx'
route:
  group_wait: 30s
  group_interval: 30s
  repeat_interval: 30s
  receiver: default
receivers:
  - name: 'default'
    slack_configs:
      - send_resolved: true
        channel: '#alert-test'
        title: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
        text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}"
------------------
# Run (registering it with systemctl is recommended)
/etc/alertmanager/alertmanager --config.file /etc/alertmanager/alertmanager.yml
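Since registering the process with systemctl is recommended, here is one possible way to do it. This is a minimal sketch, not the setup used in this article: `make_unit` is a hypothetical helper, the unit runs as root because no `User=` is set, and `WorkingDirectory` is included on the assumption that Alertmanager and Prometheus should keep their default relative `data/` directory inside the install directory.

```shell
# Hypothetical helper: print a minimal systemd unit for a long-running binary.
# $1 = description, $2 = working directory, $3 = full command line
make_unit() {
  cat <<EOF
[Unit]
Description=$1
Wants=network-online.target
After=network-online.target

[Service]
WorkingDirectory=$2
ExecStart=$3
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Example usage (run as root):
# make_unit "Alertmanager" /etc/alertmanager \
#   "/etc/alertmanager/alertmanager --config.file=/etc/alertmanager/alertmanager.yml" \
#   > /etc/systemd/system/alertmanager.service
# systemctl daemon-reload
# systemctl enable --now alertmanager
```

The same helper could be reused for node_exporter and Prometheus by changing the three arguments and the unit file name.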
Installing node_exporter on the servers to be monitored
Install it with the following steps.
wget https://github.com/prometheus/node_exporter/releases/download/v1.0.0/node_exporter-1.0.0.linux-amd64.tar.gz
tar zxvf node_exporter-1.0.0.linux-amd64.tar.gz
mv node_exporter-1.0.0.linux-amd64 /etc/node_exporter
cd /etc/node_exporter
# Start (registering it with systemctl is recommended)
/etc/node_exporter/node_exporter
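Once node_exporter is running, it serves its metrics as plain text over HTTP on port 9100 under /metrics, which is what Prometheus pulls. As a quick sanity check, the sketch below computes the same memory-usage percentage that the memory alert rule in this article is based on, directly from that text output. `mem_used_percent` is a hypothetical helper name, and the sketch assumes the two metric lines appear without labels, as node_exporter normally emits them.

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and print
# 100 * (1 - MemFree / MemTotal) with one decimal place.
mem_used_percent() {
  awk '
    $1 == "node_memory_MemFree_bytes"  { free = $2 }
    $1 == "node_memory_MemTotal_bytes" { total = $2 }
    END {
      if (total > 0) printf "%.1f\n", 100 * (1 - free / total)
      else exit 1   # MemTotal missing: the input was not node_exporter output
    }'
}

# Example usage on the monitored server:
# curl -s http://localhost:9100/metrics | mem_used_percent
```

Note that MemFree does not count memory used for cache and buffers as free, so this figure can look higher than what tools such as `free` report as unavailable.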
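Installing Prometheus
Install it with the following steps.
wget https://github.com/prometheus/prometheus/releases/download/v2.18.1/prometheus-2.18.1.linux-amd64.tar.gz
tar zxvf prometheus-2.18.1.linux-amd64.tar.gz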
mv prometheus-2.18.1.linux-amd64 /etc/prometheus
cd /etc/prometheus
The contents of prometheus.yml are as follows
------------------
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
rule_files:
  - "./alert_rules.yml"
# A scrape configuration with one job: node_exporter on the monitored server.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'node'
    scrape_interval: 10s
    static_configs:
      - targets: ['client-host-name:9100']
------------------
Create alert_rules.yml with the following content
------------------
groups:
  - name: node
    rules:
      - alert: memory_used
        expr: 100 * (1 - node_memory_MemFree_bytes{job='node'} / node_memory_MemTotal_bytes{job='node'}) > 90
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "memory {{ $labels.instance }} used over 90%"
          description: "memory of {{ $labels.instance }} has been used over 90%"
      - alert: disk_used
        # Scaled to a percentage; as a 0-1 ratio it could never exceed 90.
        expr: 100 * (1 - node_filesystem_avail_bytes{job='node',mountpoint='/'} / node_filesystem_size_bytes{job='node',mountpoint='/'}) > 90
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "disk {{ $labels.instance }} used over 90%"
          description: "disk of {{ $labels.instance }} has been used over 90%"
------------------
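Indentation mistakes in these YAML files are easy to make, so it may be worth validating them before starting Prometheus. The Prometheus tarball ships a `promtool` binary next to `prometheus`; the sketch below wraps its `check config` and `check rules` subcommands. `check_prometheus_config` is a hypothetical helper name, and the /etc/prometheus default assumes the install location used in this article.

```shell
# Hypothetical helper: validate prometheus.yml and alert_rules.yml with promtool.
# $1 = install directory (defaults to /etc/prometheus as used in this article)
check_prometheus_config() {
  dir=${1:-/etc/prometheus}
  if [ ! -x "$dir/promtool" ]; then
    echo "promtool not found in $dir" >&2
    return 2
  fi
  # Both checks must pass for the function to succeed.
  "$dir/promtool" check config "$dir/prometheus.yml" &&
    "$dir/promtool" check rules "$dir/alert_rules.yml"
}

# Example usage:
# check_prometheus_config /etc/prometheus && echo "configuration looks valid"
```

The Alertmanager tarball similarly includes `amtool`, whose `check-config` subcommand can be pointed at alertmanager.yml.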
# Run with the admin API enabled (registering it with systemctl is recommended)
/etc/prometheus/prometheus --config.file=/etc/prometheus/prometheus.yml --web.enable-admin-api
With this in place, a notification is sent to Slack whenever a server's memory or disk usage stays above 90% for 30 seconds.
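To confirm the Slack route without waiting for a real threshold breach, a test alert can be posted straight to Alertmanager's v2 HTTP API on port 9093. The following is a rough sketch: `build_test_alert` is a hypothetical helper, the label and annotation names mirror the ones used in alert_rules.yml, and the arguments are assumed to contain no double quotes or backslashes because no JSON escaping is done.

```shell
# Hypothetical helper: build a one-alert JSON payload for POST /api/v2/alerts.
# $1 = alert name, $2 = instance label, $3 = text used for summary and description
build_test_alert() {
  printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"critical"},"annotations":{"summary":"%s","description":"%s"}}]\n' \
    "$1" "$2" "$3" "$3"
}

# Example usage on the Alertmanager host:
# curl -s -X POST -H 'Content-Type: application/json' \
#   -d "$(build_test_alert manual_test client-host-name:9100 'manual test alert')" \
#   http://localhost:9093/api/v2/alerts
```

Because no end time is set, Alertmanager should treat the test alert as resolved once it stops being re-sent, and with `send_resolved: true` a resolved message should then follow in the channel.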