Monitoring Targets Outside the Kubernetes Cluster with Prometheus Operator
Introduction
A common setup is to build APIs and similar services on Kubernetes while running the database on external VMs. Such a VM can be monitored with a Prometheus exporter.
Prerequisites
- Prometheus Operator is already installed on Kubernetes
  - In this article it was installed on GKE with Helm
- A database is already set up on the VM to be monitored
  - In this article a Cassandra cluster is installed on VMs (GCE)
Installing Cassandra Exporter
Use cassandra_exporter to monitor Cassandra.
The project is a fork of JMX exporter.
Create the cassandra_exporter user
First, create a group.
$ sudo groupadd -r cassandra_exporter
Next, create the user as a system user that cannot log in.
$ sudo useradd -r -s /bin/false -g cassandra_exporter cassandra_exporter
Confirm that the user was created.
$ id cassandra_exporter
uid=997(cassandra_exporter) gid=994(cassandra_exporter) groups=994(cassandra_exporter)
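As an extra check that the account really cannot be used to log in, the login shell field of its passwd entry can be inspected. The following is only a minimal sketch: the helper name `login_shell_of` is hypothetical, and it assumes passwd-format lines (seven colon-separated fields) on standard input.

```shell
# Print the login shell (7th passwd field) of the given user.
# Input: passwd-format lines on stdin. The helper name is hypothetical.
login_shell_of() {
  awk -F: -v user="$1" '$1 == user { print $7 }'
}
```

On the VM it could be used as `getent passwd cassandra_exporter | login_shell_of cassandra_exporter`, which should print `/bin/false` for the user created above.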
Download the cassandra_exporter binary
Download it from the GitHub releases page.
$ curl -L -O https://github.com/criteo/cassandra_exporter/releases/download/2.3.5/cassandra_exporter-2.3.5.jar
$ sudo mv ./cassandra_exporter-2.3.5.jar /opt/cassandra_exporter.jar
$ sudo chown cassandra_exporter:cassandra_exporter /opt/cassandra_exporter.jar
$ sudo chmod 0755 /opt/cassandra_exporter.jar
Create the cassandra_exporter configuration file
Use the project's example configuration as a reference.
$ sudo vi /etc/cassandra_exporter_config.yml
$ sudo chown cassandra_exporter:cassandra_exporter /etc/cassandra_exporter_config.yml
host: localhost:7199
ssl: False
user:
password:
listenAddress: 0.0.0.0
listenPort: 8080
# Regular expression to match environment variable names that will be added
# as labels to all data points. The name of the label will be either
# $1 from the regex below, or the entire environment variable name if no match groups are defined
#
# Example:
# additionalLabelsFromEnvvars: "^ADDL_(.*)$"
additionalLabelsFromEnvvars:
blacklist:
# To profile the duration of jmx call you can start the program with the following options
# > java -Dorg.slf4j.simpleLogger.defaultLogLevel=trace -jar cassandra_exporter.jar config.yml --oneshot
#
# To get intuition of what is done by cassandra when something is called you can look in cassandra
# https://github.com/apache/cassandra/tree/trunk/src/java/org/apache/cassandra/metrics
# Please avoid to scrape frequently those calls that are iterating over all sstables
# Unaccessible metrics (not enough privilege)
- java:lang:memorypool:.*usagethreshold.*
# Leaf attributes not interesting for us but that are presents in many path
- .*:999thpercentile
- .*:95thpercentile
- .*:fifteenminuterate
- .*:fiveminuterate
- .*:durationunit
- .*:rateunit
- .*:stddev
- .*:meanrate
- .*:mean
- .*:min
# Path present in many metrics but uninterresting
- .*:viewlockacquiretime:.*
- .*:viewreadtime:.*
- .*:cas[a-z]+latency:.*
- .*:colupdatetimedeltahistogram:.*
# Mostly for RPC, do not scrap them
- org:apache:cassandra:db:.*
# columnfamily is an alias for Table metrics
# https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/metrics/TableMetrics.java#L162
- org:apache:cassandra:metrics:columnfamily:.*
# Should we export metrics for system keyspaces/tables ?
- org:apache:cassandra:metrics:[^:]+:system[^:]*:.*
# Logback doesn't have any useful metrics
- ch:qos:logback:.*
# Don't scrap us
- com:criteo:nosql:cassandra:exporter:.*
maxScrapFrequencyInSec:
  50:
    - .*
  # Refresh those metrics only every hour as it is costly for cassandra to retrieve them
  3600:
    - .*:snapshotssize:.*
    - .*:estimated.*
    - .*:totaldiskspaceused:.*
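Before editing the blacklist it can help to check which metric paths a pattern would exclude. The sketch below is only an approximation: it assumes the exporter applies each pattern as a whole-string regular expression match against a lowercase, colon-separated metric path, and it uses `grep -E`, whose syntax agrees with Java regex for the simple patterns in this file. The helper name `is_blacklisted` and the sample metric paths are hypothetical.

```shell
# Return 0 if the metric path is fully matched by at least one pattern.
# Usage: is_blacklisted METRIC_PATH PATTERN [PATTERN...]
# Assumption: patterns are whole-string matches (hence grep -x).
is_blacklisted() {
  metric_path=$1
  shift
  for pattern in "$@"; do
    # -E: extended regex, -q: no output, -x: the whole line must match
    if printf '%s\n' "$metric_path" | grep -Eqx -e "$pattern"; then
      return 0
    fi
  done
  return 1
}
```

For example, `is_blacklisted 'org:apache:cassandra:metrics:table:ks1:t1:readlatency:999thpercentile' '.*:999thpercentile'` succeeds, while the same call with a path ending in `:count` fails.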
Register it as a systemd service. First, create a startup script.
sudo vi /opt/cassandra_exporter.sh
#!/bin/bash
java -jar /opt/cassandra_exporter.jar /etc/cassandra_exporter_config.yml
$ sudo chown cassandra_exporter:cassandra_exporter /opt/cassandra_exporter.sh
$ sudo chmod 0755 /opt/cassandra_exporter.sh
$ sudo vi /etc/systemd/system/cassandra_exporter.service
[Unit]
Description=Prometheus Cassandra Exporter
After=network.target
[Service]
Type=simple
User=cassandra_exporter
Group=cassandra_exporter
ExecStart=/opt/cassandra_exporter.sh
SyslogIdentifier=prometheus_cassandra_exporter
Restart=always
[Install]
WantedBy=multi-user.target
Reload the systemd daemon so that the new unit is recognized.
sudo systemctl daemon-reload
Start cassandra_exporter
sudo systemctl start cassandra_exporter.service
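Note that `systemctl start` only starts the service for the current boot. The status output in this article reports the unit as `disabled`, so it would not come back after a reboot. If that is wanted, the unit can additionally be enabled. This is a small sketch, and the wrapper function name `enable_exporter_on_boot` is hypothetical.

```shell
# Enable the unit so systemd starts it at boot, then print its enablement state.
# The default unit name matches the service file created above.
enable_exporter_on_boot() {
  unit=${1:-cassandra_exporter.service}
  sudo systemctl enable "$unit" && systemctl is-enabled "$unit"
}
```

Calling `enable_exporter_on_boot` on the VM should end by printing `enabled`.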
Check the status of cassandra_exporter
systemctl status cassandra_exporter
● cassandra_exporter.service - Prometheus Cassandra Exporter
Loaded: loaded (/etc/systemd/system/cassandra_exporter.service; disabled; vendor preset: disabled)
Active: active (running) since Sun 2020-04-26 16:41:46 JST; 10s ago
Main PID: 16849 (cassandra_expor)
CGroup: /system.slice/cassandra_exporter.service
├─16849 /bin/bash /opt/cassandra_exporter.sh
└─16850 java -jar /opt/cassandra_exporter.jar /etc/cassandra_exporter_config.yml
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at java.security.AccessController.doPrivileged(Native Method)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at java.security.AccessController.doPrivileged(Native Method)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at java.lang.Thread.run(Thread.java:748)
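An `active (running)` status does not by itself show that metrics are being served, so it is worth querying the exporter's HTTP endpoint as well. The sketch below counts the sample lines in Prometheus text exposition output. The helper name `count_samples` is hypothetical, and it assumes the exporter listens on port 8080 (the `listenPort` configured above) and serves `/metrics`.

```shell
# Count sample lines in Prometheus text exposition read from stdin,
# skipping comment lines (# HELP / # TYPE) and blank lines.
count_samples() {
  grep -vc -e '^#' -e '^[[:space:]]*$'
}
```

On the Cassandra VM this could be run as `curl -s http://localhost:8080/metrics | count_samples`. A count of zero suggests the exporter cannot read metrics from Cassandra over JMX.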
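Add a ServiceMonitor to Prometheus Operator
A ServiceMonitor scrapes its targets through a Service.
Therefore, create a Kubernetes Endpoints object and a Service so that the Cassandra Exporter set up above can be reached from inside the Kubernetes cluster.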
---
apiVersion: v1
kind: Endpoints
metadata:
  name: cassandra-metrics-01
  labels:
    release: prometheus-operator
  namespace: default
subsets:
  - addresses:
      - ip: <IP address of the Cassandra node>
    ports:
      - name: metrics
        port: 8080
        protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: cassandra-metrics-01
  namespace: default
  labels:
    release: prometheus-operator
    k8s-app: cassandra-metrics
spec:
  type: ExternalName
  externalName: <IP address of the Cassandra node>
  ports:
    - name: metrics
      port: 8080
      protocol: TCP
      targetPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cassandra-metrics-sm-01
  labels:
    release: prometheus-operator
    prometheus: kube-prometheus
  namespace: default
spec:
  selector:
    matchLabels:
      release: prometheus-operator
      # Must match the k8s-app label on the Service above
      k8s-app: cassandra-metrics
  endpoints:
    - port: metrics
      interval: 10s
      honorLabels: true
      path: /metrics
$ kubectl apply -f cassandra-exporter-01.yaml
endpoints/cassandra-metrics-01 created
service/cassandra-metrics-01 created
servicemonitor.monitoring.coreos.com/cassandra-metrics-sm-01 created
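Since Cassandra runs as a cluster, there will usually be more than one node to scrape. One possible approach, shown here only as a sketch, is to list every node in a single Endpoints object rather than creating one manifest per node. The function name `render_endpoints` and the IP addresses in the usage note are hypothetical. The namespace, label, and port are copied from the manifest in this article.

```shell
# Render an Endpoints manifest that lists several exporter addresses.
# Usage: render_endpoints NAME IP [IP...]
render_endpoints() {
  name=$1
  shift
  cat <<EOF
apiVersion: v1
kind: Endpoints
metadata:
  name: ${name}
  namespace: default
  labels:
    release: prometheus-operator
subsets:
  - addresses:
EOF
  # One address entry per Cassandra node
  for ip in "$@"; do
    printf '      - ip: %s\n' "$ip"
  done
  cat <<EOF
    ports:
      - name: metrics
        port: 8080
        protocol: TCP
EOF
}
```

For example, `render_endpoints cassandra-metrics-01 10.0.0.11 10.0.0.12 | kubectl apply -f -` would apply it. The Endpoints name has to stay identical to the Service name, because Kubernetes associates the two by name.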
Verification
No Ingress was created for Grafana this time, so port forwarding is used to check.
kubectl port-forward svc/prometheus-operator-grafana 8080:80
Import the dashboard with ID 6400.
The metrics are then displayed on the dashboard.