Monitoring Hosts Outside a Kubernetes Cluster with Prometheus Operator

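Introduction

A common setup is to build APIs and similar services on Kubernetes while running the database on a virtual machine outside the cluster. Such a VM can be monitored by running a Prometheus exporter on it.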

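Prerequisites

    • Prometheus Operator is already installed on Kubernetes
        • In this article, Prometheus Operator is installed on GKE with Helm
    • A database is already set up on the VM to be monitored
        • In this article, a Cassandra cluster is installed on GCE VMs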

Install Cassandra Exporter

We use cassandra_exporter to monitor Cassandra.
The project started as a fork of JMX exporter.

Create the cassandra_exporter user

First, create a group.

$ sudo groupadd -r cassandra_exporter

Next, create the user as a system user that cannot log in.

$ sudo useradd -r -s /bin/false -g cassandra_exporter cassandra_exporter

Confirm that the user was created.

$ id cassandra_exporter
uid=997(cassandra_exporter) gid=994(cassandra_exporter) groups=994(cassandra_exporter)

Download cassandra_exporter

Download the JAR from the GitHub releases page, move it to /opt, and set its owner and permissions.

$ curl -L -O https://github.com/criteo/cassandra_exporter/releases/download/2.3.5/cassandra_exporter-2.3.5.jar

$ sudo mv ./cassandra_exporter-2.3.5.jar /opt/cassandra_exporter.jar

$ sudo chown cassandra_exporter:cassandra_exporter /opt/cassandra_exporter.jar

$ sudo chmod 0755 /opt/cassandra_exporter.jar

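Create the cassandra_exporter configuration file

Use the example configuration in the cassandra_exporter repository as a reference.

$ sudo vi /etc/cassandra_exporter_config.yml

$ sudo chown cassandra_exporter:cassandra_exporter /etc/cassandra_exporter_config.yml

The contents of /etc/cassandra_exporter_config.yml are as follows.
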
host: localhost:7199
ssl: False
user:
password:
listenAddress: 0.0.0.0
listenPort: 8080
# Regular expression to match environment variable names that will be added
# as labels to all data points. The name of the label will be either
# $1 from the regex below, or the entire environment variable name if no match groups are defined
#
# Example:
# additionalLabelsFromEnvvars: "^ADDL_(.*)$"
additionalLabelsFromEnvvars:
blacklist:
   # To profile the duration of jmx call you can start the program with the following options
   # > java -Dorg.slf4j.simpleLogger.defaultLogLevel=trace -jar cassandra_exporter.jar config.yml --oneshot
   #
   # To get intuition of what is done by cassandra when something is called you can look in cassandra
   # https://github.com/apache/cassandra/tree/trunk/src/java/org/apache/cassandra/metrics
   # Please avoid to scrape frequently those calls that are iterating over all sstables

   # Unaccessible metrics (not enough privilege)
   - java:lang:memorypool:.*usagethreshold.*

   # Leaf attributes not interesting for us but that are presents in many path
   - .*:999thpercentile
   - .*:95thpercentile
   - .*:fifteenminuterate
   - .*:fiveminuterate
   - .*:durationunit
   - .*:rateunit
   - .*:stddev
   - .*:meanrate
   - .*:mean
   - .*:min

   # Path present in many metrics but uninterresting
   - .*:viewlockacquiretime:.*
   - .*:viewreadtime:.*
   - .*:cas[a-z]+latency:.*
   - .*:colupdatetimedeltahistogram:.*

   # Mostly for RPC, do not scrap them
   - org:apache:cassandra:db:.*

   # columnfamily is an alias for Table metrics
   # https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/metrics/TableMetrics.java#L162
   - org:apache:cassandra:metrics:columnfamily:.*

   # Should we export metrics for system keyspaces/tables ?
   - org:apache:cassandra:metrics:[^:]+:system[^:]*:.*

   # Logback doesn't have any useful metrics
   - ch:qos:logback:.*

   # Don't scrap us
   - com:criteo:nosql:cassandra:exporter:.*

maxScrapFrequencyInSec:
  50:
    - .*

  # Refresh those metrics only every hour as it is costly for cassandra to retrieve them
  3600:
    - .*:snapshotssize:.*
    - .*:estimated.*
    - .*:totaldiskspaceused:.*

Register it as a systemd service

First, create a wrapper script that launches the exporter.

$ sudo vi /opt/cassandra_exporter.sh

#!/bin/bash
java -jar /opt/cassandra_exporter.jar /etc/cassandra_exporter_config.yml


$ sudo chown cassandra_exporter:cassandra_exporter /opt/cassandra_exporter.sh

$ sudo chmod 0755 /opt/cassandra_exporter.sh

Next, create the systemd unit file.

$ sudo vi /etc/systemd/system/cassandra_exporter.service

[Unit]
Description=Prometheus Cassandra Exporter
After=network.target

[Service]
Type=simple
User=cassandra_exporter
Group=cassandra_exporter
ExecStart=/opt/cassandra_exporter.sh

SyslogIdentifier=prometheus_cassandra_exporter
Restart=always

[Install]
WantedBy=multi-user.target

Reload the systemd daemon so that it picks up the new unit.

$ sudo systemctl daemon-reload

Start cassandra_exporter

$ sudo systemctl start cassandra_exporter.service

Verify cassandra_exporter

$ systemctl status cassandra_exporter
● cassandra_exporter.service - Prometheus Cassandra Exporter
   Loaded: loaded (/etc/systemd/system/cassandra_exporter.service; disabled; vendor preset: disabled)
   Active: active (running) since Sun 2020-04-26 16:41:46 JST; 10s ago
 Main PID: 16849 (cassandra_expor)
   CGroup: /system.slice/cassandra_exporter.service
           ├─16849 /bin/bash /opt/cassandra_exporter.sh
           └─16850 java -jar /opt/cassandra_exporter.jar /etc/cassandra_exporter_config.yml

Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at java.security.AccessController.doPrivileged(Native Method)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at java.security.AccessController.doPrivileged(Native Method)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Apr 26 16:41:55 blcloud-cassandra-01 prometheus_cassandra_exporter[16849]: at java.lang.Thread.run(Thread.java:748)
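
The status output only shows that the process is running, not that it actually serves metrics. The following is a minimal sketch for checking that, assuming the exporter listens on port 8080 (the listenPort set above) and exposes the Prometheus text format at /metrics. The helper name count_samples is hypothetical and is not part of cassandra_exporter.

```shell
#!/bin/bash
# count_samples (hypothetical helper): reads Prometheus text exposition
# format from stdin and prints the number of sample lines, that is,
# lines that are neither blank nor comments starting with '#'.
count_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}
```

On the Cassandra VM it could be used as `curl -s http://localhost:8080/metrics | count_samples`. A result greater than zero suggests that the exporter is reading metrics from Cassandra over JMX.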

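Add a ServiceMonitor to Prometheus Operator

[Figure: image.png]

As the figure shows, a ServiceMonitor scrapes its targets through a Service.
We therefore need to create a Kubernetes Endpoints object and a Service so that the Cassandra Exporter we set up can be reached from inside the Kubernetes cluster.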

---
apiVersion: v1
kind: Endpoints
metadata:
    name: cassandra-metrics-01
    labels:
        release: prometheus-operator
    namespace: default
subsets:
    - addresses:
      - ip: <IP address of the Cassandra node>
      ports:
      - name: metrics
        port: 8080
        protocol: TCP

---
apiVersion: v1
kind: Service
metadata:
    name: cassandra-metrics-01
    namespace: default
    labels:
        release: prometheus-operator
        k8s-app: cassandra-metrics
spec:
    type: ExternalName
    externalName: <IP address of the Cassandra node>
    ports:
    - name: metrics
      port: 8080
      protocol: TCP
      targetPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
    name: cassandra-metrics-sm-01
    labels:
        release: prometheus-operator
        prometheus: kube-prometheus
    namespace: default
spec:
    selector:
        matchLabels:
            release: prometheus-operator
            k8s-app: cassandra-metrics
    endpoints:
    - port: metrics
      interval: 10s
      honorLabels: true
      path: /metrics

Save the three manifests above as cassandra-exporter-01.yaml and apply them.

$ kubectl apply -f cassandra-exporter-01.yaml
endpoints/cassandra-metrics-01 created
service/cassandra-metrics-01 created
servicemonitor.monitoring.coreos.com/cassandra-metrics-sm-01 created
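
If the target does not show up in Prometheus, a label mismatch is a likely cause: every label listed under the ServiceMonitor's selector.matchLabels must be present on the Service with exactly the same value. The sketch below only illustrates that AND semantics in plain bash. The function name matches_selector and the comma-separated key=value input format are assumptions made for this example, not part of kubectl or Prometheus Operator.

```shell
#!/bin/bash
# matches_selector SELECTOR LABELS (hypothetical helper)
# Both arguments are comma-separated key=value lists.
# Returns 0 only if every pair in SELECTOR also appears in LABELS,
# which mirrors how matchLabels entries are ANDed together.
matches_selector() {
  local selector="$1"
  local labels=",$2,"
  local pair
  local IFS=','
  for pair in $selector; do
    case "$labels" in
      *",$pair,"*) ;;      # this selector pair is present on the Service
      *) return 1 ;;       # one missing or different pair means no match
    esac
  done
  return 0
}
```

For example, `matches_selector "release=prometheus-operator,k8s-app=foo" "release=prometheus-operator,k8s-app=foo-01"` returns 1 because the k8s-app values differ. Against the real cluster, `kubectl get svc -n default --show-labels` prints the labels of each Service for comparison with the selector.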

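Verification

We did not create an Ingress for Grafana this time, so we use port forwarding to check the result.

$ kubectl port-forward svc/prometheus-operator-grafana 8080:80

Open Grafana at http://localhost:8080 and import the dashboard with ID 6400.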

[Screenshots: importing dashboard 6400 in Grafana]

The metrics are then displayed as follows.

[Screenshot: Cassandra metrics shown on the Grafana dashboard]