Monitoring an NGINX and Redis demo application on AKS with Prometheus + Grafana, verified in one day

3 years ago

雅, 悟

5 minutes

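I have also published this article on Hatenablog at the URL below:

I am evaluating and proposing ways to monitor k8s.

If you need to prepare a demo in a hurry and want to do the monitoring with OSS, Prometheus is pretty much the only choice, so I started setting it up in my own environment.

Since I ultimately want tracing as well, I later decided to introduce Istio; that is covered in a separate article.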

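Demo environment

At first I had no idea how to even set up a k8s environment. Running AKS Engine on VMs was also an option, but I decided to start with AKS.

AKS has a voting application demo that you can deploy, so I decided to monitor that with Prometheus.

Following the steps of the demo produces Pods like these.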

$ kubectl get po
NAME READY STATUS RESTARTS AGE
azure-vote-back-679f7b955f-pwdpd 3/3 Running 0 2d1h
azure-vote-front-b47b4fbf8-4c8rk 3/3 Running 1 27h

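azure-vote-front contains nginx and Python Flask, and azure-vote-back contains Redis, so it appears to be a two-tier architecture.

By the way, when running kubectl on Windows I initially used Git Bash, but if you are on Windows 10 I recommend running it from an Ubuntu console on WSL.
You can use commands such as watch that are not available in Git Bash, and, although that is a topic for another article, installing istioctl is easy too.

Installing Prometheus with Helm

I wanted to install with Helm because it is convenient, using the prometheus-operator chart. You can find it at https://github.com/helm/charts/tree/master/stable/prometheus-operator .

I will proceed while referring to that URL.

The installation itself is just a helm install.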

$ helm install pg-op stable/prometheus-operator

Check what is now running.

$ kubectl get all
NAME                                                        READY   STATUS    RESTARTS   AGE
pod/alertmanager-pg-op-prometheus-operator-alertmanager-0   2/2     Running   0          10m
pod/azure-vote-back-5966fd4fd4-d87zv                        1/1     Running   0          94m
pod/azure-vote-front-67fc95647d-sgr42                       1/1     Running   0          94m
pod/pg-op-grafana-5b75f465d7-wq9rf                          2/2     Running   0          11m
pod/pg-op-kube-state-metrics-5fc85698d4-pjmzr               1/1     Running   0          11m
pod/pg-op-prometheus-node-exporter-2r6p8                    1/1     Running   0          11m
pod/pg-op-prometheus-node-exporter-cm2x8                    1/1     Running   0          11m
pod/pg-op-prometheus-node-exporter-fjfd2                    1/1     Running   0          11m
pod/pg-op-prometheus-operator-operator-7c7cb98579-xhlgt     2/2     Running   0          11m
pod/prometheus-pg-op-prometheus-operator-prometheus-0       3/3     Running   1          10m

NAME                                             TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                      AGE
service/alertmanager-operated                    ClusterIP      None           <none>         9093/TCP,9094/TCP,9094/UDP   10m
service/azure-vote-back                          ClusterIP      10.0.120.31    <none>         6379/TCP                     94m
service/azure-vote-front                         LoadBalancer   10.0.176.255   51.138.50.33   80:32737/TCP                 94m
service/kubernetes                               ClusterIP      10.0.0.1       <none>         443/TCP                      124m
service/pg-op-grafana                            ClusterIP      10.0.81.187    <none>         80/TCP                       11m
service/pg-op-kube-state-metrics                 ClusterIP      10.0.7.118     <none>         8080/TCP                     11m
service/pg-op-prometheus-node-exporter           ClusterIP      10.0.3.136     <none>         9100/TCP                     11m
service/pg-op-prometheus-operator-alertmanager   ClusterIP      10.0.7.11      <none>         9093/TCP                     11m
service/pg-op-prometheus-operator-operator       ClusterIP      10.0.67.126    <none>         8080/TCP,443/TCP             11m
service/pg-op-prometheus-operator-prometheus     ClusterIP      10.0.67.201    <none>         9090/TCP                     11m
service/prometheus-operated                      ClusterIP      None           <none>         9090/TCP                     10m

NAME                                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/pg-op-prometheus-node-exporter   3         3         3       3            3           <none>          11m

NAME                                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/azure-vote-back                      1/1     1            1           94m
deployment.apps/azure-vote-front                     1/1     1            1           94m
deployment.apps/pg-op-grafana                        1/1     1            1           11m
deployment.apps/pg-op-kube-state-metrics             1/1     1            1           11m
deployment.apps/pg-op-prometheus-operator-operator   1/1     1            1           11m

NAME                                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/azure-vote-back-5966fd4fd4                      1         1         1       94m
replicaset.apps/azure-vote-front-67fc95647d                     1         1         1       94m
replicaset.apps/pg-op-grafana-5b75f465d7                        1         1         1       11m
replicaset.apps/pg-op-kube-state-metrics-5fc85698d4             1         1         1       11m
replicaset.apps/pg-op-prometheus-operator-operator-7c7cb98579   1         1         1       11m

NAME                                                                   READY   AGE
statefulset.apps/alertmanager-pg-op-prometheus-operator-alertmanager   1/1     10m
statefulset.apps/prometheus-pg-op-prometheus-operator-prometheus       1/1     10m

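It looks quite cluttered... It is not even separated into its own namespace by default, but I will ignore that and carry on.

By the way, kubectl get all does not actually output all resources.
The following command outputs every resource.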

kubectl get "$(kubectl api-resources --namespaced=true --verbs=list -o name | tr "\n" "," | sed -e 's/,$//')"

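Checking the Prometheus targets

Prometheus uses a pull architecture: it scrapes metrics from an exporter on each monitored target. The targets page shows what can be scraped. Prometheus is not currently exposed externally, so forward a local port to reach it.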

$ kubectl port-forward $(kubectl get pod -l app=prometheus -o template --template "{{(index .items 0).metadata.name}}") 9090:9090

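I want to check whether the items I need to monitor are actually there.
These are the requirements I want to meet.

Node resource metrics

Collected with Node Exporter. If Prometheus Operator is installed correctly, it should be enabled automatically.

Container resource metrics

According to an article on Qiita, these are collected from cAdvisor through the kubelet exporter. At first the kubelet exporter did not seem to be exporting properly. Changing https to http with the steps below made it possible to monitor the kubelet exporter.

Change the port used by the kubelet exporter from https to http
On Azure AKS, exporting over https from the kubelets does not seem to work by default. Change the port that exports kubelet status to Prometheus from https to http.
$ kubectl get servicemonitors pg-exporter-kubelets --namespace monitoring -o yaml | sed 's/https/http/' | kubectl replace -f -
See https://github.com/coreos/prometheus-operator/issues/926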
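
As a rough sketch of what that substitution does, the snippet below runs the same sed expression over a small endpoint fragment. The fragment is a hypothetical stand-in for the kubelet ServiceMonitor (I am assuming it carries a port name and a scheme that contain https); the real resource in your cluster may look different. sample_endpoint and to_http are names made up for this illustration.

```shell
# Hypothetical ServiceMonitor fragment, for illustration only
# (not output captured from the cluster in this article).
sample_endpoint() {
  cat <<'EOF'
endpoints:
- port: https-metrics
  scheme: https
  interval: 30s
EOF
}

# The same substitution as in the command above:
# the first "https" on each line becomes "http".
to_http() {
  sed 's/https/http/'
}

# Show the fragment after the substitution.
sample_endpoint | to_http
```

Without the g flag, s/https/http/ rewrites only the first match on each line, and it touches every line that contains https, so it is worth reviewing the result before piping it into kubectl replace.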

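Middleware metrics for nginx and redis

Naturally, these are not included by default, so an exporter has to be added separately and scraped (registered as a Prometheus pull target). That is what the rest of this article covers.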

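Extending the metrics Prometheus monitors

Installing the nginx exporter.

I tried this exporter, although there are many other options. It can monitor the number of NGINX connections. Other exporters use mtail to monitor the access log; I may try that next time.

Preparation

As described on GitHub, stub_status has to be enabled in advance so that NGINX exposes the stub_status page at /stub_status on port 8080. stub_status reports the number of HTTP connections. This requires editing nginx.conf.

How you change configuration in a container or in Kubernetes probably differs from project to project, but this time I chose to mount the file as a volume from a ConfigMap.

Preparing nginx.conf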

http {
(omitted)
    server {
        location /stub_status {
            stub_status on;
        }
    }
}

First, prepare an nginx.conf with the stub_status setting added.

Use a ConfigMap to mount nginx.conf into the container.

$ kubectl create configmap nginx-config --from-file nginx.conf

Create the ConfigMap. With the --from-file option, the file name becomes the key and the contents of nginx.conf become the value.

$ kubectl get configmap nginx-config -o yaml

data:
  nginx.conf: |-
(omitted)
    http {
(omitted)
        server {
            listen 8080;
            location /stub_status {
                stub_status on;
                allow 127.0.0.1;
                deny all;
            }
        }
(omitted)

Add the ConfigMap created above as a volume under .spec.template.spec.volumes, and mount it at /etc/nginx/nginx.conf with volumeMounts.

$ kubectl get deploy azure-vote-front -o yaml

(omitted)
spec:
(omitted)
  template:
(omitted)
    spec:
      containers:
(omitted)
        name: azure-vote-front
(omitted)
        volumeMounts:
        - mountPath: /etc/nginx/nginx.conf
          name: nginx-config
          subPath: nginx.conf
(omitted)
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: nginx.conf
            path: nginx.conf
          name: nginx-config
        name: nginx-config

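Check that nginx.conf has been applied correctly.

$ kubectl exec -it azure-vote-front-67fc95647d-sgr42 //bin/cat //etc/nginx/nginx.conf

ConfigMap changes are not picked up dynamically here (a subPath mount does not receive updates), so if the new file is not reflected, delete the pod and let it be redeployed.

Once it is reflected, you can confirm that stub_status is enabled as follows. (There is no curl in the container, so use wget.)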

$ kubectl exec -it azure-vote-front-67fc95647d-sgr42 //bin/sh
wget http://127.0.0.1:8080/stub_status -qO -
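
For reference, stub_status returns a small plain-text report, and the exporter reads its counters from that page. Here is a minimal sketch of pulling the active connection count out of such a response with awk. The sample response uses made-up numbers in the layout NGINX documents for this module, and sample_stub_status and active_connections are names invented for this illustration.

```shell
# Sample stub_status response with invented numbers
# (not captured from the pod in this article).
sample_stub_status() {
  cat <<'EOF'
Active connections: 2 
server accepts handled requests
 10 10 25 
Reading: 0 Writing: 1 Waiting: 1 
EOF
}

# Print the number that follows "Active connections:".
active_connections() {
  awk '/^Active connections:/ { print $3 }'
}

sample_stub_status | active_connections
```

Inside the pod shell, the wget output could be piped through the same awk filter, assuming awk is available in the container image.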

Installing the nginx exporter as a sidecar

Configure the nginx exporter as a sidecar container inside the pod.
It needs the argument -nginx.scrape-uri (here http://127.0.0.1:8080/stub_status), so I specified it under .spec.template.spec.containers[1].args.

$ kubectl get deploy azure-vote-front -o yaml

(omitted)
spec:
(omitted)
  template:
(omitted)
    spec:
      containers:
(omitted)
      - args:
        - -nginx.scrape-uri
        - http://127.0.0.1:8080/stub_status
        env:
        - name: name
          value: nginx-prom-exp
        image: nginx/nginx-prometheus-exporter:0.6.0
        imagePullPolicy: IfNotPresent
        name: nginx-prom-exp
        ports:
        - containerPort: 9113
          protocol: TCP
        resources:
          limits:
            cpu: 250m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File

The sidecar is now deployed. Open port 9113 on the Service so that other pods can reach the exporter.

$ kubectl get service azure-vote-front -o yaml

(omitted)
spec:
(omitted)
  ports:
  - name: web
    nodePort: 32737
    port: 80
    protocol: TCP
    targetPort: 80
  - name: nginx-prom-exp
    nodePort: 31217
    port: 9113
    protocol: TCP
    targetPort: 9113

With that, the exporter setup is complete.
Next, scrape azure-vote-front from the Prometheus pod and confirm that the nginx metrics can be retrieved.

Check the cluster IP of azure-vote-front.

$ kubectl get service
NAME                                     TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)                       AGE
azure-vote-front                         LoadBalancer   10.0.176.255       80:32737/TCP,9113:31217/TCP   6d1h

Using the cluster IP just confirmed, check with wget whether the metrics can be pulled.

$ kubectl exec -it prometheus-pg-op-prometheus-operator-prometheus-0 -c prometheus //bin/sh
$ wget http://10.0.176.255:9113/metrics -qO -
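
The /metrics response is in the Prometheus text format: one sample per line, with # HELP and # TYPE comment lines in between. As a sketch, a single value could be picked out like this. The sample lines are illustrative (nginx_up and nginx_connections_active are metric names this exporter exposes, but the values are invented), and sample_metrics and metric_value are hypothetical helpers.

```shell
# Illustrative excerpt of an exporter response (values are invented).
sample_metrics() {
  cat <<'EOF'
# HELP nginx_connections_active Active client connections
# TYPE nginx_connections_active gauge
nginx_connections_active 1
# HELP nginx_up Status of the last metric scrape
# TYPE nginx_up gauge
nginx_up 1
EOF
}

# Print the value of an unlabeled metric.
# $1 = metric name; the exposition text is read from stdin.
# Comment lines start with "#", so their first field never matches.
# Metrics with labels (name{...}) are not handled by this simple filter.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

sample_metrics | metric_value nginx_up
```

A value of 1 for nginx_up means the exporter's last scrape of stub_status succeeded, which is a quick way to tell an exporter problem from an NGINX problem.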

Using the Redis exporter

I tried the Redis exporter at this link.

Sidecar configuration.

$ kubectl get deploy azure-vote-back -o yaml

      - image: oliver006/redis_exporter:latest
        imagePullPolicy: Always
        name: redis-exporter
        ports:
        - containerPort: 9121
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File

Expose the port.
$ kubectl get service azure-vote-back -o yaml

spec:
  clusterIP: 10.0.120.31
  ports:
  - name: redis
    port: 6379
    protocol: TCP
    targetPort: 6379
  - name: redis-exp
    port: 9121
    protocol: TCP
    targetPort: 9121

That completes the exporter.

Confirm in the same way that it can be scraped from Prometheus.

Check the cluster IP of azure-vote-back.

$ kubectl get service
NAME                                     TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)                       AGE
azure-vote-back                          ClusterIP      10.0.120.31    <none>          6379/TCP,9121/TCP             6d1h

Using the cluster IP, check with wget whether the metrics can be retrieved.

$ kubectl exec -it prometheus-pg-op-prometheus-operator-prometheus-0 -c prometheus //bin/sh
/prometheus $ wget http://10.0.120.31:9121/metrics -qO -

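Prometheus scrape configuration

Even with the exporters configured, there is still work to do. Because Prometheus is pull-based, Prometheus itself needs to know the exporters' addresses.
There are two ways to specify them, static and dynamic (service discovery); this site has a good guide to configuring service discovery.

This time I just wanted to get it working quickly, so I went with static configuration.
This is apparently set in Prometheus's scrape_configs, but with prometheus-operator I did not know how to add entries and got stuck.

In the end, following the page below, I created a Secret and had Prometheus load it as additionalScrapeConfigs.
https://github.com/coreos/prometheus-operator/blob/master/Documentation/additional-scrape-config.md

Create the Secret.

$ kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml

prometheus-additional.yaml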

- job_name: custome/nginx-exporter/0
  static_configs:
    - targets:
      - 10.0.176.255:9113
- job_name: redis_exporter
  static_configs:
  - targets: 
      - '10.0.120.31:9121'
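
Since each entry is just a job name plus a host:port target, the file can also be generated rather than written by hand. A minimal sketch, assuming a hypothetical helper named scrape_job; the job names and addresses are the ones used above.

```shell
# Print one static scrape job.
# $1 = job name, $2 = host:port of the exporter.
scrape_job() {
  cat <<EOF
- job_name: $1
  static_configs:
  - targets:
    - '$2'
EOF
}

# Emit both jobs from this article on stdout.
scrape_job custome/nginx-exporter/0 10.0.176.255:9113
scrape_job redis_exporter 10.0.120.31:9121
```

Redirect the output to prometheus-additional.yaml before creating the Secret. Passing a Service DNS name such as azure-vote-front.default.svc.cluster.local:9113 instead of the cluster IP should also work, assuming the Services live in the default namespace; that variant is an assumption and was not tested in this article.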

Edit the Prometheus custom resource so that it references the Secret as additionalScrapeConfigs.
$ kubectl get prometheus pg-op-prometheus-operator-prometheus -o yaml

(omitted)
spec:
  additionalScrapeConfigs:
    key: prometheus-additional.yaml
    name: additional-scrape-configs

Customizing Grafana dashboards

Dashboards that worked well when I tried them

https://grafana.com/grafana/dashboards/11074

Pod

https://grafana.com/grafana/dashboards/6336
https://grafana.com/grafana/dashboards/10518

Redis

https://grafana.com/grafana/dashboards/763

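At this point I can monitor node, pod/container, nginx, and redis resources for the azure-vote demo application on AKS.

What I could not do

RED metrics monitoring (I wanted to visualize response time and throughput from the access log, but have not managed it yet. Once I tried adding Istio, I decided that was good enough for now.)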