Checking the Resource Usage of OpenShift Cluster Monitoring
First of all
Please bear in mind that these are personal experiment notes, written without other readers in mind, so they may be hard to follow.
Environment overview
Capturing the environment information for the record.
Node information
The version is OpenShift 4.8.26 (Kubernetes 1.21).
[root@bastion openshift]# oc version
Client Version: 4.8.26
Server Version: 4.8.26
Kubernetes Version: v1.21.6+bb8d50a
[root@bastion openshift]# oc get nodes
NAME STATUS ROLES AGE VERSION
ocp48-6vldl-infra-94vjm Ready infra,worker 12d v1.21.6+bb8d50a
ocp48-6vldl-infra-gjgwb Ready infra,worker 12d v1.21.6+bb8d50a
ocp48-6vldl-infra-ocs-dbjbd Ready infra,worker 5d12h v1.21.6+bb8d50a
ocp48-6vldl-infra-ocs-rdt8b Ready infra,worker 5d12h v1.21.6+bb8d50a
ocp48-6vldl-infra-ocs-xdhvn Ready infra,worker 5d12h v1.21.6+bb8d50a
ocp48-6vldl-infra-qvwvk Ready infra,worker 12d v1.21.6+bb8d50a
ocp48-6vldl-master-0 Ready master 17d v1.21.6+bb8d50a
ocp48-6vldl-master-1 Ready master 17d v1.21.6+bb8d50a
ocp48-6vldl-master-2 Ready master 17d v1.21.6+bb8d50a
ocp48-6vldl-worker-85crs Ready worker 17d v1.21.6+bb8d50a
ocp48-6vldl-worker-hdj9r Ready worker 17d v1.21.6+bb8d50a
ocp48-6vldl-worker-xp4bf Ready worker 17d v1.21.6+bb8d50a
[root@bastion openshift]#
- Master Node x 3
- Worker Node x 3
- Infrastructure Node x 6
Cluster monitoring version
Cluster monitoring is one of the Cluster Operators and is installed on OpenShift by default.
[root@bastion openshift]# oc get co | grep monitoring
monitoring 4.8.26 True False False 16d
[root@bastion openshift]#
The monitoring version is 4.8.26.
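For a bit more detail than the one-line oc get co output, the versions the operator reports can also be pulled with jsonpath (a quick sketch, not captured from this session; .status.versions is the ClusterOperator field that holds them):

oc get co monitoring -o=jsonpath='{range .status.versions[*]}{.name}{" "}{.version}{"\n"}{end}'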
Pods deployed for cluster monitoring
These are the Pods that belong to OpenShift monitoring. A nodeSelector has been added so that they are deployed on the infrastructure nodes.
[root@bastion openshift]# oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
alertmanager-main-0 5/5 Running 0 41m 10.128.4.21 ocp48-6vldl-infra-gjgwb <none> <none>
alertmanager-main-1 5/5 Running 0 42m 10.130.2.17 ocp48-6vldl-infra-qvwvk <none> <none>
alertmanager-main-2 5/5 Running 0 42m 10.131.2.14 ocp48-6vldl-infra-94vjm <none> <none>
cluster-monitoring-operator-95674b95b-slbjr 2/2 Running 4 16d 10.129.0.7 ocp48-6vldl-master-2 <none> <none>
grafana-5666d69fc9-d8plz 2/2 Running 0 42m 10.130.2.15 ocp48-6vldl-infra-qvwvk <none> <none>
kube-state-metrics-5f5f79ccbc-858xx 3/3 Running 0 42m 10.130.2.13 ocp48-6vldl-infra-qvwvk <none> <none>
node-exporter-4bcsq 2/2 Running 0 11d 172.18.0.43 ocp48-6vldl-infra-94vjm <none> <none>
node-exporter-68q5d 2/2 Running 0 4d13h 172.18.0.154 ocp48-6vldl-infra-ocs-xdhvn <none> <none>
node-exporter-6nqpj 2/2 Running 0 4d13h 172.18.0.113 ocp48-6vldl-infra-ocs-rdt8b <none> <none>
node-exporter-9fjj4 2/2 Running 0 4d13h 172.18.0.186 ocp48-6vldl-infra-ocs-dbjbd <none> <none>
node-exporter-btbb7 2/2 Running 0 16d 172.18.0.180 ocp48-6vldl-worker-85crs <none> <none>
node-exporter-cr6m2 2/2 Running 0 11d 172.18.0.67 ocp48-6vldl-infra-qvwvk <none> <none>
node-exporter-cwglh 2/2 Running 0 16d 172.18.0.32 ocp48-6vldl-master-2 <none> <none>
node-exporter-m6hn9 2/2 Running 0 16d 172.18.0.25 ocp48-6vldl-master-1 <none> <none>
node-exporter-mplgz 2/2 Running 0 11d 172.18.0.37 ocp48-6vldl-infra-gjgwb <none> <none>
node-exporter-qtxdl 2/2 Running 0 16d 172.18.0.124 ocp48-6vldl-master-0 <none> <none>
node-exporter-vrshn 2/2 Running 0 16d 172.18.0.126 ocp48-6vldl-worker-hdj9r <none> <none>
node-exporter-zgsmz 2/2 Running 0 16d 172.18.0.53 ocp48-6vldl-worker-xp4bf <none> <none>
openshift-state-metrics-5bbdb5896-nnx65 3/3 Running 0 42m 10.130.2.12 ocp48-6vldl-infra-qvwvk <none> <none>
prometheus-adapter-7b757d8db7-gm8v2 1/1 Running 0 42m 10.130.2.14 ocp48-6vldl-infra-qvwvk <none> <none>
prometheus-adapter-7b757d8db7-h5lt5 1/1 Running 0 42m 10.131.2.12 ocp48-6vldl-infra-94vjm <none> <none>
prometheus-k8s-0 7/7 Running 1 46s 10.131.2.15 ocp48-6vldl-infra-94vjm <none> <none>
prometheus-k8s-1 7/7 Running 1 46s 10.130.2.22 ocp48-6vldl-infra-qvwvk <none> <none>
prometheus-operator-7c8f55cc45-qjx6s 2/2 Running 0 43m 10.131.2.11 ocp48-6vldl-infra-94vjm <none> <none>
telemeter-client-844fdfd96-xzfm5 3/3 Running 0 42m 10.128.4.17 ocp48-6vldl-infra-gjgwb <none> <none>
thanos-querier-68d474b7df-bzqdv 5/5 Running 0 42m 10.130.2.16 ocp48-6vldl-infra-qvwvk <none> <none>
thanos-querier-68d474b7df-q5lmw 5/5 Running 0 42m 10.131.2.13 ocp48-6vldl-infra-94vjm <none> <none>
[root@bastion openshift]#
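As a quick check that the nodeSelector from the ConfigMap in the next section actually made it onto these Pods, the selector can be read straight off a Pod spec (a sketch, not captured from this session; Pod names will differ per cluster):

oc -n openshift-monitoring get pod prometheus-k8s-0 -o=jsonpath='{.spec.nodeSelector}{"\n"}'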
ConfigMap created for cluster monitoring
After installing OpenShift, monitoring works even if you leave the default settings alone, but to store the monitoring data on persistent volumes (PVs) and to deploy the Pods on the infrastructure nodes, you need to create and configure a ConfigMap.
This YAML is set up to use the thin storage class, which is the default for an IPI installation on VMware.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |+
    alertmanagerMain:
      nodeSelector:                # select the infra nodes
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
    prometheusK8s:
      volumeClaimTemplate:         # volume template
        spec:                      # added
          storageClassName: thin   # VMware in-tree storage class
          volumeMode: Filesystem   # filesystem
          resources:               # added
            requests:              # added
              storage: 40Gi        # start with 40Gi for now
      nodeSelector:                # select the infra nodes
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
    prometheusOperator:
      nodeSelector:                # select the infra nodes
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
    grafana:
      nodeSelector:                # select the infra nodes
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
    k8sPrometheusAdapter:
      nodeSelector:                # select the infra nodes
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
    kubeStateMetrics:
      nodeSelector:                # select the infra nodes
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
    telemeterClient:
      nodeSelector:                # select the infra nodes
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
    openshiftStateMetrics:
      nodeSelector:                # select the infra nodes
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
    thanosQuerier:
      nodeSelector:                # select the infra nodes
        node-role.kubernetes.io/infra: ""
      tolerations:                 # add tolerations
      - key: infra
        value: reserved
        effect: NoSchedule
      - key: infra
        value: reserved
        effect: NoExecute
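A minimal sketch of putting this into effect (assuming the YAML above is saved as cluster-monitoring-config.yaml, a file name of my choosing); once the ConfigMap exists, the Cluster Monitoring Operator re-rolls the monitoring Pods onto the infra nodes and creates the PVCs:

oc apply -f cluster-monitoring-config.yaml
oc -n openshift-monitoring get pods -o wide -w    # watch the Pods get rescheduled onto the infra nodes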
Checking the Requests/Limits of the main Pods
Use the oc (kubectl) get pod command to check the Requests and Limits of the Pods in the openshift-monitoring namespace.
Commands used to check the results
Grepping Requests and Limits out of the output together with the container names is tedious, so I worked out a jsonpath query that retrieves them directly: it prints each .spec.containers[*].name paired with the corresponding .spec.containers[*].resources, and likewise each .spec.initContainers[*].name paired with .spec.initContainers[*].resources.
[root@bastion openshift]# oc get pod alertmanager-main-0 -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
alertmanager {"requests":{"cpu":"4m","memory":"40Mi"}}
config-reloader {"requests":{"cpu":"1m","memory":"10Mi"}}
alertmanager-proxy {"requests":{"cpu":"1m","memory":"20Mi"}}
kube-rbac-proxy {"requests":{"cpu":"1m","memory":"15Mi"}}
prom-label-proxy {"requests":{"cpu":"1m","memory":"20Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod cluster-monitoring-operator-95674b95b-slbjr -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
kube-rbac-proxy {"requests":{"cpu":"1m","memory":"20Mi"}}
cluster-monitoring-operator {"requests":{"cpu":"10m","memory":"75Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod grafana-5666d69fc9-d8plz -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
grafana {"requests":{"cpu":"4m","memory":"64Mi"}}
grafana-proxy {"requests":{"cpu":"1m","memory":"20Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod kube-state-metrics-5f5f79ccbc-858xx -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
kube-state-metrics {"requests":{"cpu":"2m","memory":"80Mi"}}
kube-rbac-proxy-main {"requests":{"cpu":"1m","memory":"15Mi"}}
kube-rbac-proxy-self {"requests":{"cpu":"1m","memory":"15Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod node-exporter-4bcsq -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
node-exporter {"requests":{"cpu":"8m","memory":"32Mi"}}
kube-rbac-proxy {"requests":{"cpu":"1m","memory":"15Mi"}}
init-textfile {"requests":{"cpu":"1m","memory":"1Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod openshift-state-metrics-5bbdb5896-nnx65 -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
kube-rbac-proxy-main {"requests":{"cpu":"1m","memory":"20Mi"}}
kube-rbac-proxy-self {"requests":{"cpu":"1m","memory":"20Mi"}}
openshift-state-metrics {"requests":{"cpu":"1m","memory":"32Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod prometheus-adapter-7b757d8db7-gm8v2 -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
prometheus-adapter {"requests":{"cpu":"1m","memory":"40Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod prometheus-k8s-0 -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
prometheus {"requests":{"cpu":"70m","memory":"1Gi"}}
config-reloader {"requests":{"cpu":"1m","memory":"10Mi"}}
thanos-sidecar {"requests":{"cpu":"1m","memory":"25Mi"}}
prometheus-proxy {"requests":{"cpu":"1m","memory":"20Mi"}}
kube-rbac-proxy {"requests":{"cpu":"1m","memory":"15Mi"}}
prom-label-proxy {"requests":{"cpu":"1m","memory":"15Mi"}}
kube-rbac-proxy-thanos {"requests":{"cpu":"1m","memory":"10Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod prometheus-operator-7c8f55cc45-qjx6s -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
prometheus-operator {"requests":{"cpu":"5m","memory":"150Mi"}}
kube-rbac-proxy {"requests":{"cpu":"1m","memory":"15Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod telemeter-client-844fdfd96-xzfm5 -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
telemeter-client {"requests":{"cpu":"1m","memory":"40Mi"}}
reload {"requests":{"cpu":"1m","memory":"10Mi"}}
kube-rbac-proxy {"requests":{"cpu":"1m","memory":"20Mi"}}
[root@bastion openshift]#
[root@bastion openshift]# oc get pod thanos-querier-68d474b7df-bzqdv -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name} {" "}{.resources}{"\n"}{end}'
thanos-query {"requests":{"cpu":"10m","memory":"12Mi"}}
oauth-proxy {"requests":{"cpu":"1m","memory":"20Mi"}}
kube-rbac-proxy {"requests":{"cpu":"1m","memory":"15Mi"}}
prom-label-proxy {"requests":{"cpu":"1m","memory":"15Mi"}}
kube-rbac-proxy-rules {"requests":{"cpu":"1m","memory":"15Mi"}}
[root@bastion openshift]#
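Running the same jsonpath Pod by Pod like this gets repetitive; a small bash loop over every Pod in the namespace (a sketch, not part of the original session) prints the whole set in one go:

for pod in $(oc -n openshift-monitoring get pods -o jsonpath='{.items[*].metadata.name}'); do
  echo "=== ${pod} ==="
  oc -n openshift-monitoring get pod "${pod}" \
    -o=jsonpath='{range .spec.containers[*]}{.name}{" "}{.resources}{"\n"}{end}{range .spec.initContainers[*]}{.name}{" "}{.resources}{"\n"}{end}'
done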
Summary of the command results
Blank cells mean nothing is explicitly specified.
Pod                                Container                       Requests(CPU)  Requests(Memory)  Limits(CPU)  Limits(Memory)
alertmanager-main-n                alertmanager                    4m             40Mi
                                   config-reloader                 1m             10Mi
                                   alertmanager-proxy              1m             20Mi
                                   kube-rbac-proxy                 1m             15Mi
                                   prom-label-proxy                1m             20Mi
cluster-monitoring-operator-xxxx   kube-rbac-proxy                 1m             20Mi
                                   cluster-monitoring-operator     10m            75Mi
grafana-xxxx                       grafana                         4m             64Mi
                                   grafana-proxy                   1m             20Mi
kube-state-metrics-xxxx            kube-state-metrics              2m             80Mi
                                   kube-rbac-proxy-main            1m             15Mi
                                   kube-rbac-proxy-self            1m             15Mi
node-exporter-xxxx                 node-exporter                   8m             32Mi
                                   kube-rbac-proxy                 1m             15Mi
                                   init-textfile (init Container)  1m             1Mi
openshift-state-metrics-xxxx       kube-rbac-proxy-main            1m             20Mi
                                   kube-rbac-proxy-self            1m             20Mi
                                   openshift-state-metrics         1m             32Mi
prometheus-adapter-xxxx            prometheus-adapter              1m             40Mi
prometheus-k8s-n                   prometheus                      70m            1Gi
                                   config-reloader                 1m             10Mi
                                   thanos-sidecar                  1m             25Mi
                                   prometheus-proxy                1m             20Mi
                                   kube-rbac-proxy                 1m             15Mi
                                   prom-label-proxy                1m             15Mi
                                   kube-rbac-proxy-thanos          1m             10Mi
prometheus-operator-xxxx           prometheus-operator             5m             150Mi
                                   kube-rbac-proxy                 1m             15Mi
telemeter-client-xxxx              telemeter-client                1m             40Mi
                                   reload                          1m             10Mi
                                   kube-rbac-proxy                 1m             20Mi
thanos-querier-xxxx                thanos-query                    10m            12Mi
                                   oauth-proxy                     1m             20Mi
                                   kube-rbac-proxy                 1m             15Mi
                                   prom-label-proxy                1m             15Mi
                                   kube-rbac-proxy-rules           1m             15Mi
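Rather than adding these requests up by hand, the scheduler's per-node totals (which include these monitoring Pods) can also be read from the node description (a sketch, not captured from this session; substitute any node name from the list at the top):

oc describe node ocp48-6vldl-infra-qvwvk | grep -A 10 'Allocated resources'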
Actual resource usage
Results of kubectl top pods
[root@bastion openshift]# kubectl top pods -n openshift-monitoring --use-protocol-buffers
NAME CPU(cores) MEMORY(bytes)
alertmanager-main-0 2m 117Mi
alertmanager-main-1 3m 110Mi
alertmanager-main-2 2m 104Mi
cluster-monitoring-operator-95674b95b-slbjr 9m 116Mi
grafana-5666d69fc9-d8plz 3m 136Mi
kube-state-metrics-5f5f79ccbc-858xx 3m 117Mi
node-exporter-4bcsq 3m 46Mi
node-exporter-68q5d 5m 52Mi
node-exporter-6nqpj 6m 55Mi
node-exporter-9fjj4 3m 58Mi
node-exporter-btbb7 4m 39Mi
node-exporter-cr6m2 3m 48Mi
node-exporter-cwglh 5m 46Mi
node-exporter-m6hn9 5m 47Mi
node-exporter-mplgz 5m 48Mi
node-exporter-qtxdl 3m 39Mi
node-exporter-vrshn 4m 39Mi
node-exporter-zgsmz 2m 40Mi
openshift-state-metrics-5bbdb5896-nnx65 0m 57Mi
prometheus-adapter-7b757d8db7-gm8v2 5m 72Mi
prometheus-adapter-7b757d8db7-h5lt5 4m 74Mi
prometheus-k8s-0 1230m 2425Mi
prometheus-k8s-1 514m 2513Mi
prometheus-operator-7c8f55cc45-qjx6s 7m 140Mi
telemeter-client-844fdfd96-xzfm5 0m 73Mi
thanos-querier-68d474b7df-bzqdv 4m 121Mi
thanos-querier-68d474b7df-q5lmw 2m 123Mi
[root@bastion openshift]#
You can see that the memory usage of alertmanager-main-n and prometheus-k8s-n, and the CPU usage of prometheus-k8s-n, far exceed the requested values.
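To pin down which container inside prometheus-k8s-n is doing the consuming, the usage can also be broken down per container (a sketch, not captured from this session; --containers is the standard kubectl top pod flag for this):

kubectl top pods -n openshift-monitoring --containers --use-protocol-buffers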