Installing OpenShift 4.7 on Power Systems Virtual Server (Part 4): Infra Node Configuration
Introduction
In this post, as a follow-up to the OpenShift 4.7 installation on Power Systems Virtual Server (hereafter PowerVS), we configure infra nodes. The infra nodes run only monitoring, logging, routing, and the image registry. Because block storage is recommended for monitoring and logging, we add volumes to the virtual server instances and consume them through the Local Storage Operator. Once the infra nodes are configured, many Pods move off the worker nodes, and the workers' CPU and memory usage drops as shown in the figure below.
1. Bastion node configuration changes
To add infra nodes to the OpenShift cluster, the bastion node configuration needs to be changed. For the configuration before these changes, refer to the earlier article.
1.1. Modify the dnsmasq configuration
When an infra node boots, grub.cfg-01-<MAC address> is served over tftp. You can check each infra node's MAC address in the IBM Cloud console.
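The filename is simply the MAC address with the colons replaced by hyphens, prefixed with grub.cfg-01-. A quick shell sketch (using infra-0's MAC address from this environment) illustrates the mapping:
MAC="fa:1b:ef:af:c1:20"
echo "grub.cfg-01-$(echo "${MAC}" | tr ':' '-')"
### standard output ↓
grub.cfg-01-fa-1b-ef-af-c1-20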
vi /var/lib/tftpboot/boot/grub2/powerpc-ieee1275/grub.cfg-01-fa-1b-ef-af-c1-20
default=0
fallback=1
timeout=1
menuentry "infra-0 CoreOS (BIOS)" {
linux "rhcos-4.7.7-ppc64le-live-kernel-ppc64le" rd.neednet=1 ip=192.168.25.115::192.168.25.100:255.255.255.0:infra-0:env2:none nameserver=192.168.25.100 coreos.inst=yes coreos.inst
.install_dev=sda coreos.live.rootfs_url=http://192.168.25.100:8080/rhcos-4.7.7-ppc64le-live-rootfs.ppc64le.img coreos.inst.ignition_url=http://192.168.25.100:8080/worker.ign
initrd "rhcos-4.7.7-ppc64le-live-initramfs.ppc64le.img"
}
### end of vi
vi /var/lib/tftpboot/boot/grub2/powerpc-ieee1275/grub.cfg-01-fa-88-3c-5c-37-20
default=0
fallback=1
timeout=1
menuentry "infra-1 CoreOS (BIOS)" {
linux "rhcos-4.7.7-ppc64le-live-kernel-ppc64le" rd.neednet=1 ip=192.168.25.116::192.168.25.100:255.255.255.0:infra-1:env2:none nameserver=192.168.25.100 coreos.inst=yes coreos.inst
.install_dev=sda coreos.live.rootfs_url=http://192.168.25.100:8080/rhcos-4.7.7-ppc64le-live-rootfs.ppc64le.img coreos.inst.ignition_url=http://192.168.25.100:8080/worker.ign
initrd "rhcos-4.7.7-ppc64le-live-initramfs.ppc64le.img"
}
### end of vi
vi /var/lib/tftpboot/boot/grub2/powerpc-ieee1275/grub.cfg-01-fa-ba-6f-8a-9c-20
default=0
fallback=1
timeout=1
menuentry "infra-2 CoreOS (BIOS)" {
linux "rhcos-4.7.7-ppc64le-live-kernel-ppc64le" rd.neednet=1 ip=192.168.25.117::192.168.25.100:255.255.255.0:infra-2:env2:none nameserver=192.168.25.100 coreos.inst=yes coreos.inst
.install_dev=sda coreos.live.rootfs_url=http://192.168.25.100:8080/rhcos-4.7.7-ppc64le-live-rootfs.ppc64le.img coreos.inst.ignition_url=http://192.168.25.100:8080/worker.ign
initrd "rhcos-4.7.7-ppc64le-live-initramfs.ppc64le.img"
}
### end of vi
vi /etc/hosts (add the following)
192.168.25.115 infra-0
192.168.25.116 infra-1
192.168.25.117 infra-2
### end of vi
vi /etc/dnsmasq.conf (add the following)
dhcp-host=fa:1b:ef:af:c1:20,infra-0,192.168.25.115
dhcp-host=fa:88:3c:5c:37:20,infra-1,192.168.25.116
dhcp-host=fa:ba:6f:8a:9c:20,infra-2,192.168.25.117
### end of vi
systemctl restart dnsmasq
1.2. Modify the haproxy configuration
Configure load balancing toward the routers that will run on the infra nodes. Note that once the worker nodes are commented out, the OpenShift console cannot be reached until the routers have been moved to the infra nodes.
vi /etc/haproxy/haproxy.cfg (add the infra nodes to the backends and comment out the workers)
backend http-80
・・・
#server worker-0 worker-0.ocp.powervs:80 check
#server worker-1 worker-1.ocp.powervs:80 check
server infra-0 infra-0.ocp.powervs:80 check
server infra-1 infra-1.ocp.powervs:80 check
server infra-2 infra-2.ocp.powervs:80 check
backend https-443
・・・
#server worker-0 worker-0.ocp.powervs:443 check
#server worker-1 worker-1.ocp.powervs:443 check
server infra-0 infra-0.ocp.powervs:443 check
server infra-1 infra-1.ocp.powervs:443 check
server infra-2 infra-2.ocp.powervs:443 check
### end of vi
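Optionally, before restarting, the edited file can be syntax-checked with haproxy's configuration check mode:
haproxy -c -f /etc/haproxy/haproxy.cfg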
systemctl restart haproxy
2. Adding infra nodes
2.1. Create PowerVS instances
We create the PowerVS instances listed in the table below as infra nodes.
Name    | CPU          | Memory | Disk  | IP address     | Image    | Additional volumes
infra-0 | 0.5 (shared) | 16GB   | 120GB | 192.168.25.115 | rhcos-47 | infra-00, infra-01, infra-02, infra-03
infra-1 | 0.5 (shared) | 16GB   | 120GB | 192.168.25.116 | rhcos-47 | infra-10, infra-11, infra-12, infra-13
infra-2 | 0.5 (shared) | 16GB   | 120GB | 192.168.25.117 | rhcos-47 | infra-20, infra-21, infra-22, infra-23
In addition, we create twelve volumes for use by the Operators; there is no particular rationale behind the chosen sizes.
■ For monitoring
■ For logging
Create the infra node instances and volumes with the ibmcloud CLI. Attaching the volumes to the infra node instances up front caused the RHCOS installation to fail and the nodes could not join the OpenShift cluster, so the volumes are attached later.
ibmcloud pi instance-create infra-0 --image rhcos-47 --memory 16 \
--network "ocp-net 192.168.25.115" --processors 0.5 --processor-type shared \
--key-name sshkey --sys-type s922 --storage-type tier3
ibmcloud pi instance-create infra-1 --image rhcos-47 --memory 16 \
--network "ocp-net 192.168.25.116" --processors 0.5 --processor-type shared \
--key-name sshkey --sys-type s922 --storage-type tier3
ibmcloud pi instance-create infra-2 --image rhcos-47 --memory 16 \
--network "ocp-net 192.168.25.117" --processors 0.5 --processor-type shared \
--key-name sshkey --sys-type s922 --storage-type tier3
ibmcloud pi volume-create infra-00 --type tier3 --size 20
ibmcloud pi volume-create infra-01 --type tier3 --size 20
ibmcloud pi volume-create infra-02 --type tier3 --size 20
ibmcloud pi volume-create infra-03 --type tier3 --size 40
ibmcloud pi volume-create infra-10 --type tier3 --size 20
ibmcloud pi volume-create infra-11 --type tier3 --size 20
ibmcloud pi volume-create infra-12 --type tier3 --size 20
ibmcloud pi volume-create infra-13 --type tier3 --size 40
ibmcloud pi volume-create infra-20 --type tier3 --size 20
ibmcloud pi volume-create infra-21 --type tier3 --size 20
ibmcloud pi volume-create infra-22 --type tier3 --size 20
ibmcloud pi volume-create infra-23 --type tier3 --size 40
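For reference, the twelve volume-create calls above can also be written as a loop; this is only a rewrite of the same commands, not an additional step:
# Three 20GB volumes and one 40GB volume per infra node
for node in 0 1 2; do
  for vol in 0 1 2; do
    ibmcloud pi volume-create "infra-${node}${vol}" --type tier3 --size 20
  done
  ibmcloud pi volume-create "infra-${node}3" --type tier3 --size 40
done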
2.2. Add the infra nodes to the OpenShift cluster
Open each infra node's console, start the node, and enter the SMS menu. Set the IP address and boot via bootp to begin the RHCOS installation. To have the infra nodes join the OpenShift cluster, approve the pending CSRs twice.
# First round
export KUBECONFIG=/root/ocp/install/bare-metal/auth/kubeconfig
oc get csr | grep "Pending"
### standard output ↓
csr-297mx 4m31s kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-szw59 2m54s kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
csr-zw668 5m44s kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper Pending
oc get csr | grep "Pending" | awk '{print $1}' | xargs oc adm certificate approve
### standard output ↓
certificatesigningrequest.certificates.k8s.io/csr-297mx approved
certificatesigningrequest.certificates.k8s.io/csr-szw59 approved
certificatesigningrequest.certificates.k8s.io/csr-zw668 approved
# Second round
oc get csr | grep "Pending"
### standard output ↓
csr-2bl4v 27s kubernetes.io/kubelet-serving system:node:infra-2 Pending
csr-7wg96 17s kubernetes.io/kubelet-serving system:node:infra-0 Pending
csr-fdq4w 21s kubernetes.io/kubelet-serving system:node:infra-1 Pending
oc get csr | grep "Pending" | awk '{print $1}' | xargs oc adm certificate approve
### standard output ↓
certificatesigningrequest.certificates.k8s.io/csr-2bl4v approved
certificatesigningrequest.certificates.k8s.io/csr-7wg96 approved
certificatesigningrequest.certificates.k8s.io/csr-fdq4w approved
oc get nodes
### standard output ↓
NAME STATUS ROLES AGE VERSION
infra-0 Ready worker 3m48s v1.20.0+c8905da
infra-1 Ready worker 3m52s v1.20.0+c8905da
infra-2 Ready worker 3m58s v1.20.0+c8905da
master-0 Ready master 16h v1.20.0+c8905da
master-1 Ready master 16h v1.20.0+c8905da
master-2 Ready master 16h v1.20.0+c8905da
worker-0 Ready worker 16h v1.20.0+c8905da
worker-1 Ready worker 16h v1.20.0+c8905da
To attach the volumes to the infra nodes, the nodes must first be shut down.
ssh core@infra-0 sudo shutdown -h 1
ssh core@infra-1 sudo shutdown -h 1
ssh core@infra-2 sudo shutdown -h 1
Once the nodes have been stopped for a short while, you can attach the volumes.
ibmcloud pi volume-attach infra-00 --instance infra-0
ibmcloud pi volume-attach infra-01 --instance infra-0
ibmcloud pi volume-attach infra-02 --instance infra-0
ibmcloud pi volume-attach infra-03 --instance infra-0
ibmcloud pi volume-attach infra-10 --instance infra-1
ibmcloud pi volume-attach infra-11 --instance infra-1
ibmcloud pi volume-attach infra-12 --instance infra-1
ibmcloud pi volume-attach infra-13 --instance infra-1
ibmcloud pi volume-attach infra-20 --instance infra-2
ibmcloud pi volume-attach infra-21 --instance infra-2
ibmcloud pi volume-attach infra-22 --instance infra-2
ibmcloud pi volume-attach infra-23 --instance infra-2
2.3. Configure the infra nodes
2.3.1. Install the Operators
Install the Operators from OperatorHub in the web console.
2.3.2. Create local storage volumes
Use the Local Storage Operator to create local volumes on the infra nodes. Running lsblk on an infra node shows the added devices, but because of the multipath configuration the same device appears more than once. Here we apply manifests that use sda, sdb, and sdc as local volumes for monitoring and sdd for logging; given the multipath configuration, it may be better to list multiple device paths.
ssh core@infra-0 lsblk -d
### standard output ↓
sda 8:0 0 20G 0 disk
sdb 8:16 0 20G 0 disk
sdc 8:32 0 20G 0 disk
sdd 8:48 0 40G 0 disk
sde 8:64 0 120G 0 disk
sdf 8:80 0 20G 0 disk
sdg 8:96 0 20G 0 disk
sdh 8:112 0 20G 0 disk
sdi 8:128 0 40G 0 disk
sdj 8:144 0 120G 0 disk
・・・
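As an aside, /dev/sdX names are not guaranteed to be stable, so in a multipath setup it may be more robust to reference devices by persistent identifiers; this is a general suggestion rather than a step performed here. The identifiers can be listed with:
ssh core@infra-0 ls -l /dev/disk/by-id/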
oc apply -f monitoring-lv.yaml
oc apply -f logging-lv.yaml
oc get pod -n openshift-local-storage
### standard output ↓
local-storage-operator-5d4cbd7bd7-p8thz 1/1 Running 0 61m
logging-lv-local-diskmaker-phd45 1/1 Running 0 2m19s
logging-lv-local-diskmaker-rkbcw 1/1 Running 0 2m19s
logging-lv-local-diskmaker-ttzfp 1/1 Running 0 2m19s
logging-lv-local-provisioner-kwmw9 1/1 Running 0 2m19s
logging-lv-local-provisioner-l42zk 1/1 Running 0 2m19s
logging-lv-local-provisioner-wsspv 1/1 Running 0 2m19s
monitoring-lv-local-diskmaker-dnwd6 1/1 Running 0 3m22s
monitoring-lv-local-diskmaker-dtjc8 1/1 Running 0 3m22s
monitoring-lv-local-diskmaker-rmmml 1/1 Running 0 3m22s
monitoring-lv-local-provisioner-6bdcq 1/1 Running 0 3m22s
monitoring-lv-local-provisioner-mgv4t 1/1 Running 0 3m22s
monitoring-lv-local-provisioner-zj5sd 1/1 Running 0 3m22s
oc get pv
### standard output ↓
local-pv-117d5df 20Gi RWO Delete Available monitoring-sc 3m28s
local-pv-19bf4152 20Gi RWO Delete Available monitoring-sc 3m27s
local-pv-33c689a0 20Gi RWO Delete Available logging-sc 2m42s
local-pv-5015089e 20Gi RWO Delete Available monitoring-sc 3m28s
local-pv-5bb057fb 20Gi RWO Delete Available monitoring-sc 3m33s
local-pv-6d79c839 20Gi RWO Delete Available monitoring-sc 3m33s
local-pv-93c59c55 40Gi RWO Delete Available logging-sc 2m41s
local-pv-a8cdd61f 40Gi RWO Delete Available logging-sc 2m41s
local-pv-ae2c170 40Gi RWO Delete Available monitoring-sc 3m27s
local-pv-bd9d15 20Gi RWO Delete Available monitoring-sc 3m28s
local-pv-bf728680 20Gi RWO Delete Available monitoring-sc 3m33s
local-pv-dffcf04b 20Gi RWO Delete Available monitoring-sc 3m27s
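# monitoring-lv.yaml (applied above with oc apply -f monitoring-lv.yaml)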
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: monitoring-lv
  namespace: openshift-local-storage
spec:
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
              - infra-0
              - infra-1
              - infra-2
  storageClassDevices:
    - storageClassName: monitoring-sc
      volumeMode: Filesystem
      fsType: xfs
      devicePaths:
        - /dev/sda
        - /dev/sdb
        - /dev/sdc
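# logging-lv.yaml (applied above with oc apply -f logging-lv.yaml)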
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: logging-lv
  namespace: openshift-local-storage
spec:
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
              - infra-0
              - infra-1
              - infra-2
  storageClassDevices:
    - storageClassName: logging-sc
      volumeMode: Filesystem
      fsType: xfs
      devicePaths:
        - /dev/sdd
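As a sanity check (not part of the original steps), confirm that the StorageClasses referenced by the two manifests now exist:
oc get storageclass
### the output should include monitoring-sc and logging-sc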
2.3.3. Add roles
Add roles to the worker and infra nodes. Pods that do not specify a nodeSelector are deployed to the worker nodes. Wait for kube-apiserver to finish rolling out the change (PROGRESSING=True).
oc label node worker-0 node-role.kubernetes.io/app=""
oc label node worker-1 node-role.kubernetes.io/app=""
oc label node infra-0 node-role.kubernetes.io/infra=""
oc label node infra-1 node-role.kubernetes.io/infra=""
oc label node infra-2 node-role.kubernetes.io/infra=""
# Not executed here, because it would cause a nodeSelector to be set on Pods automatically
# and Pods created by DaemonSets etc. could end up unschedulable
# oc patch scheduler cluster --type merge --patch '{"spec":{"defaultNodeSelector":"node-role.kubernetes.io/app="}}'
# oc get co kube-apiserver
### standard output ↓
# NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
# kube-apiserver 4.7.9 True True False 5d22h
oc get nodes
### standard output ↓
NAME STATUS ROLES AGE VERSION
infra-0 Ready infra,worker 79m v1.20.0+c8905da
infra-1 Ready infra,worker 79m v1.20.0+c8905da
infra-2 Ready infra,worker 79m v1.20.0+c8905da
master-0 Ready master 17h v1.20.0+c8905da
master-1 Ready master 17h v1.20.0+c8905da
master-2 Ready master 17h v1.20.0+c8905da
worker-0 Ready app,worker 17h v1.20.0+c8905da
worker-1 Ready app,worker 17h v1.20.0+c8905da
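To double-check which nodes carry the new roles, the labels can be used as selectors; this is only a verification step:
oc get nodes -l node-role.kubernetes.io/infra=
oc get nodes -l node-role.kubernetes.io/app=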
2.3.4. Alertmanager configuration
Alertmanager can send alerts from both the default platform monitoring and user-defined monitoring to PagerDuty, webhooks, email, or Slack. Here we configure it to notify Slack.
oc -n openshift-monitoring create secret generic alertmanager-main \
--from-file=alertmanager.yaml --dry-run=client -o=yaml \
| oc -n openshift-monitoring replace secret --filename=-
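# alertmanager.yaml (referenced by --from-file above)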
global:
  resolve_timeout: 5m
inhibit_rules:
  - equal:
      - namespace
      - alertname
    source_match:
      severity: critical
    target_match_re:
      severity: warning|info
  - equal:
      - namespace
      - alertname
    source_match:
      severity: warning
    target_match_re:
      severity: info
receivers:
  - name: Critical
    slack_configs:
      - channel: alerts-critical
        api_url: >-
          <Slack Incoming Webhooks URL#1>
        text: |-
          {{ range .Alerts }}
          *Alert:* {{ .Labels.alertname }} - `{{ .Labels.severity }}`
          *Description:* {{ .Annotations.message }}
          *Details:*
          {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
          {{ end }}
          {{ end }}
  - name: Default
    slack_configs:
      - channel: alerts-default
        api_url: >-
          <Slack Incoming Webhooks URL#2>
        text: |-
          {{ range .Alerts }}
          *Alert:* {{ .Labels.alertname }} - `{{ .Labels.severity }}`
          *Description:* {{ .Annotations.message }}
          *Details:*
          {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
          {{ end }}
          {{ end }}
  - name: Watchdog
    slack_configs:
      - channel: alerts-watchdog
        api_url: >-
          <Slack Incoming Webhooks URL#3>
        text: >-
          {{ range .Alerts }}
          *Alert:* {{ .Labels.alertname }} - `{{ .Labels.severity }}`
          *Description:* {{ .Annotations.message }}
          *Details:*
          {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
          {{ end }}
          {{ end }}
route:
  group_by:
    - namespace
  group_interval: 5m
  group_wait: 30s
  receiver: Default
  repeat_interval: 12h
  routes:
    - receiver: Watchdog
      match:
        alertname: Watchdog
    - receiver: Critical
      match:
        severity: critical
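One way to confirm that the secret now holds the new configuration is to decode it; note that the dot in the key must be escaped in the jsonpath expression:
oc -n openshift-monitoring get secret alertmanager-main \
  -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d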
For the Slack-side preparation, the following was used as a reference.
2.3.5. Monitoring configuration
Configure the storage and the nodes on which the monitoring Pods run. User-defined monitoring is also enabled.
oc apply -f cluster-monitoring-config.yaml
oc get pod -n openshift-monitoring -o wide
### standard output ↓
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
alertmanager-main-0 5/5 Running 0 3h7m 10.130.2.16 infra-1 <none> <none>
alertmanager-main-1 5/5 Running 0 3h19m 10.131.2.8 infra-0 <none> <none>
alertmanager-main-2 5/5 Running 0 3h39m 10.129.2.4 infra-2 <none> <none>
cluster-monitoring-operator-7dfbcc944d-l8bkg 2/2 Running 0 3h27m 10.128.0.26 master-0 <none> <none>
grafana-7c7bfd45c-dvqz9 2/2 Running 0 3h19m 10.129.2.8 infra-2 <none> <none>
kube-state-metrics-57df856d9c-7b76t 3/3 Running 0 3h8m 10.129.2.10 infra-2 <none> <none>
node-exporter-7z29r 2/2 Running 0 4h20m 192.168.25.113 worker-0 <none> <none>
node-exporter-8zwct 2/2 Running 0 4h22m 192.168.25.115 infra-0 <none> <none>
node-exporter-9ntlv 2/2 Running 0 4h20m 192.168.25.117 infra-2 <none> <none>
node-exporter-c4xhv 2/2 Running 0 4h18m 192.168.25.111 master-1 <none> <none>
node-exporter-jt5wn 2/2 Running 0 4h20m 192.168.25.116 infra-1 <none> <none>
node-exporter-v4nl6 2/2 Running 0 4h21m 192.168.25.112 master-2 <none> <none>
node-exporter-xgnrt 2/2 Running 0 4h19m 192.168.25.110 master-0 <none> <none>
node-exporter-xj2vr 2/2 Running 0 4h19m 192.168.25.114 worker-1 <none> <none>
openshift-state-metrics-77764976d9-t77bz 3/3 Running 0 3h8m 10.131.2.10 infra-0 <none> <none>
prometheus-adapter-5c865574c6-jsrfd 1/1 Running 0 3h2m 10.131.2.20 infra-0 <none> <none>
prometheus-adapter-5c865574c6-rsqmv 1/1 Running 0 3h3m 10.131.2.18 infra-0 <none> <none>
prometheus-k8s-0 7/7 Running 1 3h39m 10.129.2.6 infra-2 <none> <none>
prometheus-k8s-1 7/7 Running 1 3h19m 10.131.2.5 infra-0 <none> <none>
prometheus-operator-5667f89469-rjnrm 2/2 Running 0 3h8m 10.131.2.11 infra-0 <none> <none>
telemeter-client-595967cd48-8njgj 3/3 Running 0 3h8m 10.129.2.11 infra-2 <none> <none>
thanos-querier-649d574d69-xvksw 5/5 Running 0 3h8m 10.131.2.17 infra-0 <none> <none>
thanos-querier-649d574d69-ztn9s 5/5 Running 0 3h19m 10.129.2.9 infra-2 <none> <none>
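# cluster-monitoring-config.yaml (applied above with oc apply -f cluster-monitoring-config.yaml)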
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          storageClassName: monitoring-sc
          resources:
            requests:
              storage: 20Gi
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    alertmanagerMain:
      volumeClaimTemplate:
        spec:
          storageClassName: monitoring-sc
          resources:
            requests:
              storage: 20Gi
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    prometheusOperator:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    grafana:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    k8sPrometheusAdapter:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    kubeStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    telemeterClient:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    openshiftStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    thanosQuerier:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
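To verify that the Prometheus and Alertmanager volumes were provisioned from monitoring-sc, the PVCs can be checked (a verification step, not part of the original procedure):
oc -n openshift-monitoring get pvc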
2.3.6. User-defined monitoring configuration
Configure the storage and the nodes on which the Pods run.
oc apply -f user-monitoring-config.yaml
oc get pod -n openshift-user-workload-monitoring -o wide
### standard output ↓
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
prometheus-operator-7f6c7bd5dd-svb6k 2/2 Running 0 3h39m 10.130.0.17 master-1 <none> <none>
prometheus-user-workload-0 5/5 Running 1 3h8m 10.130.2.13 infra-1 <none> <none>
prometheus-user-workload-1 5/5 Running 1 3h41m 10.129.2.7 infra-2 <none> <none>
thanos-ruler-user-workload-0 3/3 Running 0 3h9m 10.130.2.14 infra-1 <none> <none>
thanos-ruler-user-workload-1 3/3 Running 0 3h20m 10.131.2.6 infra-0 <none> <none>
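# user-monitoring-config.yaml (applied above with oc apply -f user-monitoring-config.yaml)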
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      volumeClaimTemplate:
        spec:
          storageClassName: monitoring-sc
          resources:
            requests:
              storage: 20Gi
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    thanosRuler:
      volumeClaimTemplate:
        spec:
          storageClassName: monitoring-sc
          resources:
            requests:
              storage: 20Gi
      nodeSelector:
        node-role.kubernetes.io/infra: ""
2.3.7. Logging configuration
Logging is configured as part of creating the ClusterLogging instance. Although only short-term log retention is supported, Elasticsearch's memory requirements are high, so for this validation the memory limit is set to 4GB. Curator is deprecated, so it is not configured.
If you need to retain logs for a longer period, it is recommended to move the data to a third-party storage system. The OpenShift Logging Elasticsearch instance is optimized and tested for short-term (about seven days) storage.
Elasticsearch is a memory-intensive application. By default, OpenShift Container Platform installs three Elasticsearch nodes with memory requests and limits of 16 GB. The initial set of three OpenShift Container Platform nodes might not have enough memory to run Elasticsearch in the cluster.
In OpenShift Logging 5.0, Elasticsearch Curator is deprecated and will be removed in OpenShift Logging 5.1.
# Logging configuration
oc apply -f clo-instance.yaml
oc get pod -n openshift-logging -o wide
### standard output ↓
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cluster-logging-operator-6bf898cbd4-9jscd 1/1 Running 0 3h39m 10.128.2.9 worker-1 <none> <none>
elasticsearch-cdm-885k66by-1-778dddc6dd-zqcwz 2/2 Running 0 152m 10.130.2.30 infra-1 <none> <none>
elasticsearch-cdm-885k66by-2-66f6dd979f-q698p 2/2 Running 0 150m 10.131.2.26 infra-0 <none> <none>
elasticsearch-cdm-885k66by-3-98df488c8-6mgsj 2/2 Running 0 148m 10.129.2.18 infra-2 <none> <none>
elasticsearch-im-app-1620483300-f76lj 0/1 Completed 0 113s 10.130.2.65 infra-1 <none> <none>
elasticsearch-im-audit-1620483300-qqsnl 0/1 Completed 0 113s 10.130.2.66 infra-1 <none> <none>
elasticsearch-im-infra-1620483300-ls5l7 0/1 Completed 0 113s 10.130.2.67 infra-1 <none> <none>
fluentd-6cfrv 1/1 Running 0 23h 10.129.2.27 infra-2 <none> <none>
fluentd-8xfhk 1/1 Running 0 23h 10.131.2.23 infra-0 <none> <none>
fluentd-9z8f4 1/1 Running 0 23h 10.131.0.30 worker-0 <none> <none>
fluentd-gx6jt 1/1 Running 0 23h 10.130.0.47 master-1 <none> <none>
fluentd-kqrpq 1/1 Running 0 23h 10.129.0.48 master-2 <none> <none>
fluentd-mp7tz 1/1 Running 0 23h 10.128.2.60 worker-1 <none> <none>
fluentd-vlwqq 1/1 Running 0 23h 10.130.2.38 infra-1 <none> <none>
fluentd-wlwsl 1/1 Running 0 23h 10.128.0.47 master-0 <none> <none>
kibana-7c584cb9db-4sxp7 2/2 Running 0 126m 10.128.2.45 infra-1 <none> <none>
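# clo-instance.yaml (applied above with oc apply -f clo-instance.yaml)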
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: Managed
  logStore:
    type: elasticsearch
    retentionPolicy:
      application:
        maxAge: 1d
      infra:
        maxAge: 7d
      audit:
        maxAge: 7d
    elasticsearch:
      nodeCount: 3
      storage:
        storageClassName: logging-sc
        size: 40G
      resources:
        limits:
          cpu: 500m
          memory: 4Gi
        requests:
          cpu: 200m
          memory: 4Gi
      redundancyPolicy: SingleRedundancy
      nodeSelector:
        node-role.kubernetes.io/infra: ''
  visualization:
    type: kibana
    kibana:
      replicas: 1
      nodeSelector:
        node-role.kubernetes.io/infra: ''
  collection:
    logs:
      type: fluentd
      fluentd: {}
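As an additional check (not in the original steps), the ClusterLogging resource and the Elasticsearch PVCs bound from logging-sc can be inspected:
oc -n openshift-logging get clusterlogging instance
oc -n openshift-logging get pvc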
You can open Kibana at the following URL and create index patterns from the Management menu to view the logs. There are three types of indices, app, infra, and audit, but an index pattern can only be created once data has accumulated for it; immediately after setting up logging, only infra data exists.
2.3.8. Move the router and image registry
Move the router and the image registry to the infra nodes.
oc patch ingresscontroller default -n openshift-ingress-operator --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector": {"matchLabels":{"node-role.kubernetes.io/infra":""}}}}}'
oc patch --namespace=openshift-ingress-operator --patch='{"spec": {"replicas": 3}}' --type=merge ingresscontroller/default
oc get pod -n openshift-ingress -o wide
# standard output ↓
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
router-default-5b89fb5765-7wndh 1/1 Running 0 4h16m 192.168.25.117 infra-2 <none> <none>
router-default-5b89fb5765-smscm 1/1 Running 0 3h56m 192.168.25.115 infra-0 <none> <none>
router-default-5b89fb5765-whkkl 1/1 Running 0 3h44m 192.168.25.116 infra-1 <none> <none>
oc patch configs.imageregistry.operator.openshift.io cluster -n openshift-image-registry --type=merge --patch '{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'
oc get pod -n openshift-image-registry -o wide
# standard output ↓
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cluster-image-registry-operator-768846dbff-7tgjg 1/1 Running 1 4h15m 10.130.0.13 master-1 <none> <none>
image-registry-74df945c7b-6n5lg 1/1 Running 0 3h45m 10.129.2.12 infra-2 <none> <none>
node-ca-czs95 1/1 Running 0 4h57m 192.168.25.110 master-0 <none> <none>
node-ca-hcqgk 1/1 Running 0 4h59m 192.168.25.116 infra-1 <none> <none>
node-ca-j4rzj 1/1 Running 0 4h58m 192.168.25.112 master-2 <none> <none>
node-ca-jcds9 1/1 Running 0 4h59m 192.168.25.115 infra-0 <none> <none>
node-ca-k8sjp 1/1 Running 0 4h58m 192.168.25.114 worker-1 <none> <none>
node-ca-nsrh6 1/1 Running 0 4h58m 192.168.25.111 master-1 <none> <none>
node-ca-rbmw8 1/1 Running 0 4h57m 192.168.25.113 worker-0 <none> <none>
node-ca-rg5hf 1/1 Running 0 4h57m 192.168.25.117 infra-2 <none> <none>
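The node placement applied by the patches can also be read back from the resources themselves, for example:
oc -n openshift-ingress-operator get ingresscontroller default -o jsonpath='{.spec.nodePlacement}{"\n"}'
oc get configs.imageregistry.operator.openshift.io cluster -o jsonpath='{.spec.nodeSelector}{"\n"}'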
3. Resource usage
3.1. Node list
3.2. Resource requests
Checking with the oc describe nodes command, the steady-state resource requests are as shown in the table below.
Node     | CPU requests | Memory requests
master-0 | 1696m        | 7934Mi
master-1 | 1695m        | 7986Mi
master-2 | 1714m        | 7930Mi
worker-0 | 349m         | 2712Mi
worker-1 | 559m         | 3594Mi
infra-0  | 819m         | 8777Mi
infra-1  | 1078m        | 8137Mi
infra-2  | 843m         | 8958Mi
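These per-node values can be read from the Allocated resources section of oc describe node, for example (infra-0 is used here purely as an illustration):
oc describe node infra-0 | grep -A 8 "Allocated resources"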
3.3. Physical core usage
Checking with the lparstat command, the steady-state physical core usage is as shown in the table below. %idle is the percentage of the allocated physical cores (physc) that is not in use; in other words, the last column, (100-%idle)/100*physc, is the number of physical cores actually consumed (for example, for master-0: (100 - 86.00) / 100 * 0.75 ≈ 0.11).
The master, worker, and infra nodes were all created as virtual server instances with 0.5 CPU cores each, uncapped (shared). Because the minimum capacity is 0.25 and the maximum capacity is 4.00, physical cores are allocated according to load within the range 0.25 to 4.00. SMT8 is also enabled.
Node     | %idle | physc | (100-%idle)/100*physc
master-0 | 86.00 | 0.75  | 0.11
master-1 | 85.64 | 0.75  | 0.11
master-2 | 90.25 | 0.59  | 0.06
worker-0 | 97.74 | 0.14  | 0.00
worker-1 | 97.05 | 0.19  | 0.01
infra-0  | 87.34 | 0.60  | 0.08
infra-1  | 88.82 | 0.54  | 0.06
infra-2  | 84.59 | 0.70  | 0.11
3.4. Memory usage
The steady-state memory usage, checked with the free command, is shown in the table below. Units are GB.
※ Because memory usage on the master nodes was high, all nodes were later stopped and the master nodes' memory was increased to 20GB and the worker nodes' to 12GB. Pods can become concentrated on particular nodes (not only the masters), so it is best to leave plenty of headroom when sizing memory.
Node     | total | used  | available | total - available
master-0 | 15.88 | 13.15 | 1.85      | 14.03
master-1 | 15.88 | 12.96 | 2.06      | 13.83
master-2 | 15.88 | 9.57  | 5.49      | 10.40
worker-0 | 15.88 | 4.95  | 10.29     | 5.59
worker-1 | 15.88 | 6.14  | 9.05      | 6.83
infra-0  | 15.88 | 11.36 | 4.45      | 11.43
infra-1  | 15.88 | 11.00 | 4.28      | 11.61
infra-2  | 15.88 | 12.71 | 3.24      | 12.64
3.5. Persistent volume claim usage
One week after configuring the infra nodes, I checked the usage of the persistent volume claims from the OpenShift console. At that point there was no capacity shortage. Prometheus's retention period defaults to 15 days, and the Elasticsearch instance's retention is set to 7 days. As long as no other applications run on this OpenShift cluster, this much capacity should be sufficient.
■ For monitoring
■ For logging