Installing OpenShift 4.7 on Power Systems Virtual Server (Part 4): Configuring infra Nodes

Introduction

In this article, we continue working with OpenShift 4.7 on Power Systems Virtual Server (hereafter PowerVS) by configuring infra nodes. The infra nodes will run only monitoring, logging, the router, and the image registry. Because block storage is recommended for monitoring and logging, we add volumes to the virtual server instances and consume them through the Local Storage Operator. Once the infra nodes are configured, many Pods move off the worker nodes, and worker CPU and memory usage drops as shown in the figures below.

01_custom.PNG
infra.PNG

1. Changing the bastion node configuration

To add infra nodes to the OpenShift cluster, the bastion node configuration has to be changed. For the state before these changes, refer to the earlier articles in this series.

1.1. Updating the dnsmasq configuration

When an infra node boots, grub.cfg-01-(MAC address) is served over TFTP. You can check each infra node's MAC address in the IBM Cloud console.
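
For reference, the TFTP file name is simply "grub.cfg-01-" followed by the MAC address with the colons replaced by hyphens; a minimal shell sketch (using infra-0's MAC address from this environment) derives it:

# Derive the grub.cfg file name served via TFTP from a MAC address
mac="fa:1b:ef:af:c1:20"
echo "grub.cfg-01-${mac//:/-}"
### stdout ↓
grub.cfg-01-fa-1b-ef-af-c1-20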

vi /var/lib/tftpboot/boot/grub2/powerpc-ieee1275/grub.cfg-01-fa-1b-ef-af-c1-20
default=0
fallback=1
timeout=1
menuentry "infra-0 CoreOS (BIOS)" {
linux "rhcos-4.7.7-ppc64le-live-kernel-ppc64le" rd.neednet=1 ip=192.168.25.115::192.168.25.100:255.255.255.0:infra-0:env2:none nameserver=192.168.25.100 coreos.inst=yes coreos.inst.install_dev=sda coreos.live.rootfs_url=http://192.168.25.100:8080/rhcos-4.7.7-ppc64le-live-rootfs.ppc64le.img coreos.inst.ignition_url=http://192.168.25.100:8080/worker.ign
initrd "rhcos-4.7.7-ppc64le-live-initramfs.ppc64le.img"
}
### end of vi

vi /var/lib/tftpboot/boot/grub2/powerpc-ieee1275/grub.cfg-01-fa-88-3c-5c-37-20
default=0
fallback=1
timeout=1
menuentry "infra-1 CoreOS (BIOS)" {
linux "rhcos-4.7.7-ppc64le-live-kernel-ppc64le" rd.neednet=1 ip=192.168.25.116::192.168.25.100:255.255.255.0:infra-1:env2:none nameserver=192.168.25.100 coreos.inst=yes coreos.inst.install_dev=sda coreos.live.rootfs_url=http://192.168.25.100:8080/rhcos-4.7.7-ppc64le-live-rootfs.ppc64le.img coreos.inst.ignition_url=http://192.168.25.100:8080/worker.ign
initrd "rhcos-4.7.7-ppc64le-live-initramfs.ppc64le.img"
}
### end of vi

vi /var/lib/tftpboot/boot/grub2/powerpc-ieee1275/grub.cfg-01-fa-ba-6f-8a-9c-20
default=0
fallback=1
timeout=1
menuentry "infra-2 CoreOS (BIOS)" {
linux "rhcos-4.7.7-ppc64le-live-kernel-ppc64le" rd.neednet=1 ip=192.168.25.117::192.168.25.100:255.255.255.0:infra-2:env2:none nameserver=192.168.25.100 coreos.inst=yes coreos.inst.install_dev=sda coreos.live.rootfs_url=http://192.168.25.100:8080/rhcos-4.7.7-ppc64le-live-rootfs.ppc64le.img coreos.inst.ignition_url=http://192.168.25.100:8080/worker.ign
initrd "rhcos-4.7.7-ppc64le-live-initramfs.ppc64le.img"
}
### end of vi

vi /etc/hosts (add the following)
192.168.25.115  infra-0
192.168.25.116  infra-1
192.168.25.117  infra-2
### end of vi

vi /etc/dnsmasq.conf (add the following)
dhcp-host=fa:1b:ef:af:c1:20,infra-0,192.168.25.115
dhcp-host=fa:88:3c:5c:37:20,infra-1,192.168.25.116
dhcp-host=fa:ba:6f:8a:9c:20,infra-2,192.168.25.117
### end of vi

systemctl restart dnsmasq
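
To check that dnsmasq is serving the new entries, you can query it directly from the bastion node (a quick sanity check, assuming the dig command from bind-utils is installed):

dig +short infra-0 @192.168.25.100
# should return 192.168.25.115 if the /etc/hosts entry has been picked up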

1.2. Updating the haproxy configuration

Configure load balancing toward the routers on the infra nodes. Note that once the worker nodes are commented out, the OpenShift console cannot be reached until the routers have actually been moved to the infra nodes.

vi /etc/haproxy/haproxy.cfg (add the infra nodes to the backends and comment out the workers)
backend http-80
    ・・・
    #server  worker-0 worker-0.ocp.powervs:80 check
    #server  worker-1 worker-1.ocp.powervs:80 check
    server  infra-0 infra-0.ocp.powervs:80 check
    server  infra-1 infra-1.ocp.powervs:80 check
    server  infra-2 infra-2.ocp.powervs:80 check

backend https-443
    ・・・
    #server  worker-0 worker-0.ocp.powervs:443 check
    #server  worker-1 worker-1.ocp.powervs:443 check
    server  infra-0 infra-0.ocp.powervs:443 check
    server  infra-1 infra-1.ocp.powervs:443 check
    server  infra-2 infra-2.ocp.powervs:443 check
### end of vi

systemctl restart haproxy
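
If you want to validate the edited file before (or after) restarting, haproxy's built-in configuration check can be used:

haproxy -c -f /etc/haproxy/haproxy.cfg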

2. Adding infra nodes

2.1. Creating the PowerVS instances

We create the PowerVS instances in the table below to serve as infra nodes.

| Node | vCPU | Virtual RAM | Storage | IP address | Boot image | Additional volumes |
| --- | --- | --- | --- | --- | --- | --- |
| infra-0 | 0.5 (shared) | 16GB | 120GB | 192.168.25.115 | rhcos-47 | infra-00, infra-01, infra-02, infra-03 |
| infra-1 | 0.5 (shared) | 16GB | 120GB | 192.168.25.116 | rhcos-47 | infra-10, infra-11, infra-12, infra-13 |
| infra-2 | 0.5 (shared) | 16GB | 120GB | 192.168.25.117 | rhcos-47 | infra-20, infra-21, infra-22, infra-23 |

In addition, we create twelve volumes for the Operators to use; there is no particular rationale behind the chosen capacities.

■ For monitoring

| Purpose | Pod | Count | Capacity |
| --- | --- | --- | --- |
| Prometheus | prometheus-k8s-<number> | 2 | 20GB |
| Alertmanager | alertmanager-main-<number> | 3 | 20GB |
| Prometheus (user-defined) | prometheus-user-workload-<number> | 2 | 20GB |
| Thanos Ruler (user-defined) | thanos-ruler-user-workload-<number> | 2 | 20GB |

■ For logging

| Purpose | Pod | Count | Capacity |
| --- | --- | --- | --- |
| Elasticsearch | elasticsearch-cdm-<number> | 3 | 40GB |

Create the infra node instances and the volumes with the ibmcloud CLI. Attaching the volumes to the infra node instances up front caused the RHCOS installation to fail and the nodes could not join the OpenShift cluster, so the volumes are attached later.

ibmcloud pi instance-create infra-0 --image rhcos-47 --memory 16             \
 --network "ocp-net 192.168.25.115" --processors 0.5 --processor-type shared \
 --key-name sshkey --sys-type s922 --storage-type tier3

ibmcloud pi instance-create infra-1 --image rhcos-47 --memory 16             \
 --network "ocp-net 192.168.25.116" --processors 0.5 --processor-type shared \
 --key-name sshkey --sys-type s922 --storage-type tier3

ibmcloud pi instance-create infra-2 --image rhcos-47 --memory 16             \
 --network "ocp-net 192.168.25.117" --processors 0.5 --processor-type shared \
 --key-name sshkey --sys-type s922 --storage-type tier3

ibmcloud pi volume-create infra-00 --type tier3 --size 20
ibmcloud pi volume-create infra-01 --type tier3 --size 20
ibmcloud pi volume-create infra-02 --type tier3 --size 20
ibmcloud pi volume-create infra-03 --type tier3 --size 40
ibmcloud pi volume-create infra-10 --type tier3 --size 20
ibmcloud pi volume-create infra-11 --type tier3 --size 20
ibmcloud pi volume-create infra-12 --type tier3 --size 20
ibmcloud pi volume-create infra-13 --type tier3 --size 40
ibmcloud pi volume-create infra-20 --type tier3 --size 20
ibmcloud pi volume-create infra-21 --type tier3 --size 20
ibmcloud pi volume-create infra-22 --type tier3 --size 20
ibmcloud pi volume-create infra-23 --type tier3 --size 40

2.2. Adding the infra nodes to the OpenShift cluster

Open each infra node's console, start the node, and enter the SMS menu. Set the IP addresses and boot via bootp so that RHCOS is installed and the node joins the OpenShift cluster, then approve the pending CSRs twice.

# First round
export KUBECONFIG=/root/ocp/install/bare-metal/auth/kubeconfig
oc get csr | grep "Pending"
### stdout ↓
csr-297mx   4m31s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-szw59   2m54s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-zw668   5m44s   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending

oc get csr | grep "Pending" | awk '{print $1}' | xargs oc adm certificate approve
### stdout ↓
certificatesigningrequest.certificates.k8s.io/csr-297mx approved
certificatesigningrequest.certificates.k8s.io/csr-szw59 approved
certificatesigningrequest.certificates.k8s.io/csr-zw668 approved

# Second round
oc get csr | grep "Pending"
### stdout ↓
csr-2bl4v   27s     kubernetes.io/kubelet-serving                 system:node:infra-2                                                         Pending
csr-7wg96   17s     kubernetes.io/kubelet-serving                 system:node:infra-0                                                         Pending
csr-fdq4w   21s     kubernetes.io/kubelet-serving                 system:node:infra-1                                                         Pending

oc get csr | grep "Pending" | awk '{print $1}' | xargs oc adm certificate approve
### stdout ↓
certificatesigningrequest.certificates.k8s.io/csr-2bl4v approved
certificatesigningrequest.certificates.k8s.io/csr-7wg96 approved
certificatesigningrequest.certificates.k8s.io/csr-fdq4w approved

oc get nodes
### stdout ↓
NAME       STATUS   ROLES    AGE     VERSION
infra-0    Ready    worker   3m48s   v1.20.0+c8905da
infra-1    Ready    worker   3m52s   v1.20.0+c8905da
infra-2    Ready    worker   3m58s   v1.20.0+c8905da
master-0   Ready    master   16h     v1.20.0+c8905da
master-1   Ready    master   16h     v1.20.0+c8905da
master-2   Ready    master   16h     v1.20.0+c8905da
worker-0   Ready    worker   16h     v1.20.0+c8905da
worker-1   Ready    worker   16h     v1.20.0+c8905da

To attach the volumes, the infra nodes need to be shut down first.

ssh core@infra-0 sudo shutdown -h 1
ssh core@infra-1 sudo shutdown -h 1
ssh core@infra-2 sudo shutdown -h 1

After the instances have been stopped for a while, the volumes can be attached.

ibmcloud pi volume-attach infra-00 --instance infra-0
ibmcloud pi volume-attach infra-01 --instance infra-0
ibmcloud pi volume-attach infra-02 --instance infra-0
ibmcloud pi volume-attach infra-03 --instance infra-0

ibmcloud pi volume-attach infra-10 --instance infra-1
ibmcloud pi volume-attach infra-11 --instance infra-1
ibmcloud pi volume-attach infra-12 --instance infra-1
ibmcloud pi volume-attach infra-13 --instance infra-1

ibmcloud pi volume-attach infra-20 --instance infra-2
ibmcloud pi volume-attach infra-21 --instance infra-2
ibmcloud pi volume-attach infra-22 --instance infra-2
ibmcloud pi volume-attach infra-23 --instance infra-2
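
Because the volume names follow the infra-<node><index> convention used above, the attachments can also be written as a loop; this is just a compact equivalent of the commands above:

for node in 0 1 2; do
  for vol in 0 1 2 3; do
    ibmcloud pi volume-attach "infra-${node}${vol}" --instance "infra-${node}"
  done
done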

2.3. Setting up the infra nodes

2.3.1. Installing the Operators

Install the Operators from OperatorHub in the administrator console.

op03.PNG
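
If you prefer the CLI over the console, the Local Storage Operator used in the next step can also be installed by applying manifests along the following lines. This is only a sketch: the file name is arbitrary and the channel ("4.7") is an assumption that should be checked against your catalog (for example with "oc get packagemanifests -n openshift-marketplace | grep local-storage").

local-storage-subscription.yaml (hypothetical file name):
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-local-storage
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: local-operator-group
  namespace: openshift-local-storage
spec:
  targetNamespaces:
    - openshift-local-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: local-storage-operator
  namespace: openshift-local-storage
spec:
  channel: "4.7"                      # assumed channel for OCP 4.7; verify against your catalog
  installPlanApproval: Automatic
  name: local-storage-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace

oc apply -f local-storage-subscription.yaml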

2.3.2. Creating local storage volumes

Using the Local Storage Operator, create local volumes on the infra nodes. Running lsblk on an infra node shows the added devices, but because of the multipath configuration each device appears more than once. The manifests applied here specify sda, sdb, and sdc as the local volumes for monitoring and sdd as the local volume for logging; given the multipath setup, it might be better to list several device paths per volume.

ssh core@infra-0 lsblk -d
### stdout ↓
sda    8:0    0   20G  0 disk
sdb    8:16   0   20G  0 disk
sdc    8:32   0   20G  0 disk
sdd    8:48   0   40G  0 disk
sde    8:64   0  120G  0 disk
sdf    8:80   0   20G  0 disk
sdg    8:96   0   20G  0 disk
sdh    8:112  0   20G  0 disk
sdi    8:128  0   40G  0 disk
sdj    8:144  0  120G  0 disk
・・・
oc apply -f monitoring-lv.yaml
oc apply -f logging-lv.yaml

oc get pod -n openshift-local-storage
### stdout ↓
local-storage-operator-5d4cbd7bd7-p8thz   1/1     Running   0          61m
logging-lv-local-diskmaker-phd45          1/1     Running   0          2m19s
logging-lv-local-diskmaker-rkbcw          1/1     Running   0          2m19s
logging-lv-local-diskmaker-ttzfp          1/1     Running   0          2m19s
logging-lv-local-provisioner-kwmw9        1/1     Running   0          2m19s
logging-lv-local-provisioner-l42zk        1/1     Running   0          2m19s
logging-lv-local-provisioner-wsspv        1/1     Running   0          2m19s
monitoring-lv-local-diskmaker-dnwd6       1/1     Running   0          3m22s
monitoring-lv-local-diskmaker-dtjc8       1/1     Running   0          3m22s
monitoring-lv-local-diskmaker-rmmml       1/1     Running   0          3m22s
monitoring-lv-local-provisioner-6bdcq     1/1     Running   0          3m22s
monitoring-lv-local-provisioner-mgv4t     1/1     Running   0          3m22s
monitoring-lv-local-provisioner-zj5sd     1/1     Running   0          3m22s

oc get pv
### stdout ↓
local-pv-117d5df    20Gi       RWO            Delete           Available           monitoring-sc            3m28s
local-pv-19bf4152   20Gi       RWO            Delete           Available           monitoring-sc            3m27s
local-pv-33c689a0   20Gi       RWO            Delete           Available           logging-sc               2m42s
local-pv-5015089e   20Gi       RWO            Delete           Available           monitoring-sc            3m28s
local-pv-5bb057fb   20Gi       RWO            Delete           Available           monitoring-sc            3m33s
local-pv-6d79c839   20Gi       RWO            Delete           Available           monitoring-sc            3m33s
local-pv-93c59c55   40Gi       RWO            Delete           Available           logging-sc               2m41s
local-pv-a8cdd61f   40Gi       RWO            Delete           Available           logging-sc               2m41s
local-pv-ae2c170    40Gi       RWO            Delete           Available           monitoring-sc            3m27s
local-pv-bd9d15     20Gi       RWO            Delete           Available           monitoring-sc            3m28s
local-pv-bf728680   20Gi       RWO            Delete           Available           monitoring-sc            3m33s
local-pv-dffcf04b   20Gi       RWO            Delete           Available           monitoring-sc            3m27s

monitoring-lv.yaml:
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: monitoring-lv
  namespace: openshift-local-storage
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - infra-0
          - infra-1
          - infra-2
  storageClassDevices:
    - storageClassName: monitoring-sc
      volumeMode: Filesystem
      fsType: xfs
      devicePaths:
        - /dev/sda
        - /dev/sdb
        - /dev/sdc

logging-lv.yaml:
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: logging-lv
  namespace: openshift-local-storage
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - infra-0
          - infra-1
          - infra-2
  storageClassDevices:
    - storageClassName: logging-sc
      volumeMode: Filesystem
      fsType: xfs
      devicePaths:
        - /dev/sdd

2.3.3. Adding roles

Add roles to the worker and infra nodes. If no nodeSelector is specified, Pods are deployed onto the worker nodes. Wait for the kube-apiserver to finish updating (PROGRESSING=True while the rollout is in progress).

oc label node worker-0 node-role.kubernetes.io/app=""
oc label node worker-1 node-role.kubernetes.io/app=""
oc label node infra-0 node-role.kubernetes.io/infra=""
oc label node infra-1 node-role.kubernetes.io/infra=""
oc label node infra-2 node-role.kubernetes.io/infra=""

# The following is not done, because a nodeSelector would be set on Pods automatically
# and Pods created by DaemonSets and the like could end up unschedulable
# oc patch scheduler cluster --type merge --patch '{"spec":{"defaultNodeSelector":"node-role.kubernetes.io/app="}}'

# oc get co kube-apiserver
### stdout ↓
# NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
# kube-apiserver   4.7.9     True        True          False      5d22h

oc get nodes
### stdout ↓
NAME       STATUS   ROLES          AGE   VERSION
infra-0    Ready    infra,worker   79m   v1.20.0+c8905da
infra-1    Ready    infra,worker   79m   v1.20.0+c8905da
infra-2    Ready    infra,worker   79m   v1.20.0+c8905da
master-0   Ready    master         17h   v1.20.0+c8905da
master-1   Ready    master         17h   v1.20.0+c8905da
master-2   Ready    master         17h   v1.20.0+c8905da
worker-0   Ready    app,worker     17h   v1.20.0+c8905da
worker-1   Ready    app,worker     17h   v1.20.0+c8905da
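
The nodeSelector settings used later match on the node-role.kubernetes.io/infra label rather than on the ROLES column, so it can be useful to confirm the label directly with a selector (the trailing "=" matches the empty label value set above):

oc get nodes -l node-role.kubernetes.io/infra=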

2.3.4. Configuring Alertmanager

Alertmanager can send alerts from both the default and the user-defined monitoring stacks to PagerDuty, webhooks, email, or Slack. Here we configure it to notify Slack.

oc -n openshift-monitoring create secret generic alertmanager-main \
  --from-file=alertmanager.yaml --dry-run=client -o=yaml           \
  | oc -n openshift-monitoring replace secret --filename=-

alertmanager.yaml:
global:
  resolve_timeout: 5m
inhibit_rules:
  - equal:
      - namespace
      - alertname
    source_match:
      severity: critical
    target_match_re:
      severity: warning|info
  - equal:
      - namespace
      - alertname
    source_match:
      severity: warning
    target_match_re:
      severity: info
receivers:
  - name: Critical
    slack_configs:
      - channel: alerts-critical
        api_url: >-
          <Slack Incoming Webhooks URL#1>
        text: |-
          {{ range .Alerts }}
            *Alert:* {{ .Labels.alertname }} - `{{ .Labels.severity }}`
            *Description:* {{ .Annotations.message }}
            *Details:*
            {{ range .Labels.SortedPairs }} ? *{{ .Name }}:* `{{ .Value }}`
            {{ end }}
          {{ end }}
  - name: Default
    slack_configs:
      - channel: alerts-default
        api_url: >-
          <Slack Incoming Webhooks URL#2>
        text: |-
          {{ range .Alerts }}
            *Alert:* {{ .Labels.alertname }} - `{{ .Labels.severity }}`
            *Description:* {{ .Annotations.message }}
            *Details:*
            {{ range .Labels.SortedPairs }} ? *{{ .Name }}:* `{{ .Value }}`
            {{ end }}
          {{ end }}
  - name: Watchdog
    slack_configs:
      - channel: alerts-watchdog
        api_url: >-
          <Slack Incoming Webhooks URL#3>
        text: >-
          {{ range .Alerts }}
            *Alert:* {{ .Labels.alertname }} - `{{ .Labels.severity }}`
            *Description:* {{ .Annotations.message }}
            *Details:*
            {{ range .Labels.SortedPairs }} ? *{{ .Name }}:* `{{ .Value }}`
            {{ end }}
          {{ end }}
route:
  group_by:
    - namespace
  group_interval: 5m
  group_wait: 30s
  receiver: Default
  repeat_interval: 12h
  routes:
    - receiver: Watchdog
      match:
        alertname: Watchdog
    - receiver: Critical
      match:
        severity: critical
slack.PNG

The following was used as a reference for preparing Slack.

2.3.5. Monitoring configuration

We configure the storage and the nodes on which the Pods run. We also enable user-defined monitoring.

oc apply -f cluster-monitoring-config.yaml
oc get pod -n openshift-monitoring -o wide
### stdout ↓
NAME                                           READY   STATUS    RESTARTS   AGE     IP               NODE       NOMINATED NODE   READINESS GATES
alertmanager-main-0                            5/5     Running   0          3h7m    10.130.2.16      infra-1    <none>           <none>
alertmanager-main-1                            5/5     Running   0          3h19m   10.131.2.8       infra-0    <none>           <none>
alertmanager-main-2                            5/5     Running   0          3h39m   10.129.2.4       infra-2    <none>           <none>
cluster-monitoring-operator-7dfbcc944d-l8bkg   2/2     Running   0          3h27m   10.128.0.26      master-0   <none>           <none>
grafana-7c7bfd45c-dvqz9                        2/2     Running   0          3h19m   10.129.2.8       infra-2    <none>           <none>
kube-state-metrics-57df856d9c-7b76t            3/3     Running   0          3h8m    10.129.2.10      infra-2    <none>           <none>
node-exporter-7z29r                            2/2     Running   0          4h20m   192.168.25.113   worker-0   <none>           <none>
node-exporter-8zwct                            2/2     Running   0          4h22m   192.168.25.115   infra-0    <none>           <none>
node-exporter-9ntlv                            2/2     Running   0          4h20m   192.168.25.117   infra-2    <none>           <none>
node-exporter-c4xhv                            2/2     Running   0          4h18m   192.168.25.111   master-1   <none>           <none>
node-exporter-jt5wn                            2/2     Running   0          4h20m   192.168.25.116   infra-1    <none>           <none>
node-exporter-v4nl6                            2/2     Running   0          4h21m   192.168.25.112   master-2   <none>           <none>
node-exporter-xgnrt                            2/2     Running   0          4h19m   192.168.25.110   master-0   <none>           <none>
node-exporter-xj2vr                            2/2     Running   0          4h19m   192.168.25.114   worker-1   <none>           <none>
openshift-state-metrics-77764976d9-t77bz       3/3     Running   0          3h8m    10.131.2.10      infra-0    <none>           <none>
prometheus-adapter-5c865574c6-jsrfd            1/1     Running   0          3h2m    10.131.2.20      infra-0    <none>           <none>
prometheus-adapter-5c865574c6-rsqmv            1/1     Running   0          3h3m    10.131.2.18      infra-0    <none>           <none>
prometheus-k8s-0                               7/7     Running   1          3h39m   10.129.2.6       infra-2    <none>           <none>
prometheus-k8s-1                               7/7     Running   1          3h19m   10.131.2.5       infra-0    <none>           <none>
prometheus-operator-5667f89469-rjnrm           2/2     Running   0          3h8m    10.131.2.11      infra-0    <none>           <none>
telemeter-client-595967cd48-8njgj              3/3     Running   0          3h8m    10.129.2.11      infra-2    <none>           <none>
thanos-querier-649d574d69-xvksw                5/5     Running   0          3h8m    10.131.2.17      infra-0    <none>           <none>
thanos-querier-649d574d69-ztn9s                5/5     Running   0          3h19m   10.129.2.9       infra-2    <none>           <none>

cluster-monitoring-config.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          storageClassName: monitoring-sc
          resources:
            requests:
              storage: 20Gi
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    alertmanagerMain:
      volumeClaimTemplate:
        spec:
          storageClassName: monitoring-sc
          resources:
            requests:
              storage: 20Gi
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    prometheusOperator:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    grafana:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    k8sPrometheusAdapter:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    kubeStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    telemeterClient:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    openshiftStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    thanosQuerier:
      nodeSelector:
        node-role.kubernetes.io/infra: ""

2.3.6. User-defined monitoring configuration

Configure the storage and the nodes on which the Pods run.

oc apply -f user-monitoring-config.yaml
oc get pod -n openshift-user-workload-monitoring -o wide
### stdout ↓
NAME                                   READY   STATUS    RESTARTS   AGE     IP            NODE       NOMINATED NODE   READINESS GATES
prometheus-operator-7f6c7bd5dd-svb6k   2/2     Running   0          3h39m   10.130.0.17   master-1   <none>           <none>
prometheus-user-workload-0             5/5     Running   1          3h8m    10.130.2.13   infra-1    <none>           <none>
prometheus-user-workload-1             5/5     Running   1          3h41m   10.129.2.7    infra-2    <none>           <none>
thanos-ruler-user-workload-0           3/3     Running   0          3h9m    10.130.2.14   infra-1    <none>           <none>
thanos-ruler-user-workload-1           3/3     Running   0          3h20m   10.131.2.6    infra-0    <none>           <none>

user-monitoring-config.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      volumeClaimTemplate:
        spec:
          storageClassName: monitoring-sc
          resources:
            requests:
              storage: 20Gi
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    thanosRuler:
      volumeClaimTemplate:
        spec:
          storageClassName: monitoring-sc
          resources:
            requests:
              storage: 20Gi
      nodeSelector:
        node-role.kubernetes.io/infra: ""

2.3.7. Logging configuration

For logging, the settings are made when the ClusterLogging instance is created. Although only short-term log retention is supported, the memory requirements are high, so for this verification the Elasticsearch memory is limited to 4GB. Curator is deprecated and is therefore not configured.

If you need long-term log retention, it is recommended to move the data to a third-party storage system. OpenShift Logging Elasticsearch instances are optimized and tested for short-term (about seven days) storage.

Elasticsearch is a memory-intensive application. By default, OpenShift Container Platform installs three Elasticsearch nodes with memory requests and limits of 16 GB. The initial set of three OpenShift Container Platform nodes may not have enough memory to run Elasticsearch within the cluster.

In OpenShift Logging 5.0, Elasticsearch Curator was deprecated, and it will be removed in OpenShift Logging 5.1.

# Logging configuration
oc apply -f clo-instance.yaml
oc get pod -n openshift-logging -o wide
### stdout ↓
NAME                                            READY   STATUS      RESTARTS   AGE     IP            NODE       NOMINATED NODE   READINESS GATES
cluster-logging-operator-6bf898cbd4-9jscd       1/1     Running     0          3h39m   10.128.2.9    worker-1   <none>           <none>
elasticsearch-cdm-885k66by-1-778dddc6dd-zqcwz   2/2     Running     0          152m    10.130.2.30   infra-1    <none>           <none>
elasticsearch-cdm-885k66by-2-66f6dd979f-q698p   2/2     Running     0          150m    10.131.2.26   infra-0    <none>           <none>
elasticsearch-cdm-885k66by-3-98df488c8-6mgsj    2/2     Running     0          148m    10.129.2.18   infra-2    <none>           <none>
elasticsearch-im-app-1620483300-f76lj           0/1     Completed   0          113s    10.130.2.65   infra-1    <none>           <none>
elasticsearch-im-audit-1620483300-qqsnl         0/1     Completed   0          113s    10.130.2.66   infra-1    <none>           <none>
elasticsearch-im-infra-1620483300-ls5l7         0/1     Completed   0          113s    10.130.2.67   infra-1    <none>           <none>
fluentd-6cfrv                                   1/1     Running     0          23h     10.129.2.27   infra-2    <none>           <none>
fluentd-8xfhk                                   1/1     Running     0          23h     10.131.2.23   infra-0    <none>           <none>
fluentd-9z8f4                                   1/1     Running     0          23h     10.131.0.30   worker-0   <none>           <none>
fluentd-gx6jt                                   1/1     Running     0          23h     10.130.0.47   master-1   <none>           <none>
fluentd-kqrpq                                   1/1     Running     0          23h     10.129.0.48   master-2   <none>           <none>
fluentd-mp7tz                                   1/1     Running     0          23h     10.128.2.60   worker-1   <none>           <none>
fluentd-vlwqq                                   1/1     Running     0          23h     10.130.2.38   infra-1    <none>           <none>
fluentd-wlwsl                                   1/1     Running     0          23h     10.128.0.47   master-0   <none>           <none>
kibana-7c584cb9db-4sxp7                         2/2     Running     0          126m    10.128.2.45   infra-1   <none>           <none>

clo-instance.yaml:
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: Managed
  logStore:
    type: elasticsearch
    retentionPolicy:
      application:
        maxAge: 1d
      infra:
        maxAge: 7d
      audit:
        maxAge: 7d
    elasticsearch:
      nodeCount: 3
      storage:
        storageClassName: logging-sc
        size: 40G
      resources:
        limits:
          cpu: 500m
          memory: 4Gi
        requests:
          cpu: 200m
          memory: 4Gi
      redundancyPolicy: SingleRedundancy
      nodeSelector:
        node-role.kubernetes.io/infra: ''
  visualization:
    type: kibana
    kibana:
      replicas: 1
      nodeSelector:
        node-role.kubernetes.io/infra: ''
  collection:
    logs:
      type: fluentd
      fluentd: {}

You can open Kibana from its URL and create index patterns from the Management menu to view the logs. There are three types of indices, app/infra/audit, but an index pattern can only be created once data has accumulated; right after setting up logging, only the infra data exists.
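
The Kibana URL can be read from its route; assuming the route created by the Cluster Logging Operator is named kibana in the openshift-logging namespace (the usual default), the host name can be extracted like this:

oc get route kibana -n openshift-logging -o jsonpath='{.spec.host}'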

kibana.PNG

2.3.8. Moving the router and image registry

Move the router and the image registry to the infra nodes.

oc patch ingresscontroller default -n openshift-ingress-operator --type=merge --patch='{"spec":{"nodePlacement":{"nodeSelector": {"matchLabels":{"node-role.kubernetes.io/infra":""}}}}}'
oc patch --namespace=openshift-ingress-operator --patch='{"spec": {"replicas": 3}}' --type=merge ingresscontroller/default
oc get pod -n openshift-ingress -o wide
# stdout ↓
NAME                              READY   STATUS    RESTARTS   AGE     IP               NODE      NOMINATED NODE   READINESS GATES
router-default-5b89fb5765-7wndh   1/1     Running   0          4h16m   192.168.25.117   infra-2   <none>           <none>
router-default-5b89fb5765-smscm   1/1     Running   0          3h56m   192.168.25.115   infra-0   <none>           <none>
router-default-5b89fb5765-whkkl   1/1     Running   0          3h44m   192.168.25.116   infra-1   <none>           <none>
oc patch configs.imageregistry.operator.openshift.io cluster -n openshift-image-registry --type=merge --patch '{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'
oc get pod -n openshift-image-registry -o wide
# stdout ↓
NAME                                               READY   STATUS    RESTARTS   AGE     IP               NODE       NOMINATED NODE   READINESS GATES
cluster-image-registry-operator-768846dbff-7tgjg   1/1     Running   1          4h15m   10.130.0.13      master-1   <none>           <none>
image-registry-74df945c7b-6n5lg                    1/1     Running   0          3h45m   10.129.2.12      infra-2    <none>           <none>
node-ca-czs95                                      1/1     Running   0          4h57m   192.168.25.110   master-0   <none>           <none>
node-ca-hcqgk                                      1/1     Running   0          4h59m   192.168.25.116   infra-1    <none>           <none>
node-ca-j4rzj                                      1/1     Running   0          4h58m   192.168.25.112   master-2   <none>           <none>
node-ca-jcds9                                      1/1     Running   0          4h59m   192.168.25.115   infra-0    <none>           <none>
node-ca-k8sjp                                      1/1     Running   0          4h58m   192.168.25.114   worker-1   <none>           <none>
node-ca-nsrh6                                      1/1     Running   0          4h58m   192.168.25.111   master-1   <none>           <none>
node-ca-rbmw8                                      1/1     Running   0          4h57m   192.168.25.113   worker-0   <none>           <none>
node-ca-rg5hf                                      1/1     Running   0          4h57m   192.168.25.117   infra-2    <none>           <none>

3. Resource usage

3.1. Node list

ノード状態(ppc64le)_3.PNG

3.2. Resource requests

Checking with "oc describe nodes", the steady-state resource requests are as shown in the table below.

| Node | CPU requests | Memory requests |
| --- | --- | --- |
| master-0 | 1696m | 7934Mi |
| master-1 | 1695m | 7986Mi |
| master-2 | 1714m | 7930Mi |
| worker-0 | 349m | 2712Mi |
| worker-1 | 559m | 3594Mi |
| infra-0 | 819m | 8777Mi |
| infra-1 | 1078m | 8137Mi |
| infra-2 | 843m | 8958Mi |
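
For reference, the values above were read from the "Allocated resources" section of each node's description; a loop such as the following (the 8-line grep window is arbitrary) extracts that section for every node:

for n in $(oc get nodes -o name); do
  echo "== ${n}"
  oc describe "${n}" | grep -A 8 "Allocated resources"
done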

3.3. Physical core usage

Checking with the "lparstat" command, the steady-state physical core usage is shown in the table below. "%idle" is the percentage of the consumed physical cores (physc) that is idle; in other words, the "used" column shows how much of the allocated cores is actually being used.

All of the master/worker/infra nodes were created as virtual server instances with 0.5 CPU cores each, uncapped (shared). Because the minimum capacity is 0.25 and the maximum capacity is 4.00, physical cores are allocated between 0.25 and 4.00 according to load. SMT8 is also in effect.

| Node | %idle | physc | used ((100 - %idle) / 100 × physc) |
| --- | --- | --- | --- |
| master-0 | 86.00 | 0.75 | 0.11 |
| master-1 | 85.64 | 0.75 | 0.11 |
| master-2 | 90.25 | 0.59 | 0.06 |
| worker-0 | 97.74 | 0.14 | 0.00 |
| worker-1 | 97.05 | 0.19 | 0.01 |
| infra-0 | 87.34 | 0.60 | 0.08 |
| infra-1 | 88.82 | 0.54 | 0.06 |
| infra-2 | 84.59 | 0.70 | 0.11 |
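
For reference, the lparstat figures were taken on the nodes themselves. Assuming lparstat (provided by powerpc-utils) is available on the node, an invocation like the following (5-second interval, one sample; values will of course differ per environment) reports %idle and physc:

ssh core@infra-0 lparstat 5 1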

3.4. Memory usage

The steady-state memory usage, checked with the free command, is shown in the table below. The unit is GB.
※ Because memory usage on the master nodes was high, all nodes were later stopped and the masters' memory was changed to 20GB and the workers' to 12GB. Pods can concentrate on particular nodes (not only the masters), so it is best to leave plenty of headroom in the memory allocation.

| Node | total | used | available | total - available |
| --- | --- | --- | --- | --- |
| master-0 | 15.88 | 13.15 | 1.85 | 14.03 |
| master-1 | 15.88 | 12.96 | 2.06 | 13.83 |
| master-2 | 15.88 | 9.57 | 5.49 | 10.40 |
| worker-0 | 15.88 | 4.95 | 10.29 | 5.59 |
| worker-1 | 15.88 | 6.14 | 9.05 | 6.83 |
| infra-0 | 15.88 | 11.36 | 4.45 | 11.43 |
| infra-1 | 15.88 | 11.00 | 4.28 | 11.61 |
| infra-2 | 15.88 | 12.71 | 3.24 | 12.64 |
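
The per-node figures can be collected in one pass with a small loop (free -m prints MiB, which was converted to GB for the table above):

for n in master-0 master-1 master-2 worker-0 worker-1 infra-0 infra-1 infra-2; do
  echo "== ${n}"
  ssh core@${n} free -m
done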

3.5. Persistent volume claim usage

One week after switching to the infra node configuration, I checked the persistent volume claim usage from the OpenShift console. At this point there was no capacity shortage. The Prometheus retention period defaults to 15 days, and the Elasticsearch instance retention is set to 7 days, so as long as no other applications run on this OpenShift cluster, this much capacity should be sufficient.

■ For monitoring

| Persistent volume claim | Capacity | Used |
| --- | --- | --- |
| prometheus-k8s-db-prometheus-k8s-0 | 20 GiB | 7.64 GiB |
| prometheus-k8s-db-prometheus-k8s-1 | 20 GiB | 7.83 GiB |
| alertmanager-main-db-alertmanager-main-0 | 20 GiB | 175 MiB |
| alertmanager-main-db-alertmanager-main-1 | 20 GiB | 175 MiB |
| alertmanager-main-db-alertmanager-main-2 | 20 GiB | 175 MiB |
| prometheus-user-workload-db-prometheus-user-workload-0 | 20 GiB | 177.6 MiB |
| prometheus-user-workload-db-prometheus-user-workload-1 | 20 GiB | 177.6 MiB |
| thanos-ruler-user-workload-data-thanos-ruler-user-workload-0 | 20 GiB | 175 MiB |
| thanos-ruler-user-workload-data-thanos-ruler-user-workload-1 | 20 GiB | 175.1 MiB |

■ For logging

| Persistent volume claim | Capacity | Used |
| --- | --- | --- |
| elasticsearch-elasticsearch-cdm-885k66by-1 | 40 GiB | 23.92 GiB |
| elasticsearch-elasticsearch-cdm-885k66by-2 | 40 GiB | 23.86 GiB |
| elasticsearch-elasticsearch-cdm-885k66by-3 | 40 GiB | 23.89 GiB |
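
The claimed capacities can also be listed from the CLI (the used sizes in the tables above were read from the console metrics):

oc get pvc -n openshift-monitoring
oc get pvc -n openshift-logging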