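Sending logs from OpenShift with Fluent Bit
Introduction
I often see information about shipping logs from Kubernetes environments with Fluentd, but there seems to be little information about doing the same on OpenShift with Fluent Bit, which is lighter than Fluentd. This article records the results of what I actually tried. For details about Fluent Bit, see the official site.
Configuration
Overall architecture
I deployed Fluent Bit pods as a DaemonSet on Red Hat OpenShift on IBM Cloud and verified that the logs of the whole OpenShift cluster are forwarded to a Fluent Bit instance installed on a virtual server.
Detailed configuration
OpenShift (Red Hat OpenShift on IBM Cloud): version 4.5.24_1527
Three worker nodes (the environment described in an earlier article)
Fluent Bit: version 1.6
Virtual server: RHEL 7.9
Building it
Virtual server side (receiver)
Install Fluent Bit by following the steps on the official site, then start the service.
The configuration is shown below. Only the input and output sections were changed from the defaults.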
[SERVICE]
    flush 5
    daemon Off
    log_level info
    parsers_file parsers.conf
    plugins_file plugins.conf
    http_server Off
    http_listen 0.0.0.0
    http_port 2020
    storage.metrics on
[INPUT]
    Name forward
    Port 24225
    Buffer_Chunk_Size 32MB
    Buffer_Max_Size 64MB
[OUTPUT]
    name file
    match *
    # Files named after each tag are written under this directory.
    path /data/log/td-agent-bit
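Note: The port number and buffer sizes were not chosen for any particular reason. Buffer sizes that are too small caused errors.
OpenShift side (sender)
For this verification I referred to the following GitHub repositories: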
fluent/fluent-bit-kubernetes-logging
fluent/fluentd-kubernetes-daemonset
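To deploy the Fluent Bit pods, the following seven resources need to be created:
- Namespace
- ServiceAccount
- ClusterRole
- ClusterRoleBinding
- SecurityContextConstraints
- ConfigMap
- DaemonSet
The settings and YAML for each resource are shown below.
- Namespace
I created a namespace named fluentbittest with the oc new-project fluentbittest command.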
- ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: fluentbittest
- ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-read
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods
  verbs: ["get", "list", "watch"]
- ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-read
subjects:
- kind: ServiceAccount
  name: fluent-bit
  namespace: fluentbittest
- SecurityContextConstraints
This is a privilege setting specific to OpenShift. It is also described in the GitHub repository mentioned above.
kind: SecurityContextConstraints
apiVersion: security.openshift.io/v1
metadata:
  name: fluentbittest
allowPrivilegedContainer: true
allowHostNetwork: true
allowHostDirVolumePlugin: true
priority:
allowedCapabilities: []
allowHostPorts: true
allowHostPID: true
allowHostIPC: true
readOnlyRootFilesystem: false
requiredDropCapabilities: []
defaultAddCapabilities: []
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
fsGroup:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
volumes:
- configMap
- downwardAPI
- emptyDir
- hostPath
- persistentVolumeClaim
- projected
- secret
users:
- system:serviceaccount:fluentbittest:builder
- system:serviceaccount:fluentbittest:default
- system:serviceaccount:fluentbittest:deployer
- system:serviceaccount:fluentbittest:fluent-bit
- ConfigMap
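Because OpenShift uses CRI-O as its container runtime, specify cri as the Parser in the [INPUT] section. For the definition of the cri parser itself, see the cri entry among the [PARSER] sections.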
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: fluentbittest
  labels:
    k8s-app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush 1
        Log_Level info
        Daemon off
        Parsers_File parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port 2020
    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-forward.conf
  input-kubernetes.conf: |
    [INPUT]
        Name tail
        Tag kube.*
        Path /var/log/containers/*.log
        Parser cri
        DB /var/log/flb_kube.db
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On
        Refresh_Interval 10
  filter-kubernetes.conf: |
    [FILTER]
        Name kubernetes
        Match kube.*
        Kube_URL https://kubernetes.default.svc:443
        Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix kube.var.log.containers.
        Merge_Log On
        Merge_Log_Key log_processed
        K8S-Logging.Parser On
        K8S-Logging.Exclude Off
  output-forward.conf: |
    [OUTPUT]
        Name forward
        Match *
        Host ${FLUENT_FOWARD_HOST}
        Port ${FLUENT_FOWARD_PORT}
        Retry_Limit False
  parsers.conf: |
    [PARSER]
        Name apache
        Format regex
        Regex ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z
    [PARSER]
        Name apache2
        Format regex
        Regex ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z
    [PARSER]
        Name apache_error
        Format regex
        Regex ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$
    [PARSER]
        Name nginx
        Format regex
        Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z
    [PARSER]
        Name json
        Format json
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z
    [PARSER]
        Name docker
        Format json
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep On
    [PARSER]
        # http://rubular.com/r/tjUt3Awgg4
        Name cri
        Format regex
        Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z
    [PARSER]
        Name syslog
        Format regex
        Regex ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key time
        Time_Format %b %d %H:%M:%S
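To see what the cri parser extracts from each line, the regex can be tried outside Fluent Bit. The following is a minimal Python sketch, not part of the deployment. It assumes the Onigmo-style named groups (?<name>...) can be rewritten one-to-one as Python's (?P<name>...); the function name parse_cri_line and the sample log line are made up for illustration.

```python
import re

# The cri parser regex from parsers.conf, with (?<name>...) groups
# rewritten in Python syntax as (?P<name>...). Assumption: the two
# regex engines treat this simple pattern the same way.
CRI_PATTERN = re.compile(
    r"^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<message>.*)$"
)


def parse_cri_line(line):
    """Return the time, stream, logtag and message fields, or None if no match."""
    match = CRI_PATTERN.match(line)
    if match is None:
        return None
    return match.groupdict()


# Hypothetical CRI-O log line, invented for this example.
sample = "2021-01-26T02:18:03.220251215+00:00 stderr F [INFO] watch channel closed"
fields = parse_cri_line(sample)
print(fields)
```

The stream, logtag and message keys produced here are the same keys that appear in the forwarded records on the receiver.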
- DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: fluentbittest
  labels:
    k8s-app: fluent-bit-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit-logging
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "2020"
        prometheus.io/path: /api/v1/metrics/prometheus
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.6
        imagePullPolicy: Always
        ports:
        - containerPort: 2020
        env:
        - name: FLUENT_FOWARD_HOST
          value: "X.X.X.X"
        - name: FLUENT_FOWARD_PORT
          value: "24225"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
        securityContext:
          privileged: true
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      serviceAccountName: fluent-bit
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"
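Note: Some Docker-related settings (such as the varlibdockercontainers volume) are still left in this manifest.
Note: If the following securityContext setting is not included in the DaemonSet above, the Fluent Bit pods cannot read the log files on the nodes and fail with an error. (Reference)
securityContext:
  privileged: true
Creating the resources in the order listed above with the oc create -f command starts the pods.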
$ oc get pod
NAME READY STATUS RESTARTS AGE
fluent-bit-7wj72 1/1 Running 0 53s
fluent-bit-97z7r 1/1 Running 0 53s
fluent-bit-w26z5 1/1 Running 0 53s
Verifying reception
Check that the virtual server has written the forwarded logs to files.
$ ls -l /data/log/td-agent-bit/
total 80592
-rw-r--r-- 1 root root 1477 Jan 25 20:18 kube.var.log.containers.calico-kube-controllers-6c4d9c955b-k8w6d_calico-system_calico-kube-controllers-355b03ba53e7dea6461944eb616be24ae6817d21acec8133e9ff6a845fc7b954.log
-rw-r--r-- 1 root root 164785 Jan 25 20:19 kube.var.log.containers.calico-node-lgrns_calico-system_calico-node-592a8f869fcf0460fd01c0114ea8c558e6e8167d92c3e0d59a609d0abb72be2e.log
-rw-r--r-- 1 root root 171322 Jan 25 20:19 kube.var.log.containers.calico-node-wjsbn_calico-system_calico-node-a8cfa55b9431bd668bcf744ed125d29747ae3c454d2c6c292556789822b79a5f.log
-rw-r--r-- 1 root root 175063 Jan 25 20:19 kube.var.log.containers.calico-node-xdxjb_calico-system_calico-node-eb24febe3c3d651e39f941fac47f70f45664d83a31371bc774c25f61253347cf.log
-rw-r--r-- 1 root root 1128 Jan 25 20:17 kube.var.log.containers.calico-typha-5c8d96f77d-sfltp_calico-system_calico-typha-d70a3dde704772db3015d654db936e4b9b1902f8312228836dcb99e17d64959e.log
:
(remainder omitted)
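Each file name in this listing is a Fluent Bit tag: the Kube_Tag_Prefix value followed by the original log file name, which encodes the pod, namespace, container and container ID. The sketch below splits one such tag in Python. It uses a simplified pattern that I am assuming for illustration (the pattern the kubernetes filter really uses may differ), and the names NAME_PATTERN and split_tag are hypothetical.

```python
import re

# Same value as Kube_Tag_Prefix in filter-kubernetes.conf.
KUBE_TAG_PREFIX = "kube.var.log.containers."

# Simplified assumption: <pod>_<namespace>_<container>-<64-char id>.log
NAME_PATTERN = re.compile(
    r"^(?P<pod_name>[^_]+)_(?P<namespace_name>[^_]+)_"
    r"(?P<container_name>.+)-(?P<container_id>[a-z0-9]{64})\.log$"
)


def split_tag(tag):
    """Return pod, namespace, container and ID parsed from a tag, or None."""
    if not tag.startswith(KUBE_TAG_PREFIX):
        return None
    match = NAME_PATTERN.match(tag[len(KUBE_TAG_PREFIX):])
    return match.groupdict() if match else None


# First file name from the listing above.
tag = (
    "kube.var.log.containers."
    "calico-kube-controllers-6c4d9c955b-k8w6d_calico-system_calico-kube-controllers-"
    "355b03ba53e7dea6461944eb616be24ae6817d21acec8133e9ff6a845fc7b954.log"
)
parts = split_tag(tag)
print(parts)
```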
Check the contents of a log file (the first one in the listing above).
[user@vsi ~]$ cat /data/log/td-agent-bit/kube.var.log.containers.calico-kube-controllers-6c4d9c955b-k8w6d_calico-system_calico-kube-controllers-355b03ba53e7dea6461944eb616be24ae6817d21acec8133e9ff6a845fc7b954.log
kube.var.log.containers.calico-kube-controllers-6c4d9c955b-k8w6d_calico-system_calico-kube-controllers-355b03ba53e7dea6461944eb616be24ae6817d21acec8133e9ff6a845fc7b954.log: [1611627483.220251215, {"stream":"stderr","logtag":"F","message":"2021-01-26 02:18:03.220 [INFO][1] watchercache.go 96: Watch channel closed by remote - recreate watcher ListRoot=\"/calico/resources/v3/projectcalico.org/nodes\"","kubernetes":{"pod_name":"calico-kube-controllers-6c4d9c955b-k8w6d","namespace_name":"calico-system","pod_id":"a25ffa7a-ecf7-4140-8e65-4e4f174e80f6","labels":{"k8s-app":"calico-kube-controllers","pod-template-hash":"6c4d9c955b"},"annotations":{"cni.projectcalico.org/podIP":"172.17.59.83/32","cni.projectcalico.org/podIPs":"172.17.59.83/32","k8s.v1.cni.cncf.io/network-status":"[{\n \"name\": \"k8s-pod-network\",\n \"ips\": [\n \"172.17.59.83\"\n ],\n \"default\": true,\n \"dns\": {}\n}]","k8s.v1.cni.cncf.io/networks-status":"[{\n \"name\": \"k8s-pod-network\",\n \"ips\": [\n \"172.17.59.83\"\n ],\n \"default\": true,\n \"dns\": {}\n}]"},"host":"10.240.0.4","container_name":"calico-kube-controllers","docker_id":"355b03ba53e7dea6461944eb616be24ae6817d21acec8133e9ff6a845fc7b954","container_hash":"registry.ng.bluemix.net/armada-master/calico/kube-controllers@sha256:eb456f071b19614a6a4eb149cf1eb2dc924780accc2f9b426305790e03c2403b","container_image":"registry.ng.bluemix.net/armada-master/calico/kube-controllers:v3.16.5"}}]
[user@vsi ~]$
This confirms that the logs on OpenShift are being forwarded.
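Each line in the output file has the form "tag: [timestamp, record]", as the output above shows. If the files need to be inspected programmatically, a line can be split as in the following Python sketch. It assumes the tag itself never contains ": ". The function name parse_file_output_line and the shortened sample line are made up for illustration.

```python
import json
from datetime import datetime, timezone


def parse_file_output_line(line):
    """Split 'tag: [timestamp, {record}]' into its three parts."""
    # Assumption: the first ': ' separates the tag from the JSON payload.
    tag, _, payload = line.partition(": ")
    timestamp, record = json.loads(payload)
    return tag, timestamp, record


# Shortened, hypothetical line in the same shape as the real output.
sample = (
    'kube.var.log.containers.example.log: [1611627483.220251215, '
    '{"stream":"stderr","logtag":"F","message":"hello",'
    '"kubernetes":{"pod_name":"example","namespace_name":"demo"}}]'
)
tag, timestamp, record = parse_file_output_line(sample)

# The timestamp is seconds since the Unix epoch; convert it to UTC.
when = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
print(tag, when.isoformat(), record["kubernetes"]["namespace_name"])
```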
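Bonus
I had heard that Fluent Bit is very lightweight, so since it was installed anyway, I compared it with Fluentd to see how lightweight it really is.
- Comparison environment
I deployed Fluentd using fluent/fluentd-kubernetes-daemonset and had it forward the OpenShift logs to the same virtual server. I then compared the resource usage of the Fluentd pods with that of the Fluent Bit pods in the verification environment described above.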
$ oc get pod -n logtest
NAME READY STATUS RESTARTS AGE
fluentd-2cbqz 1/1 Running 0 5d19h
fluentd-4mh8f 1/1 Running 0 5d19h
fluentd-77h8b 1/1 Running 0 5d19h
This Fluentd setup includes some filter processing, so the comparison is not exact, but it should serve as a rough one.
The results are as follows.
- Result: Fluentd
$ oc adm top po -n logtest
NAME CPU(cores) MEMORY(bytes)
fluentd-2cbqz 12m 112Mi
fluentd-4mh8f 15m 112Mi
fluentd-77h8b 15m 119Mi
- Result: Fluent Bit
$ oc adm top po -n fluentbittest
NAME CPU(cores) MEMORY(bytes)
fluent-bit-7wj72 8m 9Mi
fluent-bit-97z7r 4m 8Mi
fluent-bit-w26z5 3m 7Mi
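Based on these results, Fluent Bit used less than about 1/2 of the CPU and less than about 1/10 of the memory that Fluentd used. (An exact comparison was not possible, so this is only a rough figure.)
As noted above, Fluentd was doing somewhat more processing, but roughly 1/2 the CPU and 1/10 the memory is still attractive. In OpenShift/Kubernetes clusters in particular, the log volume is very large, so the load on the log collector is heavy too. The configuration here simply forwards logs with the forward plugin, so using it in production would require various tuning and verification.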
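The rough ratios can be checked by averaging the per-pod figures from the two oc adm top outputs. The following Python sketch does only that arithmetic. The variable names are mine, and a single snapshot of three pods each is of course not a benchmark.

```python
# (CPU in millicores, memory in Mi) per pod, copied from the outputs above.
fluentd = {
    "fluentd-2cbqz": (12, 112),
    "fluentd-4mh8f": (15, 112),
    "fluentd-77h8b": (15, 119),
}
fluent_bit = {
    "fluent-bit-7wj72": (8, 9),
    "fluent-bit-97z7r": (4, 8),
    "fluent-bit-w26z5": (3, 7),
}


def average(pods, index):
    """Average one column (0 = CPU, 1 = memory) over all pods."""
    values = [usage[index] for usage in pods.values()]
    return sum(values) / len(values)


cpu_ratio = average(fluent_bit, 0) / average(fluentd, 0)
mem_ratio = average(fluent_bit, 1) / average(fluentd, 1)

print(f"CPU: Fluent Bit uses {cpu_ratio:.2f}x of Fluentd")
print(f"Memory: Fluent Bit uses {mem_ratio:.2f}x of Fluentd")
```

On these numbers the averages come out at roughly a third of the CPU and about a fourteenth of the memory, which is consistent with the "less than 1/2" and "less than 1/10" statements.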