Prometheus: Storing Data in InfluxDB via Telegraf and Visualizing It in Grafana
Execution environment:
[root@testhost ~]# uname -a
Linux testhost 4.18.0-448.el8.x86_64 #1 SMP Wed Jan 18 15:02:46 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
[root@testhost ~]# cat /etc/redhat-release
CentOS Stream release 8
[root@testhost ~]# yum list installed | grep telegraf
telegraf.x86_64 1.28.3-1 @@commandline
[root@testhost ~]# yum list installed | grep influxdb
influxdb2.x86_64 2.7.3-1 @@commandline
[root@testhost ~]# yum list installed | grep grafana
grafana-enterprise.x86_64 10.0.3-1 @@commandline
node_exporter version: 1.6.1.linux-amd64
0. Overview
There is a wide range of software for monitoring system performance.
One such option is Prometheus, which has been attracting attention as monitoring software well suited to cloud environments.
However, Prometheus is not well suited to long-term data retention.
If you want to keep the collected data for a long time, you need to store it in a separate database.
There are many databases to choose from; one of them is InfluxDB, a database specialized for time-series data.
In this article, I will store the data collected by Prometheus in InfluxDB.
However, while the direct connection between Prometheus and InfluxDB was supported in older versions, it is no longer supported in the latest versions.
Does that mean we have to give up? In fact, a few workarounds remain.
One of them is to use Telegraf, the metrics collection agent built for InfluxDB.
Because Telegraf supports Prometheus endpoints as an input source, it can indirectly connect Prometheus and InfluxDB.
This time, I will use Telegraf to store the metrics collected by Prometheus in InfluxDB, and then visualize the stored data in Grafana.
1. Preparation
For the Prometheus installation, see the following past article.
Note that only node_exporter is needed; the other components are not required.
Linux: Trying a free install of the system monitoring software "Prometheus"
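Before moving on, it is worth checking that node_exporter is actually serving metrics. A minimal sketch, assuming node_exporter is listening on its default port 9100 on the local host:
[root@testhost ~]# curl -s http://localhost:9100/metrics | head -n 5
If the endpoint is up, this prints the first few metric lines in the Prometheus exposition format.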
For the InfluxDB and Grafana installations, see the following past articles.
Linux: Trying a free install of the data visualization software "Grafana" + linking it with "Prometheus"
Linux: Trying a free install of "InfluxDB", a database specialized for time-series data
For the Telegraf installation and its integration with InfluxDB, see the following past article.
InfluxDB: Ingesting data from Telegraf
For the InfluxDB and Grafana integration, see the following past article.
InfluxDB: Visualizing data with Grafana
Note that this article assumes all of the software is running on the same server.
The configuration also follows the setup described in those past articles.
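In particular, this assumes that Telegraf's output to InfluxDB is already in place from the past article. Roughly, /etc/telegraf/telegraf.conf should already contain a section along these lines (the token and organization values below are placeholders; the bucket matches the one used later in this article):
[[outputs.influxdb_v2]]
  ## InfluxDB instance on the same server
  urls = ["http://localhost:8086"]
  ## Placeholders: use the API token and organization created during the InfluxDB setup
  token = "$INFLUX_TOKEN"
  organization = "your-org"
  bucket = "test_bucket"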
2. Configuration
The only change needed this time is to Telegraf's configuration.
Stop Telegraf before editing the configuration.
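Assuming Telegraf was installed from the official RPM (as in the past article), it runs as a systemd service and can be stopped like this:
[root@testhost ~]# systemctl stop telegraf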
In the Telegraf configuration file /etc/telegraf/telegraf.conf, the section related to Prometheus is the following:
# # Read metrics from one or many prometheus clients
# [[inputs.prometheus]]
# ## An array of urls to scrape metrics from.
# urls = ["http://localhost:9100/metrics"]
#
# ## Metric version controls the mapping from Prometheus metrics into Telegraf metrics.
# ## See "Metric Format Configuration" in plugins/inputs/prometheus/README.md for details.
# ## Valid options: 1, 2
# # metric_version = 1
#
# ## Url tag name (tag containing scrapped url. optional, default is "url")
# # url_tag = "url"
#
# ## Whether the timestamp of the scraped metrics will be ignored.
# ## If set to true, the gather time will be used.
# # ignore_timestamp = false
#
# ## An array of Kubernetes services to scrape metrics from.
# # kubernetes_services = ["http://my-service-dns.my-namespace:9100/metrics"]
#
# ## Kubernetes config file to create client from.
# # kube_config = "/path/to/kubernetes.config"
#
# ## Scrape Pods
# ## Enable scraping of k8s pods. Further settings as to which pods to scape
# ## are determiend by the 'method' option below. When enabled, the default is
# ## to use annotations to determine whether to scrape or not.
# # monitor_kubernetes_pods = false
#
# ## Scrape Pods Method
# ## annotations: default, looks for specific pod annotations documented below
# ## settings: only look for pods matching the settings provided, not
# ## annotations
# ## settings+annotations: looks at pods that match annotations using the user
# ## defined settings
# # monitor_kubernetes_pods_method = "annotations"
#
# ## Scrape Pods 'annotations' method options
# ## If set method is set to 'annotations' or 'settings+annotations', these
# ## annotation flags are looked for:
# ## - prometheus.io/scrape: Required to enable scraping for this pod. Can also
# ## use 'prometheus.io/scrape=false' annotation to opt-out entirely.
# ## - prometheus.io/scheme: If the metrics endpoint is secured then you will
# ## need to set this to 'https' & most likely set the tls config
# ## - prometheus.io/path: If the metrics path is not /metrics, define it with
# ## this annotation
# ## - prometheus.io/port: If port is not 9102 use this annotation
#
# ## Scrape Pods 'settings' method options
# ## When using 'settings' or 'settings+annotations', the default values for
# ## annotations can be modified using with the following options:
# # monitor_kubernetes_pods_scheme = "http"
# # monitor_kubernetes_pods_port = "9102"
# # monitor_kubernetes_pods_path = "/metrics"
#
# ## Get the list of pods to scrape with either the scope of
# ## - cluster: the kubernetes watch api (default, no need to specify)
# ## - node: the local cadvisor api; for scalability. Note that the config node_ip or the environment variable NODE_IP must be set to the host IP.
# # pod_scrape_scope = "cluster"
#
# ## Only for node scrape scope: node IP of the node that telegraf is running on.
# ## Either this config or the environment variable NODE_IP must be set.
# # node_ip = "10.180.1.1"
#
# ## Only for node scrape scope: interval in seconds for how often to get updated pod list for scraping.
# ## Default is 60 seconds.
# # pod_scrape_interval = 60
#
# ## Restricts Kubernetes monitoring to a single namespace
# ## ex: monitor_kubernetes_pods_namespace = "default"
# # monitor_kubernetes_pods_namespace = ""
# ## The name of the label for the pod that is being scraped.
# ## Default is 'namespace' but this can conflict with metrics that have the label 'namespace'
# # pod_namespace_label_name = "namespace"
# # label selector to target pods which have the label
# # kubernetes_label_selector = "env=dev,app=nginx"
# # field selector to target pods
# # eg. To scrape pods on a specific node
# # kubernetes_field_selector = "spec.nodeName=$HOSTNAME"
#
# ## Filter which pod annotations and labels will be added to metric tags
# #
# # pod_annotation_include = ["annotation-key-1"]
# # pod_annotation_exclude = ["exclude-me"]
# # pod_label_include = ["label-key-1"]
# # pod_label_exclude = ["exclude-me"]
#
# # cache refresh interval to set the interval for re-sync of pods list.
# # Default is 60 minutes.
# # cache_refresh_interval = 60
#
# ## Scrape Services available in Consul Catalog
# # [inputs.prometheus.consul]
# # enabled = true
# # agent = "http://localhost:8500"
# # query_interval = "5m"
#
# # [[inputs.prometheus.consul.query]]
# # name = "a service name"
# # tag = "a service tag"
# # url = 'http://{{if ne .ServiceAddress ""}}{{.ServiceAddress}}{{else}}{{.Address}}{{end}}:{{.ServicePort}}/{{with .ServiceMeta.metrics_path}}{{.}}{{else}}metrics{{end}}'
# # [inputs.prometheus.consul.query.tags]
# # host = "{{.Node}}"
#
# ## Use bearer token for authorization. ('bearer_token' takes priority)
# # bearer_token = "/path/to/bearer/token"
# ## OR
# # bearer_token_string = "abc_123"
#
# ## HTTP Basic Authentication username and password. ('bearer_token' and
# ## 'bearer_token_string' take priority)
# # username = ""
# # password = ""
#
# ## Optional custom HTTP headers
# # http_headers = {"X-Special-Header" = "Special-Value"}
#
# ## Specify timeout duration for slower prometheus clients (default is 5s)
# # timeout = "5s"
#
# ## deprecated in 1.26; use the timeout option
# # response_timeout = "5s"
#
# ## HTTP Proxy support
# # use_system_proxy = false
# # http_proxy_url = ""
#
# ## Optional TLS Config
# # tls_ca = /path/to/cafile
# # tls_cert = /path/to/certfile
# # tls_key = /path/to/keyfile
#
# ## Use TLS but skip chain & host verification
# # insecure_skip_verify = false
#
# ## Use the given name as the SNI server name on each URL
# # tls_server_name = "myhost.example.org"
#
# ## TLS renegotiation method, choose from "never", "once", "freely"
# # tls_renegotiation_method = "never"
#
# ## Enable/disable TLS
# ## Set to true/false to enforce TLS being enabled/disabled. If not set,
# ## enable TLS only if any of the other options are specified.
# # tls_enable = true
#
# ## Control pod scraping based on pod namespace annotations
# ## Pass and drop here act like tagpass and tagdrop, but instead
# ## of filtering metrics they filters pod candidates for scraping
# #[inputs.prometheus.namespace_annotation_pass]
# # annotation_key = ["value1", "value2"]
# #[inputs.prometheus.namespace_annotation_drop]
# # some_annotation_key = ["dont-scrape"]
It is a very long block, but only the first two lines matter.
First, uncomment [[inputs.prometheus]].
Then uncomment urls and set it to the URL of the Prometheus (node_exporter) metrics endpoint.
Since all of the software runs on the same server here, the default value works as-is.
# # Read metrics from one or many prometheus clients
[[inputs.prometheus]]
# ## An array of urls to scrape metrics from.
urls = ["http://localhost:9100/metrics"]
#
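Before restarting the service, you can optionally dry-run just this input to confirm the scrape works. Telegraf's --test flag gathers metrics once and prints them to stdout without writing to any outputs:
[root@testhost ~]# telegraf --config /etc/telegraf/telegraf.conf --input-filter prometheus --test
If the configuration is correct, this prints node_cpu_seconds_total and the other node_exporter metrics in line protocol.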
After changing the settings, restart Telegraf.
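Again assuming the systemd unit from the RPM install:
[root@testhost ~]# systemctl restart telegraf
[root@testhost ~]# systemctl status telegraf
The status output should show the service as active (running), with no plugin errors in the recent log lines.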
3. Verification
Search from the InfluxDB web UI with the following conditions to confirm that the data collected by Prometheus has been stored.
From: test_bucket
Filter: _measurement = node_cpu_seconds_total
Time range: Past 5 minutes
I confirmed that the data had been stored correctly.
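The same check can also be run from the command line with the influx client; a minimal sketch, assuming the CLI has an active configuration (organization and token) pointing at this InfluxDB instance:
[root@testhost ~]# influx query 'from(bucket: "test_bucket") |> range(start: -5m) |> filter(fn: (r) => r["_measurement"] == "node_cpu_seconds_total") |> limit(n: 5)'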
If InfluxDB and Grafana are already connected, you can confirm the same data from Grafana as well.
Click the "Script Editor" button in the InfluxDB web UI to see the generated query, and copy it into Grafana.
In Grafana, select the same time range as above ("Last 5 minutes").
from(bucket: "test_bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "node_cpu_seconds_total")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> yield(name: "mean")
I was able to confirm the data in Grafana as well.