Prometheus: Storing Data in InfluxDB via Telegraf and Visualizing It in Grafana
Execution environment:
[root@testhost ~]# uname -a
Linux testhost 4.18.0-448.el8.x86_64 #1 SMP Wed Jan 18 15:02:46 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
[root@testhost ~]# cat /etc/redhat-release
CentOS Stream release 8
[root@testhost ~]# yum list installed | grep telegraf
telegraf.x86_64 1.28.3-1 @@commandline
[root@testhost ~]# yum list installed | grep influxdb
influxdb2.x86_64 2.7.3-1 @@commandline
[root@testhost ~]# yum list installed | grep grafana
grafana-enterprise.x86_64 10.0.3-1 @@commandline
node_exporter version: 1.6.1.linux-amd64
0. Overview
There is a wide range of software for monitoring system performance.
One such option is Prometheus, which has been attracting attention as monitoring software well suited to cloud environments.
However, Prometheus is not well suited to long-term data retention.
If you want to keep the collected data for a long time, you need to store it in a separate database.
There are many databases to choose from; one of them is InfluxDB, a database specialized for time-series data.
In this article, I will store the data collected by Prometheus in InfluxDB.
However, while the direct connection between Prometheus and InfluxDB was supported in older versions, it is no longer supported in the latest versions.
Does that mean we have to give up? In fact, a few workarounds remain.
One of them is to use Telegraf, the metrics collection agent built for InfluxDB.
Because Telegraf supports Prometheus endpoints as an input source, it can indirectly connect Prometheus and InfluxDB.
This time, I will use Telegraf to store the metrics collected by Prometheus in InfluxDB, and then visualize the stored data in Grafana.
1. Preparation
For the Prometheus installation, see the following past article.
Note that only node_exporter is needed; the other components are not required.
Linux: Trying a free install of the system monitoring software "Prometheus"
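Before moving on, it is worth checking that node_exporter is actually serving metrics. A minimal sketch, assuming node_exporter is listening on its default port 9100 on the local host:
[root@testhost ~]# curl -s http://localhost:9100/metrics | head -n 5
If the endpoint is up, this prints the first few metric lines in the Prometheus exposition format.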
For the InfluxDB and Grafana installations, see the following past articles.
Linux: Trying a free install of the data visualization software "Grafana" + linking it with "Prometheus"
Linux: Trying a free install of "InfluxDB", a database specialized for time-series data
For the Telegraf installation and its integration with InfluxDB, see the following past article.
InfluxDB: Ingesting data from Telegraf
For the InfluxDB and Grafana integration, see the following past article.
InfluxDB: Visualizing data with Grafana
Note that this article assumes all of the software is running on the same server.
The configuration also follows the setup described in those past articles.
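In particular, this assumes that Telegraf's output to InfluxDB is already in place from the past article. Roughly, /etc/telegraf/telegraf.conf should already contain a section along these lines (the token and organization values below are placeholders; the bucket matches the one used later in this article):
[[outputs.influxdb_v2]]
  ## InfluxDB instance on the same server
  urls = ["http://localhost:8086"]
  ## Placeholders: use the API token and organization created during the InfluxDB setup
  token = "$INFLUX_TOKEN"
  organization = "your-org"
  bucket = "test_bucket"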
2. Configuration
The only change needed this time is to Telegraf's configuration.
Stop Telegraf before editing the configuration.
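Assuming Telegraf was installed from the official RPM (as in the past article), it runs as a systemd service and can be stopped like this:
[root@testhost ~]# systemctl stop telegraf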
In the Telegraf configuration file /etc/telegraf/telegraf.conf, the section related to Prometheus is the following:
# # Read metrics from one or many prometheus clients
# [[inputs.prometheus]]
# ## An array of urls to scrape metrics from.
# urls = ["http://localhost:9100/metrics"]
#
# ## Metric version controls the mapping from Prometheus metrics into Telegraf metrics.
# ## See "Metric Format Configuration" in plugins/inputs/prometheus/README.md for details.
# ## Valid options: 1, 2
# # metric_version = 1
#
# ## Url tag name (tag containing scrapped url. optional, default is "url")
# # url_tag = "url"
#
# ## Whether the timestamp of the scraped metrics will be ignored.
# ## If set to true, the gather time will be used.
# # ignore_timestamp = false
#
# ## An array of Kubernetes services to scrape metrics from.
# # kubernetes_services = ["http://my-service-dns.my-namespace:9100/metrics"]
#
# ## Kubernetes config file to create client from.
# # kube_config = "/path/to/kubernetes.config"
#
# ## Scrape Pods
# ## Enable scraping of k8s pods. Further settings as to which pods to scape
# ## are determiend by the 'method' option below. When enabled, the default is
# ## to use annotations to determine whether to scrape or not.
# # monitor_kubernetes_pods = false
#
# ## Scrape Pods Method
# ## annotations: default, looks for specific pod annotations documented below
# ## settings: only look for pods matching the settings provided, not
# ## annotations
# ## settings+annotations: looks at pods that match annotations using the user
# ## defined settings
# # monitor_kubernetes_pods_method = "annotations"
#
# ## Scrape Pods 'annotations' method options
# ## If set method is set to 'annotations' or 'settings+annotations', these
# ## annotation flags are looked for:
# ## - prometheus.io/scrape: Required to enable scraping for this pod. Can also
# ## use 'prometheus.io/scrape=false' annotation to opt-out entirely.
# ## - prometheus.io/scheme: If the metrics endpoint is secured then you will
# ## need to set this to 'https' & most likely set the tls config
# ## - prometheus.io/path: If the metrics path is not /metrics, define it with
# ## this annotation
# ## - prometheus.io/port: If port is not 9102 use this annotation
#
# ## Scrape Pods 'settings' method options
# ## When using 'settings' or 'settings+annotations', the default values for
# ## annotations can be modified using with the following options:
# # monitor_kubernetes_pods_scheme = "http"
# # monitor_kubernetes_pods_port = "9102"
# # monitor_kubernetes_pods_path = "/metrics"
#
# ## Get the list of pods to scrape with either the scope of
# ## - cluster: the kubernetes watch api (default, no need to specify)
# ## - node: the local cadvisor api; for scalability. Note that the config node_ip or the environment variable NODE_IP must be set to the host IP.
# # pod_scrape_scope = "cluster"
#
# ## Only for node scrape scope: node IP of the node that telegraf is running on.
# ## Either this config or the environment variable NODE_IP must be set.
# # node_ip = "10.180.1.1"
#
# ## Only for node scrape scope: interval in seconds for how often to get updated pod list for scraping.
# ## Default is 60 seconds.
# # pod_scrape_interval = 60
#
# ## Restricts Kubernetes monitoring to a single namespace
# ## ex: monitor_kubernetes_pods_namespace = "default"
# # monitor_kubernetes_pods_namespace = ""
# ## The name of the label for the pod that is being scraped.
# ## Default is 'namespace' but this can conflict with metrics that have the label 'namespace'
# # pod_namespace_label_name = "namespace"
# # label selector to target pods which have the label
# # kubernetes_label_selector = "env=dev,app=nginx"
# # field selector to target pods
# # eg. To scrape pods on a specific node
# # kubernetes_field_selector = "spec.nodeName=$HOSTNAME"
#
# ## Filter which pod annotations and labels will be added to metric tags
# #
# # pod_annotation_include = ["annotation-key-1"]
# # pod_annotation_exclude = ["exclude-me"]
# # pod_label_include = ["label-key-1"]
# # pod_label_exclude = ["exclude-me"]
#
# # cache refresh interval to set the interval for re-sync of pods list.
# # Default is 60 minutes.
# # cache_refresh_interval = 60
#
# ## Scrape Services available in Consul Catalog
# # [inputs.prometheus.consul]
# # enabled = true
# # agent = "http://localhost:8500"
# # query_interval = "5m"
#
# # [[inputs.prometheus.consul.query]]
# # name = "a service name"
# # tag = "a service tag"
# # url = 'http://{{if ne .ServiceAddress ""}}{{.ServiceAddress}}{{else}}{{.Address}}{{end}}:{{.ServicePort}}/{{with .ServiceMeta.metrics_path}}{{.}}{{else}}metrics{{end}}'
# # [inputs.prometheus.consul.query.tags]
# # host = "{{.Node}}"
#
# ## Use bearer token for authorization. ('bearer_token' takes priority)
# # bearer_token = "/path/to/bearer/token"
# ## OR
# # bearer_token_string = "abc_123"
#
# ## HTTP Basic Authentication username and password. ('bearer_token' and
# ## 'bearer_token_string' take priority)
# # username = ""
# # password = ""
#
# ## Optional custom HTTP headers
# # http_headers = {"X-Special-Header" = "Special-Value"}
#
# ## Specify timeout duration for slower prometheus clients (default is 5s)
# # timeout = "5s"
#
# ## deprecated in 1.26; use the timeout option
# # response_timeout = "5s"
#
# ## HTTP Proxy support
# # use_system_proxy = false
# # http_proxy_url = ""
#
# ## Optional TLS Config
# # tls_ca = /path/to/cafile
# # tls_cert = /path/to/certfile
# # tls_key = /path/to/keyfile
#
# ## Use TLS but skip chain & host verification
# # insecure_skip_verify = false
#
# ## Use the given name as the SNI server name on each URL
# # tls_server_name = "myhost.example.org"
#
# ## TLS renegotiation method, choose from "never", "once", "freely"
# # tls_renegotiation_method = "never"
#
# ## Enable/disable TLS
# ## Set to true/false to enforce TLS being enabled/disabled. If not set,
# ## enable TLS only if any of the other options are specified.
# # tls_enable = true
#
# ## Control pod scraping based on pod namespace annotations
# ## Pass and drop here act like tagpass and tagdrop, but instead
# ## of filtering metrics they filters pod candidates for scraping
# #[inputs.prometheus.namespace_annotation_pass]
# # annotation_key = ["value1", "value2"]
# #[inputs.prometheus.namespace_annotation_drop]
# # some_annotation_key = ["dont-scrape"]
It is a very long block, but only the first two lines matter.
First, uncomment [[inputs.prometheus]].
Then uncomment urls and set it to the URL of the Prometheus (node_exporter) metrics endpoint.
Since all of the software runs on the same server here, the default value works as-is.
# # Read metrics from one or many prometheus clients
[[inputs.prometheus]]
# ## An array of urls to scrape metrics from.
urls = ["http://localhost:9100/metrics"]
#
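Before restarting the service, you can optionally dry-run just this input to confirm the scrape works. Telegraf's --test flag gathers metrics once and prints them to stdout without writing to any outputs:
[root@testhost ~]# telegraf --config /etc/telegraf/telegraf.conf --input-filter prometheus --test
If the configuration is correct, this prints node_cpu_seconds_total and the other node_exporter metrics in line protocol.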
After changing the settings, restart Telegraf.
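Again assuming the systemd unit from the RPM install:
[root@testhost ~]# systemctl restart telegraf
[root@testhost ~]# systemctl status telegraf
The status output should show the service as active (running), with no plugin errors in the recent log lines.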
3. Verification
Search from the InfluxDB web UI with the following conditions to confirm that the data collected by Prometheus has been stored.
From: test_bucket
Filter: _measurement = node_cpu_seconds_total
Time range: Past 5 minutes
I confirmed that the data had been stored correctly.
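The same check can also be run from the command line with the influx client; a minimal sketch, assuming the CLI has an active configuration (organization and token) pointing at this InfluxDB instance:
[root@testhost ~]# influx query 'from(bucket: "test_bucket") |> range(start: -5m) |> filter(fn: (r) => r["_measurement"] == "node_cpu_seconds_total") |> limit(n: 5)'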
If InfluxDB and Grafana are already connected, you can confirm the same data from Grafana as well.
Click the "Script Editor" button in the InfluxDB web UI to see the generated query, and copy it into Grafana.
In Grafana, select the same time range as above ("Last 5 minutes").
from(bucket: "test_bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "node_cpu_seconds_total")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> yield(name: "mean")
I was able to confirm the data in Grafana as well.