Continuous profiling with Grafana Agent (golang pull)

3 years ago

文, 翔

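Introduction

Grafana Pyroscope is a product for continuously storing and making use of profiling data. It offers two ways to send profiles, the Pyroscope SDK and Grafana Agent, as illustrated in the documentation linked below.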

ref https://grafana.com/docs/pyroscope/latest/configure-client/#sending-profiles-from-your-application

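It began as an independent project named Pyroscope. When it became a Grafana Labs product, it was merged with Grafana Phlare and became today's Grafana Pyroscope.

The basic way to collect profiles with Grafana Pyroscope is to use the dedicated SDK (Pyroscope SDK), which requires changing the target application.

For that reason, a feature (Auto-Instrumentation) was added that collects profiles through Grafana Agent without changing the target application's source code.

This post shows how to use Grafana Agent to collect an application's profile data without adding any source code.

Personally, the features covered here still feel somewhat experimental to me, but they can be useful in specific situations.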

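Collecting profiles with Grafana Agent

eBPF

Collects CPU profiles only
Available on kernel version >= 4.9 with BPF_PROG_TYPE_PERF_EVENT enabled
Can also collect from interpreted languages such as Python, Ruby, and JavaScript, but the information obtained is limited

golang pull

Collects from Go applications only
Available when Grafana Agent can connect to the pprof endpoint
Because the data comes from pprof, memory and other profiles can be collected in addition to CPU

This post covers the golang pull method.
For more on eBPF, see this article.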

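Configuring Grafana Agent

Requirements

Grafana Agent can connect to the pprof endpoint of an application written in Go

Because this requirement is simpler than what the eBPF method needs, I think this approach is easier to adopt.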

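Trying it on Kubernetes

Versions tested

Grafana Agent: v0.38.0
Grafana Pyroscope: 1.2.0

How the pprof data is collected

Before explaining the Grafana Agent configuration, let me describe the conditions under which Grafana Agent collects pprof data from Go applications in this post.

The scrape configuration is quite flexible and can be set up in many ways. Here I consider collecting pprof data from Pods that carry annotations like the following and storing it in Grafana Pyroscope.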

annotations:
    profiles.grafana.com/cpu.scrape: "true"
    profiles.grafana.com/goroutine.scrape: "true"
    profiles.grafana.com/memory.scrape: "true"
    profiles.grafana.com/fgprof.scrape: "true"
    profiles.grafana.com/block.scrape: "true"
    profiles.grafana.com/mutex.scrape: "true"

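Of the many possible setups, I explain this one for two reasons.

This approach is already fairly common with Prometheus, which Grafana Agent's scraping is modeled on.
I personally expect that, if this Grafana Agent feature becomes widely used, many deployments will be configured the same way.

So I expect the common way to operate this will be to scrape pprof data from Pods or Services that carry the relevant annotations.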

Grafana Agent configuration file

// Configure the Grafana Pyroscope endpoint that the collected pprof data is written to
pyroscope.write "pyroscope_write" {
  endpoint {
    url = "http://pyroscope.observability.svc.cluster.local.:4040"
  }
}

// Discover Pods here
discovery.kubernetes "pyroscope_kubernetes" {
  role = "pod"
}

// Attach Kubernetes metadata such as the Pod name to the collected pprof data as labels
// Without the Pod and container names, the collected pprof data cannot be used for analysis, so add them here
discovery.relabel "kubernetes_pods" {
  targets = concat(discovery.kubernetes.pyroscope_kubernetes.targets)

  rule {
    action        = "drop"
    source_labels = ["__meta_kubernetes_pod_phase"]
    regex         = "Pending|Succeeded|Failed|Completed"
  }

  rule {
    action = "labelmap"
    regex  = "__meta_kubernetes_pod_label_(.+)"
  }

  rule {
    action        = "replace"
    source_labels = ["__meta_kubernetes_namespace"]
    target_label  = "namespace"
  }

  rule {
    action        = "replace"
    source_labels = ["__meta_kubernetes_pod_name"]
    target_label  = "pod"
  }

  rule {
    action        = "replace"
    source_labels = ["__meta_kubernetes_pod_container_name"]
    target_label  = "container"
  }
}

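// Add the processing that scrapes Pods according to their annotations
// This example handles annotations such as `profiles.grafana.com/memory.scrape`
// This is sparsely documented and very hard for newcomers: inside Grafana Agent, annotations are exposed as `__meta_` labels with `.` and `/` replaced by `_`, so they can be matched like this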
discovery.relabel "kubernetes_pods_memory_custom_name" {
  targets = concat(discovery.relabel.kubernetes_pods.output)

  rule {
    source_labels = ["__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_scrape"]
    action        = "keep"
    regex         = "true"
  }

  rule {
    source_labels = ["__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_port_name"]
    action        = "drop"
    regex         = ""
  }

  rule {
    source_labels = ["__meta_kubernetes_pod_container_port_name"]
    target_label  = "__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_port_name"
    action        = "keepequal"
  }

  rule {
    source_labels = ["__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_scheme"]
    action        = "replace"
    regex         = "(https?)"
    target_label  = "__scheme__"
    replacement   = "$1"
  }

  rule {
    source_labels = ["__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_path"]
    action        = "replace"
    regex         = "(.+)"
    target_label  = "__profile_path__"
    replacement   = "$1"
  }

  rule {
    source_labels = ["__address__", "__meta_kubernetes_pod_annotation_profiles_grafana_com_memory_port"]
    action        = "replace"
    regex         = "(.+?)(?::\\d+)?;(\\d+)"
    target_label  = "__address__"
    replacement   = "$1:$2"
  }
}

// Finally, map the Pod conditions configured so far to the Grafana Pyroscope write destination
// This is also where you choose which of the pprof profiles to scrape
// Only memory is set to `true` here, so only the memory profile is collected, and only from Pods annotated with profiles.grafana.com/memory.scrape: "true"
pyroscope.scrape "pyroscope_scrape_memory" {
  clustering {
    enabled = true
  }

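  // Note: discovery.relabel.kubernetes_pods_memory_default_name is not defined in this excerpt; define it or remove it from this list
  targets    = concat(discovery.relabel.kubernetes_pods_memory_default_name.output, discovery.relabel.kubernetes_pods_memory_custom_name.output)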
  forward_to = [pyroscope.write.pyroscope_write.receiver]

  profiling_config {
    profile.memory {
      enabled = true
    }

    profile.process_cpu {
      enabled = false
    }

    profile.goroutine {
      enabled = false
    }

    profile.block {
      enabled = false
    }

    profile.mutex {
      enabled = false
    }

    profile.fgprof {
      enabled = false
    }
  }
}
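
The two least obvious parts of the configuration above are the annotation-to-label naming and the `__address__` rewrite rule. The Go sketch below imitates both under Prometheus-style relabeling semantics: characters outside `[a-zA-Z0-9_]` become `_`, source label values are joined with `;`, and the regex is fully anchored. It is an illustration of the behavior, not Grafana Agent's actual code, and the function names and sample address are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidLabelChar matches every character that is not allowed in a label name.
var invalidLabelChar = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// annotationLabel returns the discovery label under which a Pod annotation
// is exposed. (Hypothetical helper, used only for this sketch.)
func annotationLabel(annotation string) string {
	return "__meta_kubernetes_pod_annotation_" + invalidLabelChar.ReplaceAllString(annotation, "_")
}

// addressRule is the regex from the last relabel rule, anchored the way
// relabeling anchors it.
var addressRule = regexp.MustCompile(`^(?:(.+?)(?::\d+)?;(\d+))$`)

// rewriteAddress imitates the "replace" rule: it joins the address and the
// port annotation with ";" and, if the regex matches, swaps in the new port.
func rewriteAddress(address, port string) string {
	joined := address + ";" + port
	m := addressRule.FindStringSubmatchIndex(joined)
	if m == nil {
		// No match (for example, an empty port annotation): leave the address as-is.
		return address
	}
	return string(addressRule.ExpandString(nil, "$1:$2", joined, m))
}

func main() {
	fmt.Println(annotationLabel("profiles.grafana.com/memory.scrape"))
	// The address and port below are made-up sample values.
	fmt.Println(rewriteAddress("10.0.0.5:8080", "6060"))
}
```

Because a non-matching rule leaves `__address__` untouched, a Pod without the port annotation keeps the address that discovery assigned to it.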

Kubernetes manifest

Grafana Agent can collect the data as long as it can connect to the target application's pprof endpoint, so I run it as a Deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: grafana-agent-pprof
  name: pyroscope-pprof-grafana-agent
  namespace: observability
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: grafana-agent-pprof
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: grafana-agent-pprof
    spec:
      automountServiceAccountToken: true
      containers:
      - args:
        - run
        - /etc/agent/config.river
        - --storage.path=/tmp/agent
        - --server.http.listen-addr=0.0.0.0:80
        env:
        - name: AGENT_MODE
          value: flow
        image: docker.io/grafana/agent:v0.38.0
        imagePullPolicy: IfNotPresent
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 80
          initialDelaySeconds: 10
          timeoutSeconds: 1
        name: grafana-agent
        ports:
        - containerPort: 80
          name: http-metrics
        volumeMounts:
        - mountPath: /etc/agent
          name: config
      serviceAccountName: pyroscope-pprof-grafana-agent
      volumes:
      - configMap:
          name: pyroscope-pprof-grafana-agent
        name: config

Checking CPU profiles in Grafana Pyroscope

[Screenshot: Pyroscope Single view of process_cpu (cpu, nanoseconds) for service_name observability/pyroscope]

Grafana Pyroscope has its own web UI, so unlike Grafana Loki and similar tools, you can view the data like this without going through Grafana.

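You can specify a Pod or a time range and view the data as a Flame Graph. Once Grafana Agent and Grafana Pyroscope are set up, you can go back through past pprof data, which is very convenient in development environments.

Grafana Pyroscope currently cannot display the ordinary tree-style graph that pprof offers, which is a little inconvenient. That feature would be useful to have.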

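Impressions

If your Go application already uses pprof, you can collect and store its data without adding any code to the existing application, so I think this is a very good tool to try out in a development environment.

The scrape configuration is idiosyncratic and very hard for newcomers to understand, so in real operation, handing these settings over when team members change could be quite difficult.

Prometheus has similar operational problems, but with Grafana Agent information is even scarcer and much is not covered in the official documentation, so writing the configuration is currently very hard for anyone without prior Prometheus experience.

I personally recommend this tool highly, so please give it a try.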