Trying out Amazon Managed Service for Prometheus

1 year ago

新, 韵


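(Addendum, September 30, 2021)
Amazon Managed Service for Prometheus is now generally available. Good news!

Alert manager features, which were not offered in the preview, are now available, and the service is offered in 9 regions (including Tokyo)!

(End of addendum)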

Are you all enjoying AWS re:Invent? It runs until December 18, so if you haven't joined yet, check it out!

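When we migrated to EKS this summer, I went to the trouble of setting up Airflow, Grafana, and Prometheus on Kubernetes. Shortly afterwards, all three were announced as managed services. I still have no regrets. Because I chose to build and operate them myself instead of using AWS-managed services, I learned the concepts behind each one and gained hands-on operational experience.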

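For the Prometheus we run on EKS, we happened to have the following requirement:

Store & query metrics over the internet while guaranteeing a reasonable level of security

I was thinking about how to solve this when, today, Amazon launched a managed Prometheus service, Amazon Managed Service for Prometheus. It looks like it covers exactly this, so I tried it out and am sharing the results here.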

If you are interested in our EKS migration, see the notes below. It's a bit off topic, but feel free to have a look!

About Amazon Managed Service for Prometheus

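I won't explain Prometheus itself in detail here; please refer to other articles for that.

Amazon Managed Service for Prometheus is a service for storing and querying Prometheus-compatible metrics.
You store metrics in Amazon Managed Service for Prometheus with tools such as Prometheus exporters. You can then query those metrics from Grafana and view them as graphs.
Although it is billed as a managed Prometheus service, it is closer to a repository for the metrics that Prometheus collects.
Prometheus itself is not designed to keep metrics long term: its local storage only holds data for a limited retention period.
To keep metrics durably over the long term, you need a separate system such as Thanos or Cortex. We use Thanos at our company, and Amazon Managed Service for Prometheus appears to be built on Cortex.
So even with Amazon Managed Service for Prometheus, you still have to deploy exporters yourself to scrape metrics from containers. You also need your own Grafana (or similar) to query and visualize the stored data.
For Grafana, there is also Amazon Managed Service for Grafana! (I have applied for the preview and am waiting to get in.)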

Who should use it?

So who is Amazon Managed Service for Prometheus aimed at, and who benefits from it?
Impressively, AWS documents this in detail in the FAQ.

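Q: Why should I use Amazon Managed Service for Prometheus?
You should use AMP in either of two situations. The first is if you have adopted an open-source-based monitoring strategy, have deployed or plan to adopt Prometheus for container monitoring, and prefer a fully managed experience where AWS provides enhanced security, scalability, and availability.

Q: How does Amazon Managed Service for Prometheus relate to Amazon CloudWatch? Which one should I use?
If you are looking for a comprehensive monitoring service that unifies logs, metrics, traces, dashboards, and alarms across AWS services, EC2, containers, and serverless, you should use Amazon CloudWatch.
If you run containers and want a service that is fully compatible with the open-source Prometheus project, you should choose AMP. You should also choose AMP if you already run Prometheus and want to eliminate the ongoing operational cost while improving security.

(Quoted from https://aws.amazon.com/jp/grafana/faqs/)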

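The target audience appears to be people who have already built monitoring on Prometheus and are tired of the operational burden. With Amazon Managed Service for Prometheus, you no longer have to manage the collected metrics yourself.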

Key features

Below is a summary of what I read in the documentation.

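Security

In Amazon Managed Service for Prometheus, both writing and reading metrics require requests signed with AWS Signature Version 4 using AWS IAM credentials. This lets you run Prometheus on top of AWS IAM's security features. If you don't want metrics to travel over the internet at all, you can use a VPC endpoint (interface type).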
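To make the signing requirement concrete, here is a minimal Python sketch of AWS Signature Version 4 for a GET query against a workspace. It uses only the standard library. Several details are assumptions, not taken from the article: the signing service name `aps`, the placeholder host, path, and workspace ID, and the dummy credentials. The sketch also omits the `X-Amz-Security-Token` header that temporary credentials (for example, from an IAM role) require. It assumes a path that needs no extra URI encoding. In practice, an AWS SDK or the signing proxy used later in this walkthrough should handle signing for you.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key: kDate -> kRegion -> kService -> kSigning."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def sign_get(host, path, query, region, access_key, secret_key, now, service="aps"):
    """Return the x-amz-date and Authorization headers for a signed GET request."""
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date = now.strftime("%Y%m%d")
    # Canonical query string: parameters sorted by name, names and values URI-encoded.
    canonical_query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in sorted(query.items())
    )
    # Canonical headers are lowercase, sorted, and each one ends with a newline.
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    payload_hash = hashlib.sha256(b"").hexdigest()  # GET has an empty body
    canonical_request = "\n".join(
        ["GET", path, canonical_query, canonical_headers, signed_headers, payload_hash]
    )
    scope = f"{date}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    signature = hmac.new(
        signing_key(secret_key, date, region, service),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    authorization = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return {"x-amz-date": amz_date, "Authorization": authorization}


if __name__ == "__main__":
    headers = sign_get(
        host="aps-workspaces.us-east-1.amazonaws.com",
        path="/workspaces/ws-EXAMPLE/api/v1/query",  # placeholder workspace ID
        query={"query": "up"},
        region="us-east-1",
        access_key="AKIDEXAMPLE",  # dummy credentials
        secret_key="dummy-secret",
        now=datetime.datetime.now(datetime.timezone.utc),
    )
    print(headers["Authorization"])
```

The signature covers the host, the timestamp, the query, and the body hash. A request that is altered in transit, or that was signed without valid IAM credentials, will therefore be rejected.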

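Availability

According to the documentation, metrics are stored in a Multi-AZ configuration spanning 3 Availability Zones. Data is first written to EBS and then persisted to S3.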

Data retention

Currently the retention period is fixed at 150 days; no other value can be selected.

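Alert manager

Since September 30, 2021, alert manager features have been available. You can also choose SNS as a notification destination, which lets you build flexible notifications via SNS → Lambda.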
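As a rough illustration of the SNS → Lambda path, here is a hypothetical Python Lambda handler that unpacks the SNS records the alert manager would publish. It makes two assumptions: the standard SNS-to-Lambda event shape (`Records[].Sns.Message`), and a message body that may be either JSON or plain text depending on your alert manager template. The actual forwarding (chat, email, ticketing) is left as a stub.

```python
import json


def extract_alerts(event: dict) -> list:
    """Collect the message of every SNS record, parsing JSON bodies when possible."""
    alerts = []
    for record in event.get("Records", []):
        message = record["Sns"]["Message"]
        try:
            alerts.append(json.loads(message))
        except json.JSONDecodeError:
            # Plain-text alert template: keep the raw string.
            alerts.append(message)
    return alerts


def forward(alert) -> None:
    """Stub for the real notification channel; here it only prints."""
    text = alert if isinstance(alert, str) else json.dumps(alert, ensure_ascii=False)
    print(text)


def lambda_handler(event, context):
    """Entry point invoked by the SNS subscription."""
    alerts = extract_alerts(event)
    for alert in alerts:
        forward(alert)
    return {"forwarded": len(alerts)}
```

Keeping parsing separate from forwarding means you can change the notification target without touching the SNS handling.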

Regions

The service is available in the following 9 regions.

Asia Pacific (Tokyo).

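Trying Amazon Managed Service for Prometheus with EKS

If you have already read the blog post below, you can skip the rest. But if you want to know whether it works across regions, or would like to read it in Japanese, read on.

In this walkthrough, the resources span the following regions.

Resource                                 Region
Amazon Managed Service for Prometheus    us-east-1
EKS                                      ap-northeast-1

The steps below use the AWS CLI, eksctl, kubectl, and helm. You need a shell environment with these tools installed and the appropriate credentials configured.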

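Creating the Amazon Managed Service for Prometheus workspace

Log in to the AWS Management Console and open the Amazon Managed Service for Prometheus page (https://console.aws.amazon.com/prometheus/home).
Although Amazon Managed Service for Prometheus is in preview, you can use it without applying for access.
Create a workspace as shown in the screenshot above; it only takes a moment.

After a few seconds, the status changes to ACTIVE and the resource is created.
Once it exists, two HTTP endpoints are issued: Endpoint - remote write URL for storing metrics and Endpoint - query URL for querying them.
These URLs also appear to be publicly reachable, since the hostname resolves to public IP addresses: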

$ dig aps-workspaces.us-east-1.amazonaws.com +short
3.223.166.82
100.24.159.220
52.70.212.28
54.210.4.8
35.169.187.85
54.243.211.98
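The two endpoints share a common prefix built from the region and the workspace ID. Below is a small Python helper, assuming the URL layout the console shows. The workspace ID `ws-EXAMPLE` is a placeholder.

```python
def amp_endpoints(region: str, workspace_id: str) -> dict:
    """Build the remote write and query URLs for an AMP workspace."""
    base = f"https://aps-workspaces.{region}.amazonaws.com/workspaces/{workspace_id}"
    return {
        "remote_write": f"{base}/api/v1/remote_write",
        "query": f"{base}/api/v1/query",
    }


if __name__ == "__main__":
    for name, url in amp_endpoints("us-east-1", "ws-EXAMPLE").items():
        print(f"{name}: {url}")
```

The hostname depends only on the workspace's region (us-east-1 here). Nothing in the URL ties it to the EKS cluster's region, which is why the cross-region setup in this walkthrough is possible at all.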

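Creating the IAM role and provider

Next, create the IAM resources used to store metrics in Amazon Managed Service for Prometheus and to read the stored metrics.
Run the bash script provided in the blog post.
Only two changes are needed. Set YOUR_EKS_CLUSTER_NAME on the first line. On the last line, pass the --region option to eksctl, because the EKS cluster in this walkthrough is in ap-northeast-1.

Running the script produces output like the following. It looks as if it prints errors, but they are harmless. The script first checks whether the role and policy already exist. The NoSuchEntity errors just mean they don't yet, so it goes on to create them.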

✦ ❯ bash init.sh 
Creating a new trust policy

An error occurred (NoSuchEntity) when calling the GetRole operation: The role with name EKS-AMP-ServiceAccount-Role cannot be found.
Appending to the existing trust policy

An error occurred (NoSuchEntity) when calling the GetPolicy operation: Policy arn:aws:iam::XXXXXXXXXX:policy/AWSManagedPrometheusWriteAccessPolicy was not found.
Creating a new permission policy AWSManagedPrometheusWriteAccessPolicy
{
    "Policy": {
        "PolicyName": "AWSManagedPrometheusWriteAccessPolicy",
        "PolicyId": "ANPARVSLQ63UWTO7GOCOY",
        "Arn": "arn:aws:iam::XXXXXXXXXX:policy/AWSManagedPrometheusWriteAccessPolicy",
        "Path": "/",
        "DefaultVersionId": "v1",
        "AttachmentCount": 0,
        "PermissionsBoundaryUsageCount": 0,
        "IsAttachable": true,
        "CreateDate": "2020-12-17T12:50:08+00:00",
        "UpdateDate": "2020-12-17T12:50:08+00:00"
    }
}

An error occurred (NoSuchEntity) when calling the GetRole operation: The role with name EKS-AMP-ServiceAccount-Role cannot be found.
EKS-AMP-ServiceAccount-Role role does not exist. Creating a new role with a trust and permission policy
arn:aws:iam::XXXXXXXXXX:role/EKS-AMP-ServiceAccount-Role
[ℹ]  eksctl version 0.30.0
[ℹ]  using region ap-northeast-1
[ℹ]  will create IAM Open ID Connect provider for cluster "eks-cluster" in "ap-northeast-1"
[✔]  created IAM Open ID Connect provider for cluster "eks-cluster" in "ap-northeast-1"
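What the script sets up is IAM Roles for Service Accounts (IRSA). It creates an IAM role whose trust policy lets a specific Kubernetes service account assume it through the cluster's OIDC provider, plus a permission policy that allows writing to AMP. The Python sketch below builds equivalent policy documents. It is not the script itself. The account ID, OIDC issuer, and service account name `amp-iamproxy-ingest-service-account` are hypothetical placeholders. The permission policy is a minimal `aps:RemoteWrite`-only assumption and may differ from what the script actually grants.

```python
import json


def irsa_trust_policy(account_id: str, oidc_issuer: str, namespace: str, service_account: str) -> dict:
    """Trust policy letting one Kubernetes service account assume the role via OIDC."""
    # The provider identifier is the issuer URL without the https:// scheme.
    provider = oidc_issuer[len("https://"):] if oidc_issuer.startswith("https://") else oidc_issuer
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}"
                }
            },
        }],
    }


def amp_write_policy() -> dict:
    """Minimal permission policy (assumption): allow remote writes to any AMP workspace."""
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": ["aps:RemoteWrite"], "Resource": "*"}],
    }


if __name__ == "__main__":
    trust = irsa_trust_policy(
        account_id="111122223333",  # hypothetical
        oidc_issuer="https://oidc.eks.ap-northeast-1.amazonaws.com/id/EXAMPLE",  # hypothetical
        namespace="prometheus",
        service_account="amp-iamproxy-ingest-service-account",  # hypothetical
    )
    print(json.dumps(trust, indent=2))
    print(json.dumps(amp_write_policy(), indent=2))
```

The `sub` condition restricts the role to a single namespace and service account. Other pods in the cluster cannot obtain these credentials.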

Installing Prometheus

To try out Amazon Managed Service for Prometheus, we need some data to send to it.
So we install Prometheus on EKS.
This only takes a few Helm commands.

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
"prometheus-community" has been added to your repositories

$ kubectl create ns prometheus
namespace/prometheus created

$ helm install prometheus-for-amp prometheus-community/prometheus -n prometheus
NAME: prometheus-for-amp
LAST DEPLOYED: Thu Dec 17 12:56:30 2020
NAMESPACE: prometheus
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Prometheus server can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-for-amp-server.prometheus.svc.cluster.local


Get the Prometheus server URL by running these commands in the same shell:
  export POD_NAME=$(kubectl get pods --namespace prometheus -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
  kubectl --namespace prometheus port-forward $POD_NAME 9090


The Prometheus alertmanager can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-for-amp-alertmanager.prometheus.svc.cluster.local


Get the Alertmanager URL by running these commands in the same shell:
  export POD_NAME=$(kubectl get pods --namespace prometheus -l "app=prometheus,component=alertmanager" -o jsonpath="{.items[0].metadata.name}")
  kubectl --namespace prometheus port-forward $POD_NAME 9093
#################################################################################
######   WARNING: Pod Security Policy has been moved to a global property.  #####
######            use .Values.podSecurityPolicy.enabled with pod-based      #####
######            annotations                                               #####
######            (e.g. .Values.nodeExporter.podSecurityPolicy.annotations) #####
#################################################################################


The Prometheus PushGateway can be accessed via port 9091 on the following DNS name from within your cluster:
prometheus-for-amp-pushgateway.prometheus.svc.cluster.local


Get the PushGateway URL by running these commands in the same shell:
  export POD_NAME=$(kubectl get pods --namespace prometheus -l "app=prometheus,component=pushgateway" -o jsonpath="{.items[0].metadata.name}")
  kubectl --namespace prometheus port-forward $POD_NAME 9091

For more information on running Prometheus, visit:
https://prometheus.io/

You can see that the Prometheus-related pods have been created in the prometheus namespace.

$ k get po -n prometheus
NAME                                                    READY   STATUS    RESTARTS   AGE
prometheus-for-amp-alertmanager-5cb9f4478c-km4ht        2/2     Running   0          6m26s
prometheus-for-amp-kube-state-metrics-bc9cb958f-l7p7k   1/1     Running   0          6m26s
prometheus-for-amp-node-exporter-69qsw                  1/1     Running   0          6m27s
prometheus-for-amp-node-exporter-bg4ss                  1/1     Running   0          6m26s
prometheus-for-amp-node-exporter-fs74x                  1/1     Running   0          6m26s
prometheus-for-amp-pushgateway-56ff9d9d99-4z2sf         1/1     Running   0          6m26s
prometheus-for-amp-server-7f6d6fcf59-kpl5m              2/2     Running   0          6m26s

Deploying the AWS signing proxy

(Updated 2021-04-16)
Starting with Prometheus 2.26.0, AWS Signature Version 4 authentication is supported natively.
If you use Prometheus 2.26.0 or later, you don't need the AWS signing proxy container.

The steps for Prometheus 2.26.0 are described at:
https://aws.amazon.com/jp/blogs/mt/getting-started-amazon-managed-service-for-prometheus/

(End of 2021-04-16 update)

As mentioned earlier, writing metrics to and reading them from Amazon Managed Service for Prometheus requires AWS Signature Version 4 signing.
The Helm chart used above was not designed for Amazon Managed Service for Prometheus and has no mechanism for adding SigV4 signatures.
The AWS signing proxy fills that gap. It is deployed as a sidecar container alongside Prometheus.

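You can copy the YAML from the blog post and adapt it to your environment. Alternatively, the Amazon Managed Service for Prometheus management console provides the same YAML.

In the console's YAML, replace the value of the eks.amazonaws.com/role-arn annotation with the ARN of the IAM role created in the "Creating the IAM role and provider" step (arn:aws:iam:::role/EKS-AMP-ServiceAccount-Role). No other changes are needed.

Once the YAML file is ready, pass it to the Helm chart as a values file and redeploy Prometheus.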

$ helm upgrade --install prometheus-for-amp prometheus-community/prometheus -n prometheus -f ./amp_ingest_override_values.yaml

Release "prometheus-for-amp" has been upgraded. Happy Helming!
NAME: prometheus-for-amp
LAST DEPLOYED: Thu Dec 17 13:03:23 2020
NAMESPACE: prometheus
STATUS: deployed
REVISION: 2
TEST SUITE: None
NOTES:
The Prometheus server can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-for-amp-server.prometheus.svc.cluster.local


Get the Prometheus server URL by running these commands in the same shell:
  export POD_NAME=$(kubectl get pods --namespace prometheus -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
  kubectl --namespace prometheus port-forward $POD_NAME 9090


The Prometheus alertmanager can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-for-amp-alertmanager.prometheus.svc.cluster.local


Get the Alertmanager URL by running these commands in the same shell:
  export POD_NAME=$(kubectl get pods --namespace prometheus -l "app=prometheus,component=alertmanager" -o jsonpath="{.items[0].metadata.name}")
  kubectl --namespace prometheus port-forward $POD_NAME 9093
#################################################################################
######   WARNING: Pod Security Policy has been moved to a global property.  #####
######            use .Values.podSecurityPolicy.enabled with pod-based      #####
######            annotations                                               #####
######            (e.g. .Values.nodeExporter.podSecurityPolicy.annotations) #####
#################################################################################


The Prometheus PushGateway can be accessed via port 9091 on the following DNS name from within your cluster:
prometheus-for-amp-pushgateway.prometheus.svc.cluster.local


Get the PushGateway URL by running these commands in the same shell:
  export POD_NAME=$(kubectl get pods --namespace prometheus -l "app=prometheus,component=pushgateway" -o jsonpath="{.items[0].metadata.name}")
  kubectl --namespace prometheus port-forward $POD_NAME 9091

For more information on running Prometheus, visit:
https://prometheus.io/

Checking the pods shows that only the Prometheus server was redeployed (it now runs as prometheus-for-amp-server-0).

$ k get po -n prometheus
NAME                                                    READY   STATUS              RESTARTS   AGE
prometheus-for-amp-alertmanager-5cb9f4478c-km4ht        2/2     Running             0          7m10s
prometheus-for-amp-kube-state-metrics-bc9cb958f-l7p7k   1/1     Running             0          7m10s
prometheus-for-amp-node-exporter-69qsw                  1/1     Running             0          7m11s
prometheus-for-amp-node-exporter-bg4ss                  1/1     Running             0          7m10s
prometheus-for-amp-node-exporter-fs74x                  1/1     Running             0          7m10s
prometheus-for-amp-pushgateway-56ff9d9d99-4z2sf         1/1     Running             0          7m10s
prometheus-for-amp-server-0                             0/3     ContainerCreating   0          17s

A closer look shows that the sidecar container (aws-sigv4-proxy-sidecar) has been added to the pod.

$ k describe po prometheus-for-amp-server-0  -n prometheus
Name:         prometheus-for-amp-server-0

...

Containers:
  prometheus-server-configmap-reload:
    Container ID:  docker://51f6c62b778acb9b6c895b7dc6f14efe54f9747ed07936206be487fe4227a2ef
    Image:         jimmidyson/configmap-reload:v0.4.0

...

  prometheus-server:
    Container ID:  docker://35840c35fcad0428b5568f469cd2e28209cda1eab52dc135bee1a1d8693546d7
    Image:         quay.io/prometheus/prometheus:v2.22.1

...

  aws-sigv4-proxy-sidecar:
    Container ID:  docker://6897c52c5114bee6886e254c8c526c80ce2f94fcd807ae202f4b5491bbbc6bdb
    Image:         public.ecr.aws/aws-observability/aws-sigv4-proxy:1.0

...


Events:
  Type    Reason                  Age   From                                                    Message
  ----    ------                  ----  ----                                                    -------
  Normal  Scheduled               50s   default-scheduler                                       Successfully assigned prometheus/prometheus-for-amp-server-0 to ip-10-0-44-92.ap-northeast-1.compute.internal
  Normal  SuccessfulAttachVolume  47s   attachdetach-controller                                 AttachVolume.Attach succeeded for volume "pvc-c1a0ad74-fe75-4646-86d8-61c51e320f87"
  Normal  Pulling                 40s   kubelet, ip-10-0-44-92.ap-northeast-1.compute.internal  Pulling image "jimmidyson/configmap-reload:v0.4.0"
  Normal  Pulled                  35s   kubelet, ip-10-0-44-92.ap-northeast-1.compute.internal  Successfully pulled image "jimmidyson/configmap-reload:v0.4.0"
  Normal  Created                 35s   kubelet, ip-10-0-44-92.ap-northeast-1.compute.internal  Created container prometheus-server-configmap-reload
  Normal  Started                 35s   kubelet, ip-10-0-44-92.ap-northeast-1.compute.internal  Started container prometheus-server-configmap-reload
  Normal  Pulling                 35s   kubelet, ip-10-0-44-92.ap-northeast-1.compute.internal  Pulling image "quay.io/prometheus/prometheus:v2.22.1"
  Normal  Pulled                  24s   kubelet, ip-10-0-44-92.ap-northeast-1.compute.internal  Successfully pulled image "quay.io/prometheus/prometheus:v2.22.1"
  Normal  Created                 23s   kubelet, ip-10-0-44-92.ap-northeast-1.compute.internal  Created container prometheus-server
  Normal  Started                 23s   kubelet, ip-10-0-44-92.ap-northeast-1.compute.internal  Started container prometheus-server
  Normal  Pulling                 23s   kubelet, ip-10-0-44-92.ap-northeast-1.compute.internal  Pulling image "public.ecr.aws/aws-observability/aws-sigv4-proxy:1.0"
  Normal  Pulled                  19s   kubelet, ip-10-0-44-92.ap-northeast-1.compute.internal  Successfully pulled image "public.ecr.aws/aws-observability/aws-sigv4-proxy:1.0"
  Normal  Created                 19s   kubelet, ip-10-0-44-92.ap-northeast-1.compute.internal  Created container aws-sigv4-proxy-sidecar
  Normal  Started                 19s   kubelet, ip-10-0-44-92.ap-northeast-1.compute.internal  Started container aws-sigv4-proxy-sidecar

Deploying Grafana

Now let's confirm that metrics are actually being ingested into Amazon Managed Service for Prometheus.
My application for the Amazon Managed Service for Grafana preview hasn't been approved yet, so I'll deploy Grafana myself.

$ helm repo add grafana https://grafana.github.io/helm-charts
"grafana" has been added to your repositories

$ kubectl create ns grafana
namespace/grafana created

$ helm install grafana-for-amp grafana/grafana -n grafana
NAME: grafana-for-amp
LAST DEPLOYED: Thu Dec 17 13:08:08 2020
NAMESPACE: grafana
STATUS: deployed
REVISION: 1
NOTES:
1. Get your 'admin' user password by running:

   kubectl get secret --namespace grafana grafana-for-amp -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

2. The Grafana server can be accessed via port 80 on the following DNS name from within your cluster:

   grafana-for-amp.grafana.svc.cluster.local

   Get the Grafana URL to visit by running these commands in the same shell:

     export POD_NAME=$(kubectl get pods --namespace grafana -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=grafana-for-amp" -o jsonpath="{.items[0].metadata.name}")
     kubectl --namespace grafana port-forward $POD_NAME 3000

3. Login with the password from step 1 and the username: admin
#################################################################################
######   WARNING: Persistence is disabled!!! You will lose your data when   #####
######            the Grafana pod is terminated.                            #####
#################################################################################

As with Prometheus, deploy the AWS signing proxy. Use the YAML from the blog post, because the Amazon Managed Service for Prometheus management console does not provide one for Grafana.

$ helm upgrade --install grafana-for-amp grafana/grafana -n grafana -f ./amp_query_override_values.yaml
Release "grafana-for-amp" has been upgraded. Happy Helming!
NAME: grafana-for-amp
LAST DEPLOYED: Thu Dec 17 13:08:47 2020
NAMESPACE: grafana
STATUS: deployed
REVISION: 2
NOTES:
1. Get your 'admin' user password by running:

   kubectl get secret --namespace grafana grafana-for-amp -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

2. The Grafana server can be accessed via port 80 on the following DNS name from within your cluster:

   grafana-for-amp.grafana.svc.cluster.local

   Get the Grafana URL to visit by running these commands in the same shell:

     export POD_NAME=$(kubectl get pods --namespace grafana -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=grafana-for-amp" -o jsonpath="{.items[0].metadata.name}")
     kubectl --namespace grafana port-forward $POD_NAME 3000

3. Login with the password from step 1 and the username: admin
#################################################################################
######   WARNING: Persistence is disabled!!! You will lose your data when   #####
######            the Grafana pod is terminated.                            #####
#################################################################################

Once the deployment finishes, look up the Grafana pod name and use kubectl port-forward to open Grafana in a local browser. (I'm running this on Cloud9, so I forward it to port 8080.)

$ k get po -n grafana
NAME                               READY   STATUS    RESTARTS   AGE
grafana-for-amp-79d5454dbc-ghhp8   1/1     Running   0          3h42m
$ kubectl port-forward -n grafana pods/grafana-for-amp-79d5454dbc-ghhp8 8080:3000

If everything works, the login screen appears as shown above.
You can retrieve the password with the following command.

$ kubectl get secrets grafana-for-amp -n grafana -o jsonpath='{.data.admin-password}'|base64 --decode

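Once logged in to Grafana, configure Prometheus as a data source.
Click the gear icon on the left side of the Grafana page, select "Data Sources", and then select "Prometheus".
The settings screen shown above appears.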

SigV4 Auth Details > Default Region: us-east-1

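When you're done, click the "Save & Test" button at the bottom of the Grafana page, as in the screenshot above. If "Data source is working" appears, the setup is complete.

Now let's actually query the metrics.
Click the compass icon (Explore) on the left side of the Grafana page.
Enter the metric name to the right of "Metrics" and click the button at the top right to display the graph.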
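Behind the scenes, Grafana calls the Prometheus HTTP API on the query endpoint. The Python sketch below builds an instant-query URL and flattens the JSON response from `/api/v1/query` into label/value pairs. It only handles the `vector` result type. The sample response is hand-written for illustration and is not output from this environment. Actually sending the request would also require SigV4 signing.

```python
import json
from urllib.parse import urlencode


def instant_query_url(query_endpoint: str, promql: str) -> str:
    """Append a URL-encoded PromQL expression to the workspace's query URL."""
    return f"{query_endpoint}?{urlencode({'query': promql})}"


def parse_vector(response: dict) -> list:
    """Return (labels, value) pairs from an instant-query 'vector' response."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unsupported result type: {data['resultType']}")
    # Each sample's value is [unix_timestamp, "value as a string"].
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


if __name__ == "__main__":
    print(instant_query_url(
        "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-EXAMPLE/api/v1/query",
        'up{job="kubernetes-nodes"}',
    ))
    # Hand-written illustrative response, not real output from this environment.
    sample = json.loads(
        '{"status": "success", "data": {"resultType": "vector", "result": ['
        '{"metric": {"__name__": "up", "instance": "10.0.44.92:9100"}, "value": [1608210000, "1"]}]}}'
    )
    for labels, value in parse_vector(sample):
        print(labels, value)
```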

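Things I'd like to try next

I've only scratched the surface so far. Next, I'd like to add more metrics, compare query speed against our self-hosted Prometheus environment, and compare costs.

As mentioned at the beginning, our requirement for our own Prometheus was to store and query metrics over the internet while guaranteeing a reasonable level of security. The reason is that we use Prometheus metrics with Kayenta, Spinnaker's canary analysis tool, for canary releases. In the future, I'd like to check whether Kayenta can use Amazon Managed Service for Prometheus directly. Since we use Istio, I also want to check whether the Istio proxy and the AWS signing proxy can run together.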