Learning to Monitor AKS with Azure Monitor (Kusto) from Its Workbooks
Introduction
Azure Monitor for containers is built specifically for containers and ships with the following four workbooks.
- Disk capacity
- Disk IO
- Kubelet
- Network
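In this article, I want to learn and understand Kusto queries by reading these ready-made workbooks. The default workbooks are available in any cluster at any time, so they are handy as a cheat sheet. There is no need to memorize every detail of the Kusto queries introduced below.
What are Azure Monitor Workbooks?
A tool that combines the results and charts of multiple log queries into a single report. Reports can be shared within a team, so even new members who do not know how to query logs can pick up the knowledge needed for monitoring, such as "look at this telemetry to understand X."
A grasp of the basic concepts of Azure Monitor is essential.
Query syntax basics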
- Log Analytics tutorial
- Query tutorial
Contents
- Basic queries
- Schema overview
- Filtering, e.g. | where hogehoge == "hugahuga"
- Sorting, e.g. | sort by TimeGenerated desc
- Grouping and aggregation, e.g. | summarize
- Charts
- Saving and loading queries
- Selecting and computing columns, e.g. | project column1, column2, column3
- Defining additional columns, e.g. | extend NewColumn1=substring(OriginalColumn1, 0, 5)
- Grouping by a time column (binning), e.g. | summarize avg(CounterValue) by bin(TimeGenerated, 1h)
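The binning step in the last item can be sketched in Python (Kusto itself is not runnable here). This is a minimal illustration, assuming rows are simple (TimeGenerated, CounterValue) pairs: bin() rounds each timestamp down to a multiple of the bin size, and summarize then averages each bin. The sketch counts bins from the Unix epoch; for bin sizes that divide a day evenly, such as 1h or 10m, the choice of epoch does not change the result.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def bin_time(ts: datetime, size: timedelta) -> datetime:
    """Round ts down to a multiple of size (counted from the Unix epoch)."""
    epoch = datetime(1970, 1, 1)
    return epoch + ((ts - epoch) // size) * size


def summarize_avg_by_bin(rows, size: timedelta):
    """Rough equivalent of: summarize avg(CounterValue) by bin(TimeGenerated, size).

    rows: iterable of (TimeGenerated, CounterValue) pairs (hypothetical shape).
    """
    groups = defaultdict(list)
    for ts, value in rows:
        groups[bin_time(ts, size)].append(value)
    # One average per bin, in time order
    return {b: mean(vals) for b, vals in sorted(groups.items())}
```

Two samples at 10:05 and 10:55 land in the 10:00 bin and are averaged, while a sample at exactly 11:00 starts the next bin.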
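About the collected data
Telemetry such as disk capacity, disk IO, and network is collected by the InfluxData Telegraf agent and can be queried as custom metrics in InsightsMetrics.
In the InsightsMetrics log, the per-metric details are stored in the Tags property, and each metric is obtained by filtering on Name and Namespace. This is documented on GitHub and quoted below; each entry links to the InfluxData Telegraf documentation.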
- Disk metrics
- Disk IO metrics (Namespace: container.azm.ms/diskio)
  - reads
  - read_bytes
  - read_time
  - writes
  - write_bytes
  - write_time
  - io_time
  - iops_in_progress
- Host network metrics
- Kubelet metrics (Namespace: container.azm.ms/prometheus)
  - kubelet_docker_operations: Cumulative number of Docker operations by operation type
  - kubelet_docker_operations_errors: Cumulative number of Docker operation errors by operation type
Source: https://github.com/microsoft/OMS-docker/blob/vishwa/june19agentrel/docs/InsightsMetrics.md
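To make the Tags layout concrete, here is a Python sketch of the first steps shared by the Disk IO queries later in this article: filter on Origin, Namespace and Name, then parse the JSON Tags and pull out hostName and name. The record dictionaries and their sample values are hypothetical; only the column and tag names come from the queries.

```python
import json


def disk_io_rows(records, name):
    """Sketch of:
    InsightsMetrics
    | where Origin == 'container.azm.ms/telegraf'
    | where Namespace == 'container.azm.ms/diskio'
    | where Name == <name>
    | extend Tags = todynamic(Tags)
    | extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
    """
    out = []
    for r in records:
        if (r["Origin"] == "container.azm.ms/telegraf"
                and r["Namespace"] == "container.azm.ms/diskio"
                and r["Name"] == name):
            tags = json.loads(r["Tags"])  # Tags is assumed to arrive as a JSON string
            out.append({
                "TimeGenerated": r["TimeGenerated"],
                "HostName": str(tags.get("hostName", "")),
                "Device": "/dev/" + str(tags.get("name", "")),
                "Val": r["Val"],
            })
    return out
```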
Details of the other records collected by Azure Monitor for containers are in the following document.
https://docs.microsoft.com/ja-jp/azure/azure-monitor/insights/container-insights-log-search
- Host and container performance: Perf
- Container inventory: ContainerInventory
- Container logs: ContainerLog
- Container node inventory: ContainerNodeInventory
- Inventory of pods in the Kubernetes cluster: KubePodInventory
- Inventory of the node part of the Kubernetes cluster: KubeNodeInventory
- Kubernetes events: KubeEvents
- Services in the Kubernetes cluster: KubeServices
- Performance metrics for the node part of the Kubernetes cluster: Perf | where ObjectName == "K8SNode"
- Performance metrics for the container part of the Kubernetes cluster: Perf | where ObjectName == "K8SContainer"
- Custom metrics: InsightsMetrics
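Prometheus support
Azure Monitor for containers can collect Prometheus metrics without a Prometheus server. Unfortunately, the workbooks contain no queries for Prometheus metrics, so this section only points to the documentation on configuration and querying.
All you need to do is configure metric collection in a ConfigMap.
How to query
Querying Prometheus metrics data
If you filter InsightsMetrics on the prometheus namespace, the Tags property also contains the metric's details stored as JSON.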
InsightsMetrics
| where Namespace == "prometheus"
| extend tags=parse_json(Tags)
| summarize count() by Name
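Opening the workbooks
Now let's go through the workbooks one by one.
Disk Capacity workbook
The following six lines are boilerplate shared by every Disk Capacity chart: they extract the host and device from Tags and apply the workbook's filter selections (all '*' here, so nothing is filtered out). You can skip them when reading.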
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.device))
| extend NodeDisk = strcat(HostName, Device)
| where "*" in ('*') or HostName in ('*')
| where "*" in ('*') or Device in ('*')
| where NodeDisk in (selectedStateDisks) or '*' in (selectedStateDisks);
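The filter lines follow a simple pattern. The sketch below assumes, based on how the captured queries look, that the workbook substitutes each parameter's selection into the parenthesized lists, so a selection of '*' (All) makes the whole condition true. The helper names are hypothetical.

```python
from typing import Sequence


def node_disk(host: str, device_tag: str) -> str:
    """Device = strcat('/dev/', Tags.device); NodeDisk = strcat(HostName, Device)."""
    return host + "/dev/" + device_tag


def passes_filter(value: str, selected: Sequence[str]) -> bool:
    """Equivalent of: where "*" in (selected) or value in (selected).

    A selection containing "*" lets every row through; otherwise the
    value must be one of the selected items.
    """
    return "*" in selected or value in selected
```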
Points to note
- where Origin == 'container.azm.ms/telegraf'
- For disk capacity: where Namespace == 'disk' or Namespace =~ 'container.azm.ms/disk'
Used disk % of the top 3 disks
let selectedStateDisks = dynamic(["*"]);
let data = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'disk' or Namespace =~ 'container.azm.ms/disk'
| where Name == 'used_percent'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.device))
| extend NodeDisk = strcat(HostName, Device)
| where "*" in ('*') or HostName in ('*')
| where "*" in ('*') or Device in ('*')
| where NodeDisk in (selectedStateDisks) or '*' in (selectedStateDisks);
let mostUsedDisks = data
| top-nested 3 of NodeDisk by MaxVal = max(Val);
data
| where NodeDisk in (mostUsedDisks)
| make-series ['Used Disk %'] = max(Val) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by NodeDisk
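The logic of this query, picking the three disks with the highest peak and then building a 10-minute max series for them, can be sketched in Python as follows. Timestamps are plain seconds and the rows are hypothetical; as with make-series ... default = 0, empty bins are filled with 0.

```python
from collections import defaultdict


def top_n_by_max(rows, n=3):
    """rows: (node_disk, t_seconds, val). Like: top-nested n of NodeDisk by max(Val)."""
    peak = defaultdict(lambda: float("-inf"))
    for key, _, val in rows:
        peak[key] = max(peak[key], val)
    return sorted(peak, key=peak.get, reverse=True)[:n]


def make_series_max(rows, keys, start, end, step):
    """Like: make-series max(Val) default = 0 on t from start to end step step by key."""
    nbins = -(-(end - start) // step)  # ceiling division
    series = {k: [None] * nbins for k in keys}
    for key, t, val in rows:
        if key in series and start <= t < end:
            i = (t - start) // step
            cur = series[key][i]
            series[key][i] = val if cur is None else max(cur, val)
    # Empty bins get the default value 0
    return {k: [0 if v is None else v for v in vals] for k, vals in series.items()}
```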
Disk Capacity overview
Used disk %
let selectedStateDisks = dynamic(["*"]);
let usedPercent = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'disk' or Namespace =~ 'container.azm.ms/disk'
| where Name == 'used_percent'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.device))
| extend NodeDisk = strcat(HostName, Device)
| where "*" in ('*') or HostName in ('*')
| where "*" in ('*') or Device in ('*')
| where NodeDisk in (selectedStateDisks) or '*' in (selectedStateDisks);
let row = dynamic(
{
"Kind":"Unselected"});
let worstDiskAcrossNodes = usedPercent
| summarize UsedPercent = max(Val) by NodeDisk
| top 1 by UsedPercent desc;
usedPercent
| where (row.Kind == 'Unselected') or (row.Kind == 'Node' and row.Id == HostName) or (row.Kind == 'Device' and row.Id == NodeDisk)
| make-series ['Used Disk %'] = max(Val) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by NodeDisk
| where NodeDisk contains iff(row.Kind == 'Unselected', toscalar(worstDiskAcrossNodes
| project NodeDisk), '')
Free disk space (GiB)
let selectedStateDisks = dynamic(["*"]);
let data = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'disk' or Namespace =~ 'container.azm.ms/disk'
| where Name == 'used_percent' or Name == 'free'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.device))
| extend NodeDisk = strcat(HostName, Device)
| where "*" in ('*') or HostName in ('*')
| where "*" in ('*') or Device in ('*')
| where NodeDisk in (selectedStateDisks) or '*' in (selectedStateDisks);
let usedPercent = data
| where Name == 'used_percent';
let free = data
| where Name == 'free'
| extend Val = Val / 1073741824;
let row = dynamic(
{
"Kind":"Unselected"});
let worstDiskAcrossNodes = usedPercent
| summarize UsedPercent = max(Val) by NodeDisk
| top 1 by UsedPercent desc;
free
| where (row.Kind == 'Unselected') or (row.Kind == 'Node' and row.Id == HostName) or (row.Kind == 'Device' and row.Id == NodeDisk)
| make-series ['Free Disk Space'] = min(Val) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by NodeDisk
| where NodeDisk contains iff(row.Kind == 'Unselected', toscalar(worstDiskAcrossNodes
| project NodeDisk), '')
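When no row is selected in the grid (row.Kind == 'Unselected'), these two charts show only the disk whose peak used % is highest. A Python sketch of that selection and of the bytes-to-GiB conversion (Val / 1073741824, i.e. 1024^3) follows; the sample data is hypothetical.

```python
GIB = 1024 ** 3  # 1073741824, the divisor used in the query


def worst_disk(used_percent_rows):
    """Like: summarize UsedPercent = max(Val) by NodeDisk | top 1 by UsedPercent desc.

    used_percent_rows: (node_disk, used_percent) pairs.
    """
    peak = {}
    for disk, val in used_percent_rows:
        peak[disk] = max(val, peak.get(disk, val))
    return max(peak, key=peak.get)


def free_gib(free_rows, disk):
    """Free bytes of one NodeDisk converted to GiB (Val / 1073741824)."""
    return [val / GIB for d, val in free_rows if d == disk]
```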
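Disk IO workbook
The following seven lines are boilerplate shared by every Disk IO chart: they extract the host and device from Tags, apply the workbook's filter selections (all '*' here), and sort the rows by disk and time so that prev() can be used. You can skip them when reading.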
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
Points to note
- where Origin == 'container.azm.ms/telegraf'
- For disk IO: where Namespace == 'container.azm.ms/diskio'
Disk IO overview
Bytes read per second
let bytesReadPerSec = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'read_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1), iif(PrevVal == Val, 0.0, (Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1)))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
bytesReadPerSec
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device
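The core of the per-second queries is the prev()-based conversion from a cumulative counter to a rate. Below is a Python sketch of that step as the query writes it, assuming each row is (NodeDisk, seconds, cumulative value) with hypothetical sample data. A drop in the counter (PrevVal > Val) is treated as a reset, so the new value itself counts as the increase.

```python
from itertools import groupby


def per_second_rates(rows):
    """rows: (node_disk, t_seconds, cumulative_val).

    Mirrors: order by NodeDisk asc, TimeGenerated asc | serialize | prev(...)
    - the first row of each NodeDisk has no predecessor and is dropped
    - a row with the same timestamp as its predecessor is dropped
    - if the counter went down, the new value is used as the increase
    """
    out = []
    rows = sorted(rows, key=lambda r: (r[0], r[1]))
    for key, group in groupby(rows, key=lambda r: r[0]):
        prev = None
        for _, t, val in group:
            if prev is not None and prev[0] != t:
                prev_t, prev_val = prev
                delta = val if prev_val > val else val - prev_val
                out.append((key, t, delta / (t - prev_t)))
            prev = (t, val)  # prev() always looks at the immediately preceding row
    return out
```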
Bytes written per second
let bytesWritePerSec = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'write_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1), iif(PrevVal == Val, 0.0, (Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1)))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
bytesWritePerSec
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device
Total bytes read (per 10-minute interval)
let bytesReadTotal = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'read_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / 1, iif(PrevVal == Val, 0.0, (Val - PrevVal) / 1))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let sum = bytesReadTotal
| make-series Val = sum(Rate) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device;
sum
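The totals differ from the per-second charts only in that the increase is not divided by the elapsed time (the `/ 1`), and make-series then sums the increases per 10-minute bin. A Python sketch, assuming hypothetical (key, seconds, cumulative value) rows and bins counted from 0 rather than from ago(21600s):

```python
from collections import defaultdict


def totals_per_bin(rows, step=600):
    """rows: (key, t_seconds, cumulative_val).

    Returns {(key, bin_start): summed increase}. Uses the query's reset rule:
    when the counter drops, the new value counts as the increase.
    """
    rows = sorted(rows)
    totals = defaultdict(float)
    prev = {}
    for key, t, val in rows:
        if key in prev and prev[key][0] != t:
            prev_val = prev[key][1]
            totals[(key, t // step * step)] += val if prev_val > val else val - prev_val
        prev[key] = (t, val)
    return dict(totals)
```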
Total bytes written (per 10-minute interval)
let bytesWrittenTotal = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'write_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / 1, iif(PrevVal == Val, 0.0, (Val - PrevVal) / 1))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let sum = bytesWrittenTotal
| make-series Val = sum(Rate) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device;
sum
Milliseconds per byte read
let msPerByteRead = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'read_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, pow(Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000 * 1), -1), pow((Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000 * 1), -1))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
msPerByteRead
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device
Milliseconds per byte written
let msPerByteWritten = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'write_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(TimeGenerated == PrevTimeGenerated or (Val - PrevVal) == 0, 0.0, iif(PrevVal > Val, pow(Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000 * 1), -1), pow((Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000 * 1), -1)))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
msPerByteWritten
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device
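The two milliseconds-per-byte queries invert the throughput: (increase in bytes / elapsed milliseconds)^-1. They differ in one detail: the write query returns 0.0 when the counter did not change, while the read query has no such guard, so a zero increase there leads to pow(0, -1). A Python sketch of the guarded (write-side) form, with hypothetical inputs:

```python
def ms_per_byte(prev_t, prev_val, t, val):
    """Milliseconds per byte between two cumulative byte-counter samples.

    prev_t, t are in seconds. Returns 0.0 when no time passed or no bytes
    moved, like the write-side query.
    """
    if t == prev_t or val - prev_val == 0:
        return 0.0
    delta = val if prev_val > val else val - prev_val  # counter reset: use the new value
    bytes_per_ms = delta / ((t - prev_t) * 1000)
    return bytes_per_ms ** -1
```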
IOPS in progress
let iops = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'iops_in_progress'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| project TimeGenerated, HostName, Device, Val;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
iops
| make-series Val = iif(avgOn != -1, avg(Val), iif(maxOn != -1, max(Val), min(Val))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device
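Unlike the byte counters, iops_in_progress is aggregated directly, without any prev()-based delta. This query, like most of the others, also contains the `indexof("Average", ...)` pattern: it looks as if the workbook substitutes the selected aggregation name as the first argument (an assumption), so only the matching flag differs from -1. A Python sketch, with str.find standing in for indexof (both return -1 when not found):

```python
from statistics import mean


def aggregate(values, selection="Average"):
    """Like: iif(avgOn != -1, avg(x), iif(maxOn != -1, max(x), min(x))),
    where avgOn = indexof(selection, 'Average') and maxOn = indexof(selection, 'Max').
    """
    avg_on = selection.find("Average")
    max_on = selection.find("Max")
    if avg_on != -1:
        return mean(values)
    if max_on != -1:
        return max(values)
    return min(values)
```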
Disk busy
let ioTime = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'io_time'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000), (Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000)) * 100
| where isnotnull(Rate)
| project TimeGenerated, NodeDisk, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
ioTime
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by NodeDisk
| extend Name = NodeDisk
| project-away NodeDisk
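The Disk busy query divides the increase in io_time by the elapsed time in milliseconds (seconds * 1000) and multiplies by 100, which implies io_time counts milliseconds spent on I/O. A Python sketch of the per-sample calculation, with hypothetical values:

```python
def busy_percent(prev_t, prev_io_ms, t, io_ms):
    """Percentage of wall-clock time the device spent on I/O between two samples.

    prev_t, t: seconds; prev_io_ms, io_ms: cumulative io_time in milliseconds.
    Uses the query's reset rule (a drop in the counter uses the new value).
    """
    delta = io_ms if prev_io_ms > io_ms else io_ms - prev_io_ms
    return delta / ((t - prev_t) * 1000) * 100
```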
Kubelet workbook
Points to note
- where Origin == 'container.azm.ms/telegraf'
- For Kubelet: where Namespace == 'container.azm.ms/prometheus'
Overview by node
let data = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/prometheus'
| where Name == 'kubelet_docker_operations' or Name == 'kubelet_docker_operations_errors'
| extend Tags = todynamic(Tags)
| extend OperationType = tostring(Tags['operation_type']), HostName = tostring(Tags.hostName)
| where '*' in ('aks-agentpool-14531005-0','aks-agentpool-14531005-1','aks-agentpool-14531005-2') or HostName in ('aks-agentpool-14531005-0','aks-agentpool-14531005-1','aks-agentpool-14531005-2')
| where '*' in ('*') or OperationType in ('*')
| extend partitionKey = strcat(HostName, '/' , Name, '/', OperationType)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val, Val - PrevVal)
| where isnotnull(Rate)
| project TimeGenerated, Name, HostName, Rate;
let operationData = data
| where Name == 'kubelet_docker_operations';
let totalOperationsByNode = operationData
| summarize Rate = sum(Rate) by HostName
| project HostName, TotalOperations = Rate;
let totalOperationsByNodeSeries = operationData
| make-series TotalOperationsSeries = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by HostName
| project-away TimeGenerated;
let errorData = data
| where Name == 'kubelet_docker_operations_errors';
let totalErrorsByNode = errorData
| summarize Rate = sum(Rate) by HostName
| project HostName, TotalErrors = Rate;
let totalErrorsByNodeSeries = errorData
| make-series TotalErrorsSeries = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by HostName
| project-away TimeGenerated;
totalOperationsByNode
| join kind=inner
(
totalErrorsByNode
)
on HostName
| join kind = inner
(
totalOperationsByNodeSeries
)
on HostName
| join kind = inner
(
totalErrorsByNodeSeries
)
on HostName
| project-away HostName1, HostName2, HostName3
| extend TotalSuccessfulOperationsSeries = series_subtract(TotalOperationsSeries, TotalErrorsSeries)
| extend SuccessPercentage = round(iif(TotalOperations == 0, 1.0, 1 - (TotalErrors / TotalOperations)), 4), SuccessPercentageSeries = series_divide(TotalSuccessfulOperationsSeries, TotalOperationsSeries)
| extend SeriesOfEqualLength = range(1, array_length(TotalOperationsSeries), 1)
| extend SeriesOfOneHundo = series_multiply(series_divide(SeriesOfEqualLength, SeriesOfEqualLength), 100)
| extend SuccessfulOperationsEqualsTotalOperationsSeries = series_equals(TotalSuccessfulOperationsSeries, TotalOperationsSeries)
| extend SuccessPercentageSeries = array_iff(SuccessfulOperationsEqualsTotalOperationsSeries, SeriesOfOneHundo, SuccessPercentageSeries)
| project HostName, TotalOperations, TotalErrors, SuccessPercentage, SuccessPercentageSeries
| order by SuccessPercentage asc, HostName asc
| project-rename Node = HostName, ['Total Operations'] = TotalOperations, ['Total Errors'] = TotalErrors, ['Success %'] = SuccessPercentage, ['Success % Trend'] = SuccessPercentageSeries
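The node table boils down to summing the counter increases per node and computing 1 - errors / operations (1.0 when there were no operations), rounded to four decimals. The query uses join kind=inner, so a node with no error rows at all drops out of the table. A Python sketch, assuming the increases have already been computed with the prev() step and using hypothetical node names:

```python
from collections import defaultdict


def success_by_node(rows):
    """rows: (host, metric_name, increase).

    Returns {host: (total_ops, total_errors, success_ratio)} for hosts that
    have both operation and error rows (join kind=inner on HostName).
    """
    ops = defaultdict(float)
    errs = defaultdict(float)
    for host, name, inc in rows:
        if name == "kubelet_docker_operations":
            ops[host] += inc
        elif name == "kubelet_docker_operations_errors":
            errs[host] += inc
    result = {}
    for host in ops.keys() & errs.keys():
        total, errors = ops[host], errs[host]
        ratio = 1.0 if total == 0 else 1 - errors / total
        result[host] = (total, errors, round(ratio, 4))
    return result
```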
Overview by operation type
let data = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/prometheus'
| where Name == 'kubelet_docker_operations' or Name == 'kubelet_docker_operations_errors'
| extend Tags = todynamic(Tags)
| extend OperationType = tostring(Tags['operation_type']), HostName = tostring(Tags.hostName)
| where '*' in ('aks-agentpool-14531005-0','aks-agentpool-14531005-1','aks-agentpool-14531005-2') or HostName in ('aks-agentpool-14531005-0','aks-agentpool-14531005-1','aks-agentpool-14531005-2')
| where '*' in ('*') or OperationType in ('*')
| extend partitionKey = strcat(HostName, '/' , Name, '/', OperationType)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val, Val - PrevVal)
| where isnotnull(Rate)
| project TimeGenerated, Name, OperationType, Rate;
let operationData = data
| where Name == 'kubelet_docker_operations';
let totalOperationsByType = operationData
| summarize Rate = sum(Rate) by OperationType
| project OperationType, TotalOperations = Rate;
let totalOperationsByTypeSeries = operationData
| make-series TotalOperationsByTypeSeries = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by OperationType
| project-away TimeGenerated;
let errorsData = data
| where Name == 'kubelet_docker_operations_errors';
let totalErrorsByType = errorsData
| summarize Rate = sum(Rate) by OperationType
| project OperationType, TotalErrors = Rate;
let totalErrorsByTypeSeries = errorsData
| make-series TotalErrorsByTypeSeries = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by OperationType
| project-away TimeGenerated;
let seriesLength = toscalar( totalErrorsByTypeSeries
| extend ArrayLength = array_length(TotalErrorsByTypeSeries)
| summarize Array_Length = max(ArrayLength) );
totalOperationsByType
| join kind=leftouter
(
totalErrorsByType
)
on OperationType
| project-away OperationType1
| extend TotalErrors = iif(isempty(TotalErrors), 0.0, TotalErrors)
| join kind=leftouter
(
totalErrorsByTypeSeries
)
on OperationType
| project-away OperationType1
| extend SeriesOfEqualLength = range(1, seriesLength, 1)
| extend SeriesOfZeroes = series_subtract(SeriesOfEqualLength, SeriesOfEqualLength)
| extend SeriesOfOneHundo = series_multiply(series_divide(SeriesOfEqualLength, SeriesOfEqualLength), 100)
| extend TotalErrorsByTypeSeries = iif(isempty(TotalErrorsByTypeSeries), SeriesOfZeroes, TotalErrorsByTypeSeries)
| join kind=leftouter
(
totalOperationsByTypeSeries
)
on OperationType
| project-away OperationType1
| extend TotalSuccessfulOperationsByTypeSeries = series_subtract(TotalOperationsByTypeSeries, TotalErrorsByTypeSeries)
| extend SuccessPercentage = round(iif(TotalOperations == 0, 1.0, 1 - (TotalErrors / TotalOperations)), 4), SuccessPercentageSeries = series_divide(TotalSuccessfulOperationsByTypeSeries, TotalOperationsByTypeSeries)
| extend SuccessfulOperationsEqualsTotalOperationsSeries = series_equals(TotalSuccessfulOperationsByTypeSeries, TotalOperationsByTypeSeries)
| extend SuccessPercentageSeries = array_iff(SuccessfulOperationsEqualsTotalOperationsSeries, SeriesOfOneHundo, SuccessPercentageSeries)
| project OperationType, TotalOperations, TotalErrors, SuccessPercentage, SuccessPercentageSeries
| order by SuccessPercentage asc, OperationType asc
| project-rename ['Operation Type'] = OperationType, ['Total Operations'] = TotalOperations, ['Total Errors'] = TotalErrors, ['Success %'] = SuccessPercentage, ['Success % Trend'] = SuccessPercentageSeries
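The operation-type table differs from the node table mainly in its joins: join kind=leftouter keeps every operation type, and a type without error rows gets TotalErrors = 0.0. A Python sketch of that step and of the final ordering (the operation type names in the test are hypothetical examples):

```python
def success_by_operation_type(total_ops, total_errors):
    """total_ops / total_errors: {operation_type: count}.

    Returns (type, ops, errors, success_ratio) tuples ordered by
    success ratio, then type (order by SuccessPercentage asc, OperationType asc).
    """
    rows = []
    for op_type, ops in total_ops.items():
        errors = total_errors.get(op_type, 0.0)  # leftouter: missing errors become 0.0
        ratio = 1.0 if ops == 0 else 1 - errors / ops
        rows.append((op_type, ops, errors, round(ratio, 4)))
    return sorted(rows, key=lambda r: (r[3], r[0]))
```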
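Network workbook
The following seven lines are boilerplate shared by every Network chart: they extract the host and interface from Tags, apply the workbook's filter selections (all '*' here), and sort the rows per host/interface so that prev() can be used. You can skip them when reading.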
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
Points to note
- where Origin == 'container.azm.ms/telegraf'
- For network: where Namespace == 'container.azm.ms/net'
Network overview
Omitted because it consists of many queries.
Bytes sent per second
let bytesSentPerSecond = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'bytes_sent'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1), (Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
bytesSentPerSecond
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, Interface)
| project-away HostName, Interface
Bytes received per second
let bytesReceivedPerSecond = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'bytes_recv'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1), (Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
bytesReceivedPerSecond
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, Interface)
| project-away HostName, Interface
Total bytes sent (per 10-minute interval)
let bytesSentTotal = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'bytes_sent'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / 1, (Val - PrevVal) / 1)
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
bytesSentTotal
| make-series Val = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, '/', Interface)
| project-away HostName, Interface
Total bytes received (per 10-minute interval)
let bytesReceivedTotal = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'bytes_recv'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / 1, (Val - PrevVal) / 1)
| where isnotnull(Rate)
| project TimeGenerated, HostName, Rate;
let sum = bytesReceivedTotal
| make-series Val = sum(Rate) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName
| extend Name = strcat(HostName, ':', 'Sum')
| project-away HostName;
sum
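One detail is easy to miss here: unlike the bytes-sent total, this query projects away Interface before make-series, so all interfaces on a host are summed into a single series named '<host>:Sum'. A Python sketch, assuming the per-sample increases have already been computed and using hypothetical host and interface names:

```python
from collections import defaultdict


def received_total_per_host(increases, step=600):
    """increases: (host, interface, t_seconds, byte_increase).

    Interface is ignored, so every interface of a host adds to one
    series named '<host>:Sum', bucketed into step-second bins.
    """
    totals = defaultdict(float)
    for host, _interface, t, inc in increases:
        totals[(f"{host}:Sum", t // step * step)] += inc
    return dict(totals)
```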
Outbound errors per second (err_out)
The screenshot is omitted because there were no results when it was taken.
let errorsOutPerSecond = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'err_out'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / datetime_diff('Second', TimeGenerated, PrevTimeGenerated), (Val - PrevVal) / datetime_diff('Second', TimeGenerated, PrevTimeGenerated))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
errorsOutPerSecond
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, Interface)
| project-away HostName, Interface
Inbound errors per second (err_in)
The screenshot is omitted because there were no results when it was taken.
let errorsInPerSecond = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'err_in'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / datetime_diff('Second', TimeGenerated, PrevTimeGenerated), (Val - PrevVal) / datetime_diff('Second', TimeGenerated, PrevTimeGenerated))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
errorsInPerSecond
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, Interface)
| project-away HostName, Interface
Total outbound errors (per 10-minute interval)
The screenshot is omitted because there were no results when it was taken.
let totalErrorsOut = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'err_out'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val, Val - PrevVal)
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
totalErrorsOut
| make-series Val = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, '/', Interface)
| project-away HostName, Interface
Total inbound errors (per 10-minute interval)
let totalErrorsIn = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'err_in'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val, Val - PrevVal)
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
totalErrorsIn
| make-series Val = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, '/', Interface)
| project-away HostName, Interface
References
- Docs: How to enable Azure Monitor for containers
- Docs: Get started with Log Analytics in Azure Monitor
- Docs: Get started with log queries in Azure Monitor
- Docs: How to query logs from Azure Monitor for containers
- Docs: Collect custom metrics for a Linux VM with the InfluxData Telegraf agent
- Docs: Create interactive reports with Azure Monitor workbooks
- Qiita: Azure Monitor for containers is finally GA!