Learning to monitor AKS with Azure Monitor (Kusto) from Workbooks

Introduction

Azure Monitor for containers is designed specifically for containers and provides the following four workbooks:

    • Disk capacity
    • Disk IO
    • Kubelet
    • Network

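In this article, we learn Kusto queries by reading these ready-made workbooks. Because the default workbooks are available in any cluster at any time, they make a handy cheat sheet, and there is no need to memorize every detail of the Kusto queries introduced below.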

What are Azure Monitor Workbooks?

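A tool that combines the results of multiple log queries and charts into a single report. Reports can be shared within a team, so even new members who don't know how to write log queries can pick up the know-how needed for monitoring, such as "looking at this telemetry tells you such-and-such."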


A basic understanding of Azure Monitor concepts is essential.

Query syntax basics

    • Log Analytics tutorial
    • Query tutorial

Contents

    • Basic queries
    • Schema overview
    • Filtering, e.g. | where hogehoge == "hugahuga"
    • Sorting, e.g. | sort
    • Grouping and aggregation, e.g. | summarize
    • Charts
    • Saving and loading queries
    • Selecting and computing columns, e.g. | project column1, column2, column3
    • Defining additional columns, e.g. | extend NewColumn1=substring(OriginalColumn1, 0, 5)
    • Grouping by a time column (binning), e.g. | summarize avg(CounterValue) by bin(TimeGenerated, 1h)
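As a rough illustration of the last pattern, the Python sketch below mimics `summarize avg(CounterValue) by bin(TimeGenerated, 1h)`: each timestamp is rounded down to the start of its 1-hour bin, and the values in each bin are averaged. The function names and sample data are hypothetical, and the sketch assumes naive UTC timestamps; it shows the semantics, not how Kusto implements them.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Reference point for rounding; for whole-hour bins any midnight works (assumption).
EPOCH = datetime(1970, 1, 1)


def bin_time(ts: datetime, size: timedelta) -> datetime:
    """Round ts down to a multiple of size, like bin(TimeGenerated, size)."""
    return EPOCH + ((ts - EPOCH) // size) * size


def summarize_avg_by_bin(rows, size: timedelta) -> dict:
    """rows: iterable of (TimeGenerated, CounterValue) pairs.

    Returns {bin_start: average value}, ordered by bin start."""
    buckets = defaultdict(list)
    for ts, value in rows:
        buckets[bin_time(ts, size)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


if __name__ == "__main__":
    sample = [
        (datetime(2020, 1, 1, 9, 5), 10.0),
        (datetime(2020, 1, 1, 9, 55), 20.0),
        (datetime(2020, 1, 1, 10, 30), 40.0),
    ]
    for start, avg in summarize_avg_by_bin(sample, timedelta(hours=1)).items():
        print(start, avg)
```

Running it prints one line per hourly bin: 09:00 with an average of 15.0 and 10:00 with 40.0.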

About data collection

Telemetry such as disk capacity, disk IO, and network is collected by the InfluxData Telegraf agent and can be queried as custom metrics in the InsightsMetrics table.

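In InsightsMetrics, each metric's value is in the Val column and its dimensions (host name, device, and so on) are stored as JSON in the Tags column; you retrieve a specific metric by filtering on Name and Namespace. The metrics are documented on GitHub, quoted below. In the original, each entry links to the corresponding InfluxData Telegraf documentation.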

Disk metrics

| Name | Namespace | Description |
|---|---|---|
| used | container.azm.ms/disk | more info |
| free | container.azm.ms/disk | more info |
| used_percent | container.azm.ms/disk | more info |

Disk IO metrics

| Name | Namespace | Description |
|---|---|---|
| reads | container.azm.ms/diskio | more info |
| read_bytes | container.azm.ms/diskio | more info |
| read_time | container.azm.ms/diskio | more info |
| writes | container.azm.ms/diskio | more info |
| write_bytes | container.azm.ms/diskio | more info |
| write_time | container.azm.ms/diskio | more info |
| io_time | container.azm.ms/diskio | more info |
| iops_in_progress | container.azm.ms/diskio | more info |
Host network metrics

| Name | Namespace | Description |
|---|---|---|
| bytes_sent | container.azm.ms/net | more info |
| bytes_recv | container.azm.ms/net | more info |
| err_in | container.azm.ms/net | more info |
| err_out | container.azm.ms/net | more info |
Kubelet metrics

| Name | Namespace | Description |
|---|---|---|
| kubelet_docker_operations | container.azm.ms/prometheus | Cumulative number of Docker operations by operation type |
| kubelet_docker_operations_errors | container.azm.ms/prometheus | Cumulative number of Docker operation errors by operation type |

Source: https://github.com/microsoft/OMS-docker/blob/vishwa/june19agentrel/docs/InsightsMetrics.md

Details of the other container records are described in the following documentation:
https://docs.microsoft.com/ja-jp/azure/azure-monitor/insights/container-insights-log-search

    • Host and container performance: Perf
    • Container inventory: ContainerInventory
    • Container logs: ContainerLog
    • Container node inventory: ContainerNodeInventory
    • Inventory of pods in a Kubernetes cluster: KubePodInventory
    • Inventory of the nodes in a Kubernetes cluster: KubeNodeInventory
    • Kubernetes events: KubeEvents
    • Services in a Kubernetes cluster: KubeServices
    • Performance metrics for the nodes of a Kubernetes cluster: Perf | where ObjectName == "K8SNode"
    • Performance metrics for the containers of a Kubernetes cluster: Perf | where ObjectName == "K8SContainer"
    • Custom metrics: InsightsMetrics

Prometheus support

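Azure Monitor for containers can collect Prometheus metrics without running a Prometheus server. Unfortunately, the workbooks contain no queries for Prometheus metrics, so here we only point to the documentation for setting it up and querying it.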

Setup only requires configuring metric collection in a ConfigMap.

How to query

To query Prometheus metric data, filter InsightsMetrics on the prometheus namespace. The metric labels are stored as JSON in the Tags property, so parse them with parse_json():

InsightsMetrics 
| where Namespace == "prometheus"
| extend tags=parse_json(Tags)
| summarize count() by Name
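To see what this query does row by row, here is a hypothetical Python sketch: keep only rows whose Namespace is "prometheus", parse the JSON in Tags (the counterpart of parse_json), and count rows per metric Name. As an extra that the KQL query does not do, it also records which label keys appear for each metric. The row shape (dicts with Namespace, Name, and Tags keys) and the sample metric names are assumptions.

```python
import json
from collections import Counter, defaultdict


def count_prometheus_metrics(rows):
    """rows: iterable of dicts with 'Namespace', 'Name' and 'Tags' (a JSON string).

    Returns (counts, label_keys):
      counts: {Name: number of rows}, like `summarize count() by Name`
      label_keys: {Name: set of keys seen in Tags} (extra, not in the KQL query)
    """
    counts = Counter()
    label_keys = defaultdict(set)
    for row in rows:
        if row["Namespace"] != "prometheus":  # where Namespace == "prometheus"
            continue
        tags = json.loads(row["Tags"])  # extend tags = parse_json(Tags)
        counts[row["Name"]] += 1
        label_keys[row["Name"]].update(tags.keys())
    return dict(counts), dict(label_keys)
```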

Opening the workbooks

Now let's go through the workbooks one by one.

Disk capacity workbook

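The following six lines are boilerplate (a "magic incantation") that parses Tags and applies the workbook's host, device, and disk filters. With everything selected ('*'), every row passes, so you can skip them when reading. They are common to all the disk capacity charts.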

| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.device))
| extend NodeDisk = strcat(HostName, Device)
| where "*" in ('*') or HostName in ('*')
| where "*" in ('*') or Device in ('*')
| where NodeDisk in (selectedStateDisks) or '*' in (selectedStateDisks);

Key points

    • where Origin == 'container.azm.ms/telegraf'
    • For disk capacity: where Namespace == 'disk' or Namespace =~ 'container.azm.ms/disk'

Top 3 disks by used disk %

let selectedStateDisks = dynamic(["*"]);
let data = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'disk' or Namespace =~ 'container.azm.ms/disk'
| where Name == 'used_percent'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.device))
| extend NodeDisk = strcat(HostName, Device)
| where "*" in ('*') or HostName in ('*')
| where "*" in ('*') or Device in ('*')
| where NodeDisk in (selectedStateDisks) or '*' in (selectedStateDisks);
let mostUsedDisks = data
| top-nested 3 of NodeDisk by MaxVal = max(Val);
data
| where NodeDisk in (mostUsedDisks)
| make-series ['Used Disk %'] = max(Val) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by NodeDisk
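The key step here is `top-nested 3 of NodeDisk by MaxVal = max(Val)`: it computes each disk's maximum used % and keeps the three largest, and the final statement then charts only those disks. A minimal Python equivalent, with hypothetical disk names, might look like this:

```python
from collections import defaultdict


def top_disks_by_max(samples, n=3):
    """samples: iterable of (NodeDisk, used_percent) pairs.

    Returns the n NodeDisk values with the highest maximum used %, highest first,
    mirroring `top-nested 3 of NodeDisk by MaxVal = max(Val)`."""
    max_by_disk = defaultdict(lambda: float("-inf"))
    for node_disk, value in samples:
        max_by_disk[node_disk] = max(max_by_disk[node_disk], value)
    ranked = sorted(max_by_disk.items(), key=lambda kv: kv[1], reverse=True)
    return [node_disk for node_disk, _ in ranked[:n]]


if __name__ == "__main__":
    samples = [
        ("node-0/dev/sda1", 40.0),
        ("node-0/dev/sda1", 55.0),
        ("node-1/dev/sda1", 70.0),
        ("node-2/dev/sda1", 10.0),
        ("node-3/dev/sdb1", 65.0),
    ]
    print(top_disks_by_max(samples))
```

With the sample above it prints node-1, node-3, and node-0 in that order; node-2 is dropped because its maximum (10.0) is the lowest.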

Disk capacity overview

Used Disk %

let selectedStateDisks = dynamic(["*"]);
let usedPercent = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'disk' or Namespace =~ 'container.azm.ms/disk'
| where Name == 'used_percent'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.device))
| extend NodeDisk = strcat(HostName, Device)
| where "*" in ('*') or HostName in ('*')
| where "*" in ('*') or Device in ('*')
| where NodeDisk in (selectedStateDisks) or '*' in (selectedStateDisks);
// row: substituted workbook parameter (Kind is Unselected, Node, or Device).
// With "Unselected", only the disk with the highest used % is charted.
let row = dynamic({"Kind":"Unselected"});
let worstDiskAcrossNodes = usedPercent
| summarize UsedPercent = max(Val) by NodeDisk
| top 1 by UsedPercent desc;
usedPercent
| where (row.Kind == 'Unselected') or (row.Kind == 'Node' and row.Id == HostName) or (row.Kind == 'Device' and row.Id == NodeDisk)
| make-series ['Used Disk %'] = max(Val) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by NodeDisk
| where NodeDisk contains iff(row.Kind == 'Unselected', toscalar(worstDiskAcrossNodes | project NodeDisk), '')

Free disk space (GiB)

let selectedStateDisks = dynamic(["*"]);
let data = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'disk' or Namespace =~ 'container.azm.ms/disk'
| where Name == 'used_percent' or Name == 'free'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.device))
| extend NodeDisk = strcat(HostName, Device)
| where "*" in ('*') or HostName in ('*')
| where "*" in ('*') or Device in ('*')
| where NodeDisk in (selectedStateDisks) or '*' in (selectedStateDisks);
let usedPercent = data
| where Name == 'used_percent';
let free = data
| where Name == 'free'
| extend Val = Val / 1073741824;
// row: substituted workbook parameter (Kind is Unselected, Node, or Device).
// With "Unselected", only the disk with the highest used % is charted.
let row = dynamic({"Kind":"Unselected"});
let worstDiskAcrossNodes = usedPercent
| summarize UsedPercent = max(Val) by NodeDisk
| top 1 by UsedPercent desc;
free
| where (row.Kind == 'Unselected') or (row.Kind == 'Node' and row.Id == HostName) or (row.Kind == 'Device' and row.Id == NodeDisk)
| make-series ['Free Disk Space'] = min(Val) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by NodeDisk
| where NodeDisk contains iff(row.Kind == 'Unselected', toscalar(worstDiskAcrossNodes | project NodeDisk), '')
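Two details in these queries are easy to miss. First, when no row is selected (`row.Kind == 'Unselected'`), both charts show only `worstDiskAcrossNodes`, the disk with the highest maximum used %. Second, free space is converted from bytes to GiB by dividing by 1073741824, which is 2^30. The hypothetical Python sketch below combines the two; the input dictionaries are an assumed shape, not a workbook API.

```python
BYTES_PER_GIB = 1073741824  # 2**30, the divisor used in the query


def worst_disk_free_gib(used_percent, free_bytes):
    """used_percent: {NodeDisk: [used % samples]}; free_bytes: {NodeDisk: [free byte samples]}.

    Picks the disk whose maximum used % is highest (like worstDiskAcrossNodes)
    and returns (NodeDisk, that disk's free-space samples in GiB)."""
    worst = max(used_percent, key=lambda disk: max(used_percent[disk]))
    return worst, [b / BYTES_PER_GIB for b in free_bytes.get(worst, [])]
```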

Disk IO workbook

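The following seven lines are likewise boilerplate shared by every Disk IO chart: they parse Tags, apply the host and device filters (every row passes when '*' is selected), and sort the rows so that prev() can compare each sample with the previous one for the same disk. You can skip them when reading.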

| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize

Key points

    • where Origin == 'container.azm.ms/telegraf'
    • For disk IO: where Namespace == 'container.azm.ms/diskio'

Disk IO overview

Bytes read/sec

let bytesReadPerSec = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'read_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1), iif(PrevVal == Val, 0.0, (Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1)))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
bytesReadPerSec
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device
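Disk IO counters such as read_bytes are cumulative, so the query sorts by NodeDisk and TimeGenerated, uses prev() to fetch the previous sample of the same disk, and divides the increase by the elapsed seconds. If the counter went down (for example, because it was reset), the query treats the current value itself as the increase. Here is a minimal Python sketch of that logic; the tuple layout is an assumption, and the query's `* 1` factor appears to be a unit multiplier filled in by the workbook, so it is left out.

```python
from datetime import datetime, timedelta
from itertools import groupby


def counter_rates(rows):
    """rows: iterable of (NodeDisk, TimeGenerated, cumulative Val) tuples.

    Returns [(NodeDisk, TimeGenerated, per-second Rate)], following the query:
    - order by NodeDisk, TimeGenerated, then compare each row with the previous one;
    - the first sample of each disk has no predecessor and is dropped;
    - a sample with the same timestamp as its predecessor is dropped;
    - if the counter decreased (a reset), the current value is used as the increase.
    """
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for node_disk, group in groupby(ordered, key=lambda r: r[0]):
        prev = None
        for _, ts, val in group:
            if prev is not None and prev[0] != ts:
                prev_ts, prev_val = prev
                increase = val if prev_val > val else val - prev_val
                result.append((node_disk, ts, increase / (ts - prev_ts).total_seconds()))
            prev = (ts, val)
    return result
```

The Bytes written/sec and network per-second charts below apply the same pattern, with `partitionKey` (host/interface) in place of NodeDisk for the network ones.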

Bytes written/sec

let bytesWritePerSec = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'write_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1), iif(PrevVal == Val, 0.0, (Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1)))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
bytesWritePerSec
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device

Total bytes read (per 10m interval)

let bytesReadTotal = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'read_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / 1, iif(PrevVal == Val, 0.0, (Val - PrevVal) / 1))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let sum = bytesReadTotal
| make-series Val = sum(Rate) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device;
sum
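The "total" charts use the same increase calculation but skip the division by time (`Val / 1` and `(Val - PrevVal) / 1`, where 1 again looks like a unit factor), then add the increases up per 10-minute step with `make-series ... sum(Rate) ... step 10m`. A hypothetical Python sketch for a single disk, assuming steps are counted from a given start time that plays the role of the query's `from ago(21600s)`:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def totals_per_step(samples, start, step=timedelta(minutes=10)):
    """samples: time-ordered list of (TimeGenerated, cumulative bytes) for ONE disk.

    Returns {step_start: bytes transferred during that step}. Each increase is
    reset-aware, as in the query, and is credited to the step containing the
    later sample. Steps without data are absent (make-series would fill them with 0)."""
    totals = defaultdict(float)
    for (prev_ts, prev_val), (ts, val) in zip(samples, samples[1:]):
        if ts == prev_ts:
            continue  # dropped by `where ... PrevTimeGenerated != TimeGenerated`
        increase = val if prev_val > val else val - prev_val
        step_start = start + ((ts - start) // step) * step
        totals[step_start] += increase
    return dict(totals)
```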

Total bytes written (per 10m interval)

let bytesWrittenTotal = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'write_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / 1, iif(PrevVal == Val, 0.0, (Val - PrevVal) / 1))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let sum = bytesWrittenTotal
| make-series Val = sum(Rate) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device;
sum

ms per byte read

let msPerByteRead = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'read_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
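// Return 0 when no bytes were read, avoiding pow(0, -1); same guard as the write query
| extend Rate = iif(TimeGenerated == PrevTimeGenerated or (Val - PrevVal) == 0, 0.0, iif(PrevVal > Val, pow(Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000 * 1), -1), pow((Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000 * 1), -1)))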
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
msPerByteRead
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device
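The "ms per byte" charts invert a throughput: `pow(bytes / (seconds * 1000), -1)` equals elapsed milliseconds divided by bytes transferred. Without a guard, an interval in which no bytes moved would divide by zero, which is why the write query returns 0.0 in that case. A hedged Python sketch of a single interval, with the guard applied to the reset-aware increase (a slight generalization of the query's `(Val - PrevVal) == 0` check):

```python
from datetime import datetime, timedelta


def ms_per_byte(prev_ts, prev_val, ts, val):
    """Milliseconds per byte between two samples of a cumulative byte counter.

    Equivalent to pow(increase / (seconds * 1000), -1) = elapsed ms / increase.
    Returns 0.0 when the timestamps match or no bytes were transferred."""
    increase = val if prev_val > val else val - prev_val
    if ts == prev_ts or increase == 0:
        return 0.0
    elapsed_ms = (ts - prev_ts).total_seconds() * 1000
    return elapsed_ms / increase
```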

ms per byte written

let msPerByteWritten = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'write_bytes'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(TimeGenerated == PrevTimeGenerated or (Val - PrevVal) == 0, 0.0, iif(PrevVal > Val, pow(Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000 * 1), -1), pow((Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000 * 1), -1)))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Device, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
msPerByteWritten
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device

IOPS in progress

let iops = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'iops_in_progress'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| project TimeGenerated, HostName, Device, Val;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
iops
| make-series Val = iif(avgOn != -1, avg(Val), iif(maxOn != -1, max(Val), min(Val))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Device
| extend Name = strcat(HostName, Device)
| project-away HostName, Device

Disk busy

let ioTime = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/diskio'
| where Name == 'io_time'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Device = strcat('/dev/', tostring(Tags.name))
| extend NodeDisk = strcat(HostName, Device)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Device in ('*')
| order by NodeDisk asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(NodeDisk) != NodeDisk, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(NodeDisk) != NodeDisk, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000), (Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1000)) * 100
| where isnotnull(Rate)
| project TimeGenerated, NodeDisk, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
ioTime
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by NodeDisk
| extend Name = NodeDisk
| project-away NodeDisk
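Disk busy works the same way with io_time, which (per the Telegraf diskio plugin) counts the milliseconds the device spent doing I/O. Dividing the increase in io_time by the elapsed milliseconds gives the fraction of time the disk was busy, and `* 100` turns it into a percentage. A small Python sketch of one interval, with hypothetical numbers:

```python
from datetime import datetime, timedelta


def busy_percent(prev_ts, prev_io_ms, ts, io_ms):
    """Percentage of the interval the disk spent doing I/O.

    io_ms is the cumulative io_time counter in milliseconds; a decrease is treated
    as a counter reset, as in the query. Callers must skip samples with equal
    timestamps, as the query's where clause does."""
    increase_ms = io_ms if prev_io_ms > io_ms else io_ms - prev_io_ms
    elapsed_ms = (ts - prev_ts).total_seconds() * 1000
    return increase_ms / elapsed_ms * 100
```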

Kubelet workbook

Key points

    • where Origin == 'container.azm.ms/telegraf'
    • For Kubelet: where Namespace == 'container.azm.ms/prometheus'

Overview by node

let data = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/prometheus'
| where Name == 'kubelet_docker_operations' or Name == 'kubelet_docker_operations_errors'
| extend Tags = todynamic(Tags)
| extend OperationType = tostring(Tags['operation_type']), HostName = tostring(Tags.hostName)
| where '*' in ('aks-agentpool-14531005-0','aks-agentpool-14531005-1','aks-agentpool-14531005-2') or HostName in ('aks-agentpool-14531005-0','aks-agentpool-14531005-1','aks-agentpool-14531005-2')
| where '*' in ('*') or OperationType in ('*')
| extend partitionKey = strcat(HostName, '/' , Name, '/', OperationType)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val, Val - PrevVal)
| where isnotnull(Rate)
| project TimeGenerated, Name, HostName, Rate;
let operationData = data
| where Name == 'kubelet_docker_operations';
let totalOperationsByNode = operationData
| summarize Rate = sum(Rate) by HostName
| project HostName, TotalOperations = Rate;
let totalOperationsByNodeSeries = operationData
| make-series TotalOperationsSeries = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by HostName
| project-away TimeGenerated;
let errorData = data
| where Name == 'kubelet_docker_operations_errors';
let totalErrorsByNode = errorData
| summarize Rate = sum(Rate) by HostName
| project HostName, TotalErrors = Rate;
let totalErrorsByNodeSeries = errorData
| make-series TotalErrorsSeries = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by HostName
| project-away TimeGenerated;
totalOperationsByNode
| join kind=inner
(
    totalErrorsByNode
)
on HostName
| join kind = inner
(
    totalOperationsByNodeSeries
)
on HostName
| join kind = inner
(
    totalErrorsByNodeSeries
)
on HostName
| project-away HostName1, HostName2, HostName3
| extend TotalSuccessfulOperationsSeries = series_subtract(TotalOperationsSeries, TotalErrorsSeries)
| extend SuccessPercentage = round(iif(TotalOperations == 0, 1.0, 1 - (TotalErrors / TotalOperations)), 4), SuccessPercentageSeries = series_divide(TotalSuccessfulOperationsSeries, TotalOperationsSeries)
| extend SeriesOfEqualLength = range(1, array_length(TotalOperationsSeries), 1)
| extend SeriesOfOneHundo = series_multiply(series_divide(SeriesOfEqualLength, SeriesOfEqualLength), 100)
| extend SuccessfulOperationsEqualsTotalOperationsSeries = series_equals(TotalSuccessfulOperationsSeries, TotalOperationsSeries)
| extend SuccessPercentageSeries = array_iff(SuccessfulOperationsEqualsTotalOperationsSeries, SeriesOfOneHundo, SuccessPercentageSeries)
| project HostName, TotalOperations, TotalErrors, SuccessPercentage, SuccessPercentageSeries
| order by SuccessPercentage asc, HostName asc
| project-rename Node = HostName, ['Total Operations'] = TotalOperations, ['Total Errors'] = TotalErrors, ['Success %'] = SuccessPercentage, ['Success % Trend'] = SuccessPercentageSeries
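The per-node table boils down to a few numbers per node: total operations, total errors, and `SuccessPercentage = round(iif(TotalOperations == 0, 1.0, 1 - (TotalErrors / TotalOperations)), 4)`, which yields a ratio between 0 and 1. Because the tables are combined with `join kind=inner`, a node appears only if it has both operation and error rows. A hypothetical Python sketch of the table rows (the trend series are left out):

```python
def success_by_node(ops_by_node, errors_by_node):
    """ops_by_node / errors_by_node: {node: total count over the time range}.

    Returns [(node, total_ops, total_errors, success_ratio)] ordered by
    success ratio, then node name (order by SuccessPercentage asc, HostName asc)."""
    rows = []
    for node, total_ops in ops_by_node.items():
        if node not in errors_by_node:
            continue  # join kind=inner drops nodes without error rows
        total_errors = errors_by_node[node]
        ratio = 1.0 if total_ops == 0 else 1 - total_errors / total_ops
        rows.append((node, total_ops, total_errors, round(ratio, 4)))
    return sorted(rows, key=lambda r: (r[3], r[0]))
```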

Overview by operation type

let data = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/prometheus'
| where Name == 'kubelet_docker_operations' or Name == 'kubelet_docker_operations_errors'
| extend Tags = todynamic(Tags)
| extend OperationType = tostring(Tags['operation_type']), HostName = tostring(Tags.hostName)
| where '*' in ('aks-agentpool-14531005-0','aks-agentpool-14531005-1','aks-agentpool-14531005-2') or HostName in ('aks-agentpool-14531005-0','aks-agentpool-14531005-1','aks-agentpool-14531005-2')
| where '*' in ('*') or OperationType in ('*')
| extend partitionKey = strcat(HostName, '/' , Name, '/', OperationType)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val, Val - PrevVal)
| where isnotnull(Rate)
| project TimeGenerated, Name, OperationType, Rate;
let operationData = data
| where Name == 'kubelet_docker_operations';
let totalOperationsByType = operationData
| summarize Rate = sum(Rate) by OperationType
| project OperationType, TotalOperations = Rate;
let totalOperationsByTypeSeries = operationData
| make-series TotalOperationsByTypeSeries = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by OperationType
| project-away TimeGenerated;
let errorsData = data
| where Name == 'kubelet_docker_operations_errors';
let totalErrorsByType = errorsData
| summarize Rate = sum(Rate) by OperationType
| project OperationType, TotalErrors = Rate;
let totalErrorsByTypeSeries = errorsData
| make-series TotalErrorsByTypeSeries = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by OperationType
| project-away TimeGenerated;
let seriesLength = toscalar(   totalErrorsByTypeSeries
| extend ArrayLength = array_length(TotalErrorsByTypeSeries)
| summarize Array_Length = max(ArrayLength)  );
totalOperationsByType
| join kind=leftouter
(
    totalErrorsByType
)
on OperationType
| project-away OperationType1
| extend TotalErrors = iif(isempty(TotalErrors), 0.0, TotalErrors)
| join kind=leftouter
(
    totalErrorsByTypeSeries
)
on OperationType
| project-away OperationType1
| extend SeriesOfEqualLength = range(1, seriesLength, 1)
| extend SeriesOfZeroes = series_subtract(SeriesOfEqualLength, SeriesOfEqualLength)
| extend SeriesOfOneHundo = series_multiply(series_divide(SeriesOfEqualLength, SeriesOfEqualLength), 100)
| extend TotalErrorsByTypeSeries = iif(isempty(TotalErrorsByTypeSeries), SeriesOfZeroes, TotalErrorsByTypeSeries)
| join kind=leftouter
(
    totalOperationsByTypeSeries
)
on OperationType
| project-away OperationType1
| extend TotalSuccessfulOperationsByTypeSeries = series_subtract(TotalOperationsByTypeSeries, TotalErrorsByTypeSeries)
| extend SuccessPercentage = round(iif(TotalOperations == 0, 1.0, 1 - (TotalErrors / TotalOperations)), 4), SuccessPercentageSeries = series_divide(TotalSuccessfulOperationsByTypeSeries, TotalOperationsByTypeSeries)
| extend SuccessfulOperationsEqualsTotalOperationsSeries = series_equals(TotalSuccessfulOperationsByTypeSeries, TotalOperationsByTypeSeries)
| extend SuccessPercentageSeries = array_iff(SuccessfulOperationsEqualsTotalOperationsSeries, SeriesOfOneHundo, SuccessPercentageSeries)
| project OperationType, TotalOperations, TotalErrors, SuccessPercentage, SuccessPercentageSeries
| order by SuccessPercentage asc, OperationType asc
| project-rename ['Operation Type'] = OperationType, ['Total Operations'] = TotalOperations, ['Total Errors'] = TotalErrors, ['Success %'] = SuccessPercentage, ['Success % Trend'] = SuccessPercentageSeries
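The per-operation-type table differs mainly in its joins: it uses `join kind=leftouter`, so an operation type with no errors still appears, and the missing values are replaced with 0 (`TotalErrors`) or a series of zeroes (`SeriesOfZeroes`). For the trend, steps where successful operations equal total operations are replaced with 100 by `array_iff`, while the other steps keep the `series_divide` ratio (0 to 1). The hypothetical Python sketch below reproduces that per-step logic for one operation type, as the query is written:

```python
import math


def success_trend(ops_series, error_series=None):
    """Per-step success values for one operation type.

    ops_series: operations per 10-minute step.
    error_series: errors per step, or None when the leftouter join found no
    error rows for this type (replaced by zeroes, like SeriesOfZeroes).
    As in the query, a step with no errors becomes 100 (SeriesOfOneHundo);
    any other step keeps the ratio successful / total."""
    if error_series is None:
        error_series = [0.0] * len(ops_series)
    trend = []
    for ops, errs in zip(ops_series, error_series):
        successful = ops - errs
        if successful == ops:
            trend.append(100.0)
        elif ops == 0:
            trend.append(math.nan)  # hypothetical choice; Kusto's behavior here is not modeled
        else:
            trend.append(successful / ops)
    return trend
```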

Network workbook

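The following seven lines are boilerplate shared by all the network charts: they parse Tags, apply the host and interface filters (every row passes when '*' is selected), build a host/interface partition key, and sort the rows so that prev() can be used. You can skip them when reading.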

| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize

Key points

    • where Origin == 'container.azm.ms/telegraf'
    • For network: where Namespace == 'container.azm.ms/net'

Network overview

The queries for this overview are omitted because there are so many of them.

Bytes sent/sec

let bytesSentPerSecond = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'bytes_sent'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1), (Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
bytesSentPerSecond
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, Interface)
| project-away HostName, Interface

Bytes received/sec

let bytesReceivedPerSecond = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'bytes_recv'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1), (Val - PrevVal) / (datetime_diff('Second', TimeGenerated, PrevTimeGenerated) * 1))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
bytesReceivedPerSecond
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, Interface)
| project-away HostName, Interface

Total bytes sent (per 10m interval)

let bytesSentTotal = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'bytes_sent'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / 1, (Val - PrevVal) / 1)
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
bytesSentTotal
| make-series Val = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, '/', Interface)
| project-away HostName, Interface

Total bytes received (per 10m interval)

let bytesReceivedTotal = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'bytes_recv'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / 1, (Val - PrevVal) / 1)
| where isnotnull(Rate)
| project TimeGenerated, HostName, Rate;
let sum = bytesReceivedTotal
| make-series Val = sum(Rate) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName
| extend Name = strcat(HostName, ':', 'Sum')
| project-away HostName;
sum

Errors out/sec

No screenshot: the query returned no results at capture time.

let errorsOutPerSecond = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'err_out'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / datetime_diff('Second', TimeGenerated, PrevTimeGenerated), (Val - PrevVal) / datetime_diff('Second', TimeGenerated, PrevTimeGenerated))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
errorsOutPerSecond
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, Interface)
| project-away HostName, Interface

Errors in/sec

No screenshot: the query returned no results at capture time.

let errorsInPerSecond = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'err_in'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val / datetime_diff('Second', TimeGenerated, PrevTimeGenerated), (Val - PrevVal) / datetime_diff('Second', TimeGenerated, PrevTimeGenerated))
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
let maxOn = indexof("Average", 'Max');
let avgOn = indexof("Average", 'Average');
let minOn = indexof("Average", 'Min');
errorsInPerSecond
| make-series Val = iif(avgOn != -1, avg(Rate), iif(maxOn != -1, max(Rate), min(Rate))) default=0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, Interface)
| project-away HostName, Interface

Total errors out (per 10m interval)

No screenshot: the query returned no results at capture time.

let totalErrorsOut = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'err_out'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val, Val - PrevVal)
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
totalErrorsOut
| make-series Val = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, '/', Interface)
| project-away HostName, Interface

Total errors in (per 10m interval)

let totalErrorsIn = InsightsMetrics
| where Origin == 'container.azm.ms/telegraf'
| where Namespace == 'container.azm.ms/net'
| where Name == 'err_in'
| extend Tags = todynamic(Tags)
| extend HostName = tostring(Tags.hostName), Interface = tostring(Tags.interface)
| where '*' in ('*') or HostName in ('*')
| where '*' in ('*') or Interface in ('*')
| extend partitionKey = strcat(HostName, '/', Interface)
| order by partitionKey asc, TimeGenerated asc
| serialize
| extend PrevVal = iif(prev(partitionKey) != partitionKey, 0.0, prev(Val)), PrevTimeGenerated = iif(prev(partitionKey) != partitionKey, datetime(null), prev(TimeGenerated))
| where isnotnull(PrevTimeGenerated) and PrevTimeGenerated != TimeGenerated
| extend Rate = iif(PrevVal > Val, Val, Val - PrevVal)
| where isnotnull(Rate)
| project TimeGenerated, HostName, Interface, Rate;
totalErrorsIn
| make-series Val = sum(Rate) default = 0 on TimeGenerated from ago(21600s) to now() step 10m by HostName, Interface
| extend Name = strcat(HostName, '/', Interface)
| project-away HostName, Interface

References

    • Docs: How to enable Azure Monitor for containers
    • Docs: Get started with Log Analytics in Azure Monitor
    • Docs: Get started with log queries in Azure Monitor
    • Docs: How to query logs from Azure Monitor for containers
    • Docs: Collect custom metrics for a Linux VM with the InfluxData Telegraf agent
    • Docs: Create interactive reports with Azure Monitor workbooks
    • Qiita: Azure Monitor for containers is now generally available!
