Comprehensive AKS Monitoring with Container Insights
Container Insights is the Azure Monitor feature for monitoring AKS clusters. It collects metrics, logs, and Kubernetes events from your nodes and containers into a Log Analytics workspace, where you can query them to track cluster health and performance.
Enabling Container Insights
On a New Cluster
az aks create \
--resource-group myResourceGroup \
--name myAKSCluster \
--node-count 3 \
--enable-addons monitoring \
--workspace-resource-id /subscriptions/{sub}/resourcegroups/{rg}/providers/microsoft.operationalinsights/workspaces/{workspace}
On an Existing Cluster
az aks enable-addons \
--addons monitoring \
--resource-group myResourceGroup \
--name myAKSCluster \
--workspace-resource-id /subscriptions/{sub}/resourcegroups/{rg}/providers/microsoft.operationalinsights/workspaces/{workspace}
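Both commands expect the full resource ID of the Log Analytics workspace. The following is a minimal shell sketch for assembling that ID from its parts. The function name and the sample values are hypothetical, and Azure resource IDs are treated as case-insensitive, so the casing here may differ from what the portal shows.

```shell
# Hypothetical helper: assemble a Log Analytics workspace resource ID.
# Arguments: subscription ID, resource group, workspace name.
workspace_resource_id() {
  local sub="$1" rg="$2" workspace="$3"
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.OperationalInsights/workspaces/%s\n' \
    "$sub" "$rg" "$workspace"
}

# Example with placeholder values (not real resources)
WORKSPACE_ID="$(workspace_resource_id "00000000-0000-0000-0000-000000000000" "myResourceGroup" "myWorkspace")"
echo "$WORKSPACE_ID"
```

The resulting variable can then be passed as --workspace-resource-id "$WORKSPACE_ID". If the workspace already exists, az monitor log-analytics workspace show with --query id -o tsv returns its ID without any string assembly.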
What Gets Collected
Container Insights collects:
- Performance metrics: CPU, memory, network, disk for nodes and containers
- Container logs: stdout and stderr from all containers
- Kubernetes events: Cluster events and state changes
- Inventory data: Pods, deployments, services, nodes
Verifying Installation
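# Check the agent DaemonSet (newer agent versions name it ama-logs instead of omsagent)
kubectl get daemonset omsagent -n kube-system
# Check the agent pods (newer agent versions use the label component=ama-logs-agent)
kubectl get pods -n kube-system -l component=oms-agent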
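To turn the DaemonSet check into a pass/fail result for a script, one option is to compare the desired and ready pod counts from the DaemonSet status. This is a minimal sketch: the function name is hypothetical, and the commented usage assumes the omsagent DaemonSet name used above.

```shell
# Hypothetical helper: succeed only when every scheduled agent pod is ready.
# Arguments: desired pod count, ready pod count (both from the DaemonSet status).
daemonset_ready() {
  local desired="$1" ready="$2"
  [ "$desired" -gt 0 ] && [ "$desired" -eq "$ready" ]
}

# Usage against a live cluster (requires kubectl access):
#   desired=$(kubectl get daemonset omsagent -n kube-system \
#     -o jsonpath='{.status.desiredNumberScheduled}')
#   ready=$(kubectl get daemonset omsagent -n kube-system \
#     -o jsonpath='{.status.numberReady}')
#   daemonset_ready "$desired" "$ready" && echo "agent ready" || echo "agent not ready"
```

A desired count of zero is treated as a failure here, on the assumption that a monitored cluster should schedule at least one agent pod.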
Key Container Insights Tables
Perf Table - Performance Metrics
Perf
| where ObjectName == "K8SContainer"
| where CounterName == "cpuUsageNanoCores"
| summarize AvgCPU = avg(CounterValue) by bin(TimeGenerated, 5m), InstanceName
| render timechart
ContainerLog Table - Container Logs
ContainerLog
| where LogEntry contains "error"
| project TimeGenerated, Computer, ContainerID, LogEntry
| order by TimeGenerated desc
| take 100
KubeEvents Table - Kubernetes Events
KubeEvents
| where KubeEventType == "Warning"
| summarize count() by Name, Reason
| order by count_ desc
KubePodInventory - Pod Information
KubePodInventory
| where Namespace != "kube-system"
| where PodStatus == "Running"
| summarize count() by Namespace, ControllerName
| order by count_ desc
Useful Queries
High CPU Pods
Perf
| where ObjectName == "K8SContainer"
| where CounterName == "cpuUsageNanoCores"
| summarize AvgCPU = avg(CounterValue) by InstanceName
| top 10 by AvgCPU desc
Memory Usage by Namespace
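KubePodInventory
| where TimeGenerated > ago(1h)
// Perf.InstanceName has the form <ClusterId>/<PodUid>/<ContainerName>,
// and KubePodInventory.ContainerName is <PodUid>/<ContainerName>, so build the same key here
| extend InstanceName = strcat(ClusterId, "/", ContainerName)
| distinct InstanceName, Namespace
| join kind=inner (
    Perf
    | where TimeGenerated > ago(1h)
    | where ObjectName == "K8SContainer"
    | where CounterName == "memoryWorkingSetBytes"
) on InstanceName
| summarize AvgMemory = avg(CounterValue) by Namespace
| order by AvgMemory desc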
Container Restart Count
KubePodInventory
| where TimeGenerated > ago(24h)
| where ContainerRestartCount > 0
| summarize MaxRestarts = max(ContainerRestartCount) by Name, Namespace
| where MaxRestarts > 3
| order by MaxRestarts desc
Pods in Failed State
KubePodInventory
| where TimeGenerated > ago(1h)
| where PodStatus == "Failed"
| distinct Name, Namespace, PodStatus
Node Resource Pressure
KubeNodeInventory
| where TimeGenerated > ago(1h)
| where Status contains "pressure"
| project TimeGenerated, Computer, Status
Setting Up Alerts
High CPU Alert
az monitor scheduled-query create \
--name "High CPU Alert" \
--resource-group myResourceGroup \
--scopes /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.OperationalInsights/workspaces/{workspace} \
--condition "count 'Placeholder_1' > 0" \
--condition-query Placeholder_1="Perf | where ObjectName == 'K8SContainer' | where CounterName == 'cpuUsageNanoCores' | summarize AvgCPU = avg(CounterValue) by bin(TimeGenerated, 5m), InstanceName | where AvgCPU > 800000000" \
--severity 2 \
--evaluation-frequency 5m \
--window-size 15m \
--action-groups /subscriptions/{sub}/resourceGroups/{rg}/providers/microsoft.insights/actionGroups/myActionGroup
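The threshold in the alert query is expressed in nanocores, the unit of the cpuUsageNanoCores counter, where one core equals 1,000,000,000 nanocores. A small conversion helper makes such thresholds easier to read. This is a sketch and the function name is hypothetical.

```shell
# Convert a cpuUsageNanoCores value to millicores.
# 1 core = 1,000,000,000 nanocores = 1,000 millicores, so divide by 1,000,000.
nanocores_to_millicores() {
  echo $(( $1 / 1000000 ))
}

# The alert threshold used above: 800000000 nanocores
nanocores_to_millicores 800000000
```

The example prints 800, meaning the alert fires when a container averages more than 0.8 of a core over a five-minute bin.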
Pod Restart Alert
Create an alert rule in the portal or via ARM template:
{
  "type": "Microsoft.Insights/scheduledQueryRules",
  "apiVersion": "2021-08-01",
  "name": "Pod Restart Alert",
  "location": "eastus",
  "properties": {
    "displayName": "Pod Restart Alert",
    "severity": 2,
    "enabled": true,
    "evaluationFrequency": "PT5M",
    "scopes": ["/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.OperationalInsights/workspaces/{workspace}"],
    "windowSize": "PT15M",
    "criteria": {
      "allOf": [
        {
          "query": "KubePodInventory | where ContainerRestartCount > 5 | summarize count() by Name, Namespace",
          "timeAggregation": "Count",
          "operator": "GreaterThan",
          "threshold": 0
        }
      ]
    },
    "actions": {
      "actionGroups": ["/subscriptions/{sub}/resourceGroups/{rg}/providers/microsoft.insights/actionGroups/myActionGroup"]
    }
  }
}
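Note that the ARM resource writes durations in ISO 8601 form (PT5M, PT15M), while the CLI command earlier takes the short form (5m, 15m). If you keep both in sync from a script, a conversion along these lines may help. This is a sketch: the function name is hypothetical, and it assumes a single-unit value in minutes or hours, not compound values such as 1h30m.

```shell
# Hypothetical helper: convert a short duration (e.g. 5m, 1h) to ISO 8601 (PT5M, PT1H).
to_iso8601_duration() {
  local value="${1%?}"        # everything except the last character, e.g. 15
  local unit="${1#$value}"    # the trailing unit letter, e.g. m
  case "$value" in
    ''|*[!0-9]*) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
  case "$unit" in
    m) printf 'PT%sM\n' "$value" ;;
    h) printf 'PT%sH\n' "$value" ;;
    *) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}

to_iso8601_duration 5m
to_iso8601_duration 15m
```

The two calls print the evaluationFrequency and windowSize values used in the template above.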
Configuring Log Collection
ConfigMap for Log Collection Settings
apiVersion: v1
kind: ConfigMap
metadata:
  name: container-azm-ms-agentconfig
  namespace: kube-system
data:
  schema-version: v1
  config-version: ver1
  log-data-collection-settings: |
    [log_collection_settings]
      [log_collection_settings.stdout]
        enabled = true
        exclude_namespaces = ["kube-system"]
      [log_collection_settings.stderr]
        enabled = true
        exclude_namespaces = ["kube-system"]
      [log_collection_settings.env_var]
        enabled = true
  prometheus-data-collection-settings: |
    [prometheus_data_collection_settings.cluster]
      interval = "1m"
      monitor_kubernetes_pods = true
Apply the ConfigMap:
kubectl apply -f container-azm-ms-agentconfig.yaml
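The exclude_namespaces settings in the ConfigMap are TOML string arrays. If the namespace list comes from a script or a pipeline variable, it can be rendered in that form with a helper like the one below. This is a sketch: the function name is hypothetical, and gatekeeper-system is only an example of an additional namespace.

```shell
# Hypothetical helper: render its arguments as a TOML string array,
# suitable as the value of exclude_namespaces.
toml_string_array() {
  local out="" ns
  for ns in "$@"; do
    # Append a comma separator only when the list already has an entry
    out="${out:+$out, }\"$ns\""
  done
  printf '[%s]\n' "$out"
}

toml_string_array kube-system gatekeeper-system
```

The output line can be substituted for the exclude_namespaces value before applying the ConfigMap.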
Workbook Visualizations
Container Insights includes built-in workbooks:
- Cluster health
- Node performance
- Controller metrics
- Container insights
- Deployments and HPAs
- Persistent volumes
Access them through: Azure Portal > AKS Cluster > Insights > Workbooks
Cost Optimization
Container logs can generate significant data. Optimize costs by:
- Filtering namespaces - Exclude noisy namespaces
- Sampling - Reduce collection frequency
- Retention policies - Set appropriate retention periods
- Data caps - Configure daily caps
# Set a daily ingestion cap of 5 GB on the workspace (--quota is in GB per day)
az monitor log-analytics workspace update \
--resource-group myResourceGroup \
--workspace-name myWorkspace \
--quota 5
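One way to choose the --quota value is to start from a monthly ingestion budget and divide it across the days of the month. The sketch below does that with integer arithmetic and rounds up. The function name and the 30-day default are assumptions for illustration, not part of the Azure CLI.

```shell
# Hypothetical helper: derive a daily cap (GB) from a monthly ingestion budget (GB).
# Arguments: monthly budget in GB, optional number of days (defaults to 30).
daily_cap_gb() {
  local monthly="$1" days="${2:-30}"
  # Integer division, rounded up so the monthly budget is not undershot
  echo $(( (monthly + days - 1) / days ))
}

daily_cap_gb 150
```

With a 150 GB monthly budget this prints 5, matching the cap set above. Keep in mind that data arriving after the cap is reached is not collected for the rest of the day, so a cap that is too tight can hide the very incident you want to investigate.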
Conclusion
Container Insights provides the foundation for observability in AKS. Combined with custom queries, alerts, and workbooks, you can maintain deep visibility into your Kubernetes workloads.
Tomorrow, we’ll explore Prometheus metrics collection for even more detailed application monitoring.