Deep Dive into Azure Monitor for Containers
Azure Monitor for Containers provides native Azure integration for monitoring Kubernetes clusters. Today we’ll explore its advanced features and how to leverage them for comprehensive observability.
Architecture Overview
Azure Monitor for Containers consists of:
- Container Insights agent (omsagent) - a DaemonSet that collects metrics and logs from every node (a quick health check follows this list)
- Log Analytics workspace - stores the collected data
- Azure Monitor - provides alerting, visualization, and analysis on top of that data
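Before customizing collection, it is worth confirming the agent is actually running. A minimal check, assuming the classic omsagent DaemonSet name (newer agent releases ship as ama-logs, so adjust the name to match your cluster):
# Verify the Container Insights DaemonSet has a ready pod on every node
kubectl get daemonset omsagent -n kube-system
# If data isn't reaching the workspace, inspect the agent's own logs
kubectl logs daemonset/omsagent -n kube-system --tail=20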
Data Collection Configuration
Customizing Collection with ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: container-azm-ms-agentconfig
  namespace: kube-system
data:
  schema-version: v1
  config-version: ver1
  log-data-collection-settings: |
    [log_collection_settings]
      [log_collection_settings.stdout]
        enabled = true
        exclude_namespaces = ["kube-system", "gatekeeper-system"]
      [log_collection_settings.stderr]
        enabled = true
        exclude_namespaces = ["kube-system"]
      [log_collection_settings.env_var]
        enabled = false
      [log_collection_settings.enrich_container_logs]
        enabled = true
      [log_collection_settings.collect_all_kube_events]
        enabled = true
  metric-data-collection-settings: |
    [metric_collection_settings]
      interval = "1m"
      namespace_filtering_mode = "Include"
      namespaces = ["default", "production", "staging"]
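After defining the ConfigMap, apply it to the cluster; the agent pods should restart on their own to pick up the new settings. A minimal sketch, assuming the manifest above is saved as container-azm-ms-agentconfig.yaml:
# Apply the collection settings and watch the agent pods roll
kubectl apply -f container-azm-ms-agentconfig.yaml
kubectl get pods -n kube-system -w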
Filtering by Annotation
Control log collection per pod:
apiVersion: v1
kind: Pod
metadata:
  name: verbose-app
  annotations:
    # Exclude from log collection
    fluentbit.io/exclude: "true"
spec:
  containers:
    - name: app
      image: myapp:v1
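The annotation can also be attached to a pod that is already running, without re-deploying the manifest. A small sketch using kubectl annotate against the example pod above:
# Add the exclusion annotation to the running pod
kubectl annotate pod verbose-app fluentbit.io/exclude="true" --overwrite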
Advanced Kusto Queries
Container Performance Analysis
// CPU throttling analysis
Perf
| where ObjectName == "K8SContainer"
| where CounterName == "cpuThrottledTimeMs"
| summarize ThrottledMs = sum(CounterValue) by bin(TimeGenerated, 5m), InstanceName
| where ThrottledMs > 0
| render timechart
Memory Pressure Detection
// Containers approaching memory limits
let memoryUsage = Perf
| where ObjectName == "K8SContainer"
| where CounterName == "memoryWorkingSetBytes"
| summarize MemUsage = avg(CounterValue) by InstanceName;
let memoryLimits = KubePodInventory
| distinct ContainerID, ContainerName
| join kind=inner (
Perf
| where ObjectName == "K8SContainer"
| where CounterName == "memoryLimitBytes"
| where CounterValue > 0
| summarize MemLimit = avg(CounterValue) by InstanceName
) on $left.ContainerID == $right.InstanceName;
memoryUsage
| join kind=inner memoryLimits on InstanceName
| extend UsagePercent = (MemUsage / MemLimit) * 100
| where UsagePercent > 80
| project ContainerName, UsagePercent, MemUsage, MemLimit
| order by UsagePercent desc
Log Analysis for Errors
// Error trends by container
ContainerLog
| where LogEntry contains "error" or LogEntry contains "exception"
| summarize ErrorCount = count() by bin(TimeGenerated, 1h), ContainerID
| join kind=inner (
KubePodInventory
| distinct ContainerID, Name, Namespace
) on ContainerID
| project TimeGenerated, Name, Namespace, ErrorCount
| render timechart
Deployment Health
// Deployment replica status
KubePodInventory
| where TimeGenerated > ago(1h)
| summarize
Running = countif(PodStatus == "Running"),
Pending = countif(PodStatus == "Pending"),
Failed = countif(PodStatus == "Failed")
by ControllerName, Namespace
| where Pending > 0 or Failed > 0
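Because the ConfigMap earlier enables collect_all_kube_events, the KubeEvents table can round out this view with warning events from the scheduler and kubelet. A sketch of the idea, using the standard KubeEvents columns:
// Warning events in the last hour, grouped by reason
KubeEvents
| where TimeGenerated > ago(1h)
| where KubeEventType == "Warning"
| summarize EventCount = count() by Reason, Namespace
| order by EventCount desc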
Workbooks
Creating Custom Workbooks
{
  "version": "Notebook/1.0",
  "items": [
    {
      "type": 1,
      "content": {
        "json": "# AKS Cluster Health Dashboard"
      }
    },
    {
      "type": 3,
      "content": {
        "version": "KqlItem/1.0",
        "query": "KubeNodeInventory | summarize count() by Status | render piechart",
        "size": 1,
        "title": "Node Status"
      }
    },
    {
      "type": 3,
      "content": {
        "version": "KqlItem/1.0",
        "query": "KubePodInventory | where TimeGenerated > ago(1h) | summarize count() by PodStatus | render piechart",
        "size": 1,
        "title": "Pod Status"
      }
    }
  ]
}
Alerting Strategies
Resource-Based Alerts
# High memory alert
az monitor metrics alert create \
--name "High Memory Usage" \
--resource-group myResourceGroup \
--scopes "/subscriptions/{sub}/resourceGroups/MC_{rg}_{cluster}_{region}/providers/Microsoft.Compute/virtualMachineScaleSets/{vmss}" \
--condition "avg Percentage Memory > 85" \
--window-size 5m \
--evaluation-frequency 1m \
--action /subscriptions/{sub}/resourceGroups/{rg}/providers/microsoft.insights/actionGroups/{ag}
Log-Based Alerts
// Alert: Pod in CrashLoopBackOff
KubePodInventory
| where PodStatus == "Running"
| where ContainerStatusReason contains "CrashLoopBackOff"
| distinct Name, Namespace, ContainerStatusReason
Multi-Resource Alerts
{
  "type": "Microsoft.Insights/scheduledQueryRules",
  "properties": {
    "displayName": "Cross-Cluster Alert",
    "scopes": [
      "/subscriptions/{sub}/resourceGroups/{rg1}/providers/Microsoft.OperationalInsights/workspaces/{ws1}",
      "/subscriptions/{sub}/resourceGroups/{rg2}/providers/Microsoft.OperationalInsights/workspaces/{ws2}"
    ],
    "criteria": {
      "allOf": [
        {
          "query": "KubePodInventory | where PodStatus == 'Failed' | summarize count() by ClusterName",
          "threshold": 0,
          "operator": "GreaterThan"
        }
      ]
    }
  }
}
Integration with Azure Services
Sending Alerts to Logic Apps
{
  "type": "Microsoft.Logic/workflows",
  "properties": {
    "definition": {
      "triggers": {
        "manual": {
          "type": "Request",
          "kind": "Http"
        }
      },
      "actions": {
        "Send_Teams_message": {
          "type": "ApiConnection",
          "inputs": {
            "method": "post",
            "body": {
              "title": "AKS Alert: @{triggerBody()?['data']?['alertContext']?['AlertRule']}",
              "text": "@{triggerBody()?['data']?['alertContext']?['Description']}"
            }
          }
        }
      }
    }
  }
}
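To route alerts into this workflow, point an action group at the Logic App's HTTP trigger. A minimal sketch using a plain webhook action; the group name, short name, and callback URL are placeholders (the URL comes from the workflow's request trigger):
# Action group that posts alert payloads to the Logic App trigger URL
az monitor action-group create \
  --name aks-alerts-to-logicapp \
  --resource-group {rg} \
  --short-name aksalert \
  --action webhook logicAppHook "https://{logic-app-trigger-url}"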
Exporting to Event Hubs
# Create diagnostic setting for continuous export
az monitor diagnostic-settings create \
--name export-to-eventhub \
--resource "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.ContainerService/managedClusters/{cluster}" \
--event-hub-rule "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.EventHub/namespaces/{ns}/authorizationRules/RootManageSharedAccessKey" \
--logs '[{"category":"kube-apiserver","enabled":true},{"category":"kube-controller-manager","enabled":true}]'
Cost Management
Analyzing Data Ingestion
// Data volume by table
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize TotalGB = sum(Quantity) / 1000 by DataType
| order by TotalGB desc
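To see which containers drive that volume, the built-in _IsBillable and _BilledSize columns can be aggregated directly on ContainerLog. A sketch of the idea:
// Billable log volume per container over the last 7 days
ContainerLog
| where TimeGenerated > ago(7d)
| where _IsBillable == true
| summarize BilledGB = sum(_BilledSize) / 1e9 by ContainerID
| top 10 by BilledGB desc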
Implementing Sampling
# Reduce collection frequency for non-critical namespaces
metric-data-collection-settings: |
  [metric_collection_settings]
    interval = "5m"  # Increase interval to reduce data
Best Practices
- Right-size your workspace - Choose appropriate pricing tier
- Filter noisy namespaces - Exclude kube-system from log collection
- Use data collection rules - Control what data is collected
- Implement retention policies - Archive old data to cheaper storage (see the CLI sketch after this list)
- Use workbooks for investigation - Build reusable analysis templates
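For the retention point above, workspace-level retention can be adjusted from the CLI; a minimal sketch with placeholder resource names:
# Keep data queryable for 30 days; rely on export or archive for anything older
az monitor log-analytics workspace update \
  --resource-group {rg} \
  --workspace-name {workspace} \
  --retention-time 30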
Conclusion
Azure Monitor for Containers provides native, comprehensive monitoring for AKS clusters. Combined with custom queries, alerts, and workbooks, you can build a robust observability platform integrated with the Azure ecosystem.
Tomorrow, we’ll explore Log Analytics workspace design patterns for enterprise environments.