October 7, 2021

Comprehensive AKS Monitoring with Container Insights

Container Insights, part of Azure Monitor, collects performance metrics, container logs, and Kubernetes events from your AKS clusters and stores them in a Log Analytics workspace, where you can query them to assess cluster health and performance.

Enabling Container Insights

On a New Cluster

az aks create \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --node-count 3 \
    --enable-addons monitoring \
    --workspace-resource-id /subscriptions/{sub}/resourcegroups/{rg}/providers/microsoft.operationalinsights/workspaces/{workspace}

On an Existing Cluster

az aks enable-addons \
    --addons monitoring \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --workspace-resource-id /subscriptions/{sub}/resourcegroups/{rg}/providers/microsoft.operationalinsights/workspaces/{workspace}
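Both commands need the full resource ID of an existing Log Analytics workspace. As a minimal sketch, assuming you already know the subscription ID, resource group, and workspace name, the hypothetical helpers below build an ID in the same shape as the placeholder path above and parse one back into its parts. Matching is case-insensitive because Azure resource IDs are not case-sensitive.

```python
import re
from typing import NamedTuple


class WorkspaceRef(NamedTuple):
    subscription: str
    resource_group: str
    workspace: str


# Same path shape as the --workspace-resource-id value; case-insensitive on purpose.
_WORKSPACE_ID = re.compile(
    r"^/subscriptions/(?P<sub>[^/]+)/resourcegroups/(?P<rg>[^/]+)"
    r"/providers/microsoft\.operationalinsights/workspaces/(?P<ws>[^/]+)$",
    re.IGNORECASE,
)


def workspace_resource_id(ref: WorkspaceRef) -> str:
    """Build a Log Analytics workspace resource ID (hypothetical helper)."""
    return (
        f"/subscriptions/{ref.subscription}/resourcegroups/{ref.resource_group}"
        f"/providers/microsoft.operationalinsights/workspaces/{ref.workspace}"
    )


def parse_workspace_resource_id(resource_id: str) -> WorkspaceRef:
    """Split a workspace resource ID into its parts; raise ValueError otherwise."""
    match = _WORKSPACE_ID.match(resource_id)
    if match is None:
        raise ValueError(f"not a Log Analytics workspace ID: {resource_id!r}")
    return WorkspaceRef(match["sub"], match["rg"], match["ws"])
```

If you would rather not assemble the string yourself, the Azure CLI can return it directly: az monitor log-analytics workspace show --resource-group myResourceGroup --workspace-name myWorkspace --query id --output tsv.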

What Gets Collected

Container Insights collects:

Performance metrics: CPU, memory, network, disk for nodes and containers
Container logs: stdout and stderr from all containers
Kubernetes events: Cluster events and state changes
Inventory data: Pods, deployments, services, nodes

Verifying Installation

# Check the omsagent DaemonSet
kubectl get daemonset omsagent -n kube-system

# Check omsagent pods
kubectl get pods -n kube-system -l component=oms-agent
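The DaemonSet listing is healthy when its DESIRED and READY columns match. A minimal sketch of the same check done programmatically, assuming kubectl is on the PATH and pointed at the cluster; desiredNumberScheduled and numberReady are standard Kubernetes DaemonSet status fields, while the function names are hypothetical.

```python
import json
import subprocess


def daemonset_is_ready(daemonset: dict) -> bool:
    """True when every node that should run the agent has a ready pod."""
    status = daemonset.get("status", {})
    desired = status.get("desiredNumberScheduled", 0)
    ready = status.get("numberReady", 0)
    # A DaemonSet scheduled on zero nodes is treated as not ready.
    return desired > 0 and ready == desired


def fetch_daemonset(name: str = "omsagent", namespace: str = "kube-system") -> dict:
    """Read the DaemonSet as JSON via kubectl (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "daemonset", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Against a live cluster you would call daemonset_is_ready(fetch_daemonset()); keeping the parsing separate from the kubectl call makes the check easy to test without a cluster.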

Key Container Insights Tables

Perf Table - Performance Metrics

Perf
| where ObjectName == "K8SContainer"
| where CounterName == "cpuUsageNanoCores"
| summarize AvgCPU = avg(CounterValue) by bin(TimeGenerated, 5m), InstanceName
| render timechart
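bin(TimeGenerated, 5m) rounds each timestamp down to the start of its 5-minute window, so summarize avg(CounterValue) by bin(TimeGenerated, 5m), InstanceName produces one average per container per window. The sketch below reproduces that grouping on in-memory samples to make the semantics concrete; the (timestamp, instance, value) tuple shape is an assumption for illustration, not the Log Analytics schema.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Tuple

# (TimeGenerated, InstanceName, CounterValue); illustrative shape only
Sample = Tuple[datetime, str, float]

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bin_time(ts: datetime, size: timedelta) -> datetime:
    """Round ts down to a multiple of size, like KQL bin() on a datetime."""
    return EPOCH + ((ts - EPOCH) // size) * size


def avg_by_bin(samples: Iterable[Sample], size: timedelta) -> Dict[Tuple[datetime, str], float]:
    """Mimic: summarize avg(CounterValue) by bin(TimeGenerated, size), InstanceName."""
    sums: Dict[Tuple[datetime, str], float] = defaultdict(float)
    counts: Dict[Tuple[datetime, str], int] = defaultdict(int)
    for ts, instance, value in samples:
        key = (bin_time(ts, size), instance)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Samples at 12:01 and 12:04 land in the 12:00 bin and are averaged together, while a sample at 12:06 starts the 12:05 bin.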

ContainerLog Table - Container Logs

ContainerLog
| where LogEntry contains "error"
| project TimeGenerated, Computer, ContainerID, LogEntry
| order by TimeGenerated desc
| take 100

KubeEvents Table - Kubernetes Events

KubeEvents
| where KubeEventType == "Warning"
| summarize count() by Name, Reason
| order by count_ desc

KubePodInventory Table - Pod Information

KubePodInventory
| where TimeGenerated > ago(1h)
| where Namespace != "kube-system"
| where PodStatus == "Running"
| summarize PodCount = dcount(Name) by Namespace, ControllerName
| order by PodCount desc
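KubePodInventory is a periodic snapshot: the agent writes a row for each container on every collection cycle, so count() over a time range grows with the number of snapshots rather than the number of pods. The sketch below, using hypothetical sample rows, contrasts a raw row count with an exact distinct count of pod names per (Namespace, ControllerName), which is what KQL's dcount(Name) estimates.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Row = Dict[str, str]   # one KubePodInventory-like record (illustrative columns only)
Key = Tuple[str, str]  # (Namespace, ControllerName)


def row_counts(rows: Iterable[Row]) -> Counter:
    """Like summarize count() by Namespace, ControllerName: counts every snapshot row."""
    return Counter((r["Namespace"], r["ControllerName"]) for r in rows)


def pod_counts(rows: Iterable[Row]) -> Dict[Key, int]:
    """Like summarize dcount(Name) by Namespace, ControllerName, but exact."""
    pods: Dict[Key, Set[str]] = defaultdict(set)
    for r in rows:
        pods[(r["Namespace"], r["ControllerName"])].add(r["Name"])
    return {key: len(names) for key, names in pods.items()}


# Hypothetical data: two pods of one ReplicaSet, each seen in three snapshots.
snapshots = [
    {"Namespace": "shop", "ControllerName": "web-5d9c", "Name": pod}
    for _ in range(3)
    for pod in ("web-5d9c-a", "web-5d9c-b")
]
```

On this data the row count reports 6 while the distinct count reports the 2 pods that actually exist.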

Useful Queries

High CPU Containers

Perf
| where TimeGenerated > ago(1h)
| where ObjectName == "K8SContainer"
| where CounterName == "cpuUsageNanoCores"
| summarize AvgCPU = avg(CounterValue) by InstanceName
| top 10 by AvgCPU desc

Memory Usage by Namespace

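KubePodInventory
| where TimeGenerated > ago(1h)
| extend InstanceName = strcat(ClusterId, "/", ContainerName)
| distinct Namespace, InstanceName
| join kind=inner (
    Perf
    | where TimeGenerated > ago(1h)
    | where ObjectName == "K8SContainer"
    | where CounterName == "memoryWorkingSetBytes"
    | summarize AvgMemoryBytes = avg(CounterValue) by InstanceName
) on InstanceName
| summarize TotalMemoryBytes = sum(AvgMemoryBytes) by Namespace
| order by TotalMemoryBytes desc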
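Perf rows for K8SContainer identify a container by InstanceName, which has the form {ClusterId}/{PodUid}/{ContainerName}. KubePodInventory stores the cluster's resource ID in ClusterId and {PodUid}/{ContainerName} in ContainerName, so concatenating the two yields the join key. A minimal in-memory sketch of that join, assuming this ID format and using hypothetical rows: it averages each container's working set and totals the averages per namespace.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def instance_name(cluster_id: str, container_name: str) -> str:
    """Equivalent of strcat(ClusterId, "/", ContainerName) on a KubePodInventory row."""
    return f"{cluster_id}/{container_name}"


def memory_by_namespace(inventory: Iterable[dict], perf: Iterable[dict]) -> Dict[str, float]:
    """Average each container's working set, then total those averages per namespace."""
    # Map each Perf-style InstanceName to the namespace its pod lives in.
    namespace_of = {
        instance_name(row["ClusterId"], row["ContainerName"]): row["Namespace"]
        for row in inventory
    }
    samples: Dict[str, List[float]] = defaultdict(list)
    for row in perf:
        samples[row["InstanceName"]].append(row["CounterValue"])
    totals: Dict[str, float] = defaultdict(float)
    for inst, values in samples.items():
        namespace = namespace_of.get(inst)
        if namespace is not None:  # inner join: skip samples with no inventory match
            totals[namespace] += sum(values) / len(values)
    return dict(totals)
```

Summing per-container averages reports a namespace's overall memory footprint; averaging across containers instead would describe a typical container, which answers a different question.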

Container Restart Count

KubePodInventory
| where TimeGenerated > ago(24h)
| where ContainerRestartCount > 0
| summarize MaxRestarts = max(ContainerRestartCount) by Name, Namespace
| where MaxRestarts > 3
| order by MaxRestarts desc
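ContainerRestartCount is reported on every inventory snapshot and keeps growing over a container's lifetime, so the query takes the highest value seen per pod before applying the threshold. The same pipeline in Python, on hypothetical rows, shows what the where/summarize/where/order by steps compute; the function name and default threshold are illustrative.

```python
from typing import Dict, Iterable, List, Tuple


def restart_hotspots(rows: Iterable[dict], threshold: int = 3) -> List[Tuple[str, str, int]]:
    """Mimic: where ContainerRestartCount > 0
    | summarize MaxRestarts = max(ContainerRestartCount) by Name, Namespace
    | where MaxRestarts > threshold | order by MaxRestarts desc"""
    max_restarts: Dict[Tuple[str, str], int] = {}
    for row in rows:
        count = row["ContainerRestartCount"]
        if count <= 0:
            continue
        key = (row["Name"], row["Namespace"])
        max_restarts[key] = max(count, max_restarts.get(key, 0))
    hot = [(name, ns, n) for (name, ns), n in max_restarts.items() if n > threshold]
    return sorted(hot, key=lambda item: item[2], reverse=True)
```

A pod that shows 2 restarts in one snapshot and 5 in a later one is reported once, with 5.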

Pods in Failed State

KubePodInventory
| where TimeGenerated > ago(1h)
| where PodStatus == "Failed"
| distinct Name, Namespace, PodStatus

Node Resource Pressure

KubeNodeInventory
| where TimeGenerated > ago(1h)
| where Status contains "pressure"
| project TimeGenerated, Computer, Status

Setting Up Alerts

High CPU Alert

Passing the query through --condition-query, referenced by a placeholder name in --condition, avoids quoting problems with the operators and string literals inside the KQL:

az monitor scheduled-query create \
    --name "High CPU Alert" \
    --resource-group myResourceGroup \
    --scopes /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.OperationalInsights/workspaces/{workspace} \
    --condition "count 'HighCpuQuery' > 0" \
    --condition-query HighCpuQuery="Perf | where ObjectName == 'K8SContainer' | where CounterName == 'cpuUsageNanoCores' | summarize AvgCPU = avg(CounterValue) by bin(TimeGenerated, 5m), InstanceName | where AvgCPU > 800000000" \
    --severity 2 \
    --evaluation-frequency 5m \
    --window-size 15m \
    --action-groups /subscriptions/{sub}/resourceGroups/{rg}/providers/microsoft.insights/actionGroups/myActionGroup
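cpuUsageNanoCores is measured in billionths of a CPU core, so the 800000000 threshold above means 0.8 core, or 800 millicores in the units Kubernetes uses for CPU requests and limits. A small sketch with hypothetical helper names that converts between these units and renders the alert query from a threshold given in cores, so the number in the rule is derived rather than typed by hand. It emits single-quoted KQL string literals, which are easy to embed in a double-quoted shell argument.

```python
NANOCORES_PER_CORE = 1_000_000_000
NANOCORES_PER_MILLICORE = 1_000_000


def cores_to_nanocores(cores: float) -> int:
    """0.8 core -> 800000000 nanocores."""
    return round(cores * NANOCORES_PER_CORE)


def nanocores_to_millicores(nanocores: float) -> float:
    """800000000 nanocores -> 800.0 millicores."""
    return nanocores / NANOCORES_PER_MILLICORE


def high_cpu_query(threshold_cores: float, bin_size: str = "5m") -> str:
    """Render the High CPU alert query for a threshold expressed in cores."""
    return (
        "Perf | where ObjectName == 'K8SContainer' "
        "| where CounterName == 'cpuUsageNanoCores' "
        f"| summarize AvgCPU = avg(CounterValue) by bin(TimeGenerated, {bin_size}), InstanceName "
        f"| where AvgCPU > {cores_to_nanocores(threshold_cores)}"
    )
```

For example, high_cpu_query(0.8) ends with "| where AvgCPU > 800000000", matching the rule above.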

Pod Restart Alert

You can also create the rule in the portal or declare it in an ARM template. The following resource definition belongs in the template's resources array:

{
  "type": "Microsoft.Insights/scheduledQueryRules",
  "apiVersion": "2021-08-01",
  "name": "Pod Restart Alert",
  "location": "eastus",
  "properties": {
    "displayName": "Pod Restart Alert",
    "severity": 2,
    "enabled": true,
    "evaluationFrequency": "PT5M",
    "scopes": ["/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.OperationalInsights/workspaces/{workspace}"],
    "windowSize": "PT15M",
    "criteria": {
      "allOf": [
        {
          "query": "KubePodInventory | where ContainerRestartCount > 5 | summarize count() by Name, Namespace",
          "timeAggregation": "Count",
          "operator": "GreaterThan",
          "threshold": 0
        }
      ]
    },
    "actions": {
      "actionGroups": ["/subscriptions/{sub}/resourceGroups/{rg}/providers/microsoft.insights/actionGroups/myActionGroup"]
    }
  }
}
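The definition above is a single resource, not a deployable template. A minimal sketch, assuming you generate templates from Python, that builds the same resource (property names mirror the snippet above) and wraps it in the standard deployment-template envelope of $schema, contentVersion, and resources. build_restart_alert, its parameters, and the output file name are hypothetical. A file written this way could be deployed with az deployment group create --resource-group myResourceGroup --template-file pod-restart-alert.json.

```python
import json
from typing import Any, Dict

TEMPLATE_SCHEMA = "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#"


def build_restart_alert(workspace_id: str, action_group_id: str, location: str,
                        min_restarts: int = 5) -> Dict[str, Any]:
    """Return the scheduledQueryRules resource shown above, parameterized."""
    query = (f"KubePodInventory | where ContainerRestartCount > {min_restarts} "
             "| summarize count() by Name, Namespace")
    return {
        "type": "Microsoft.Insights/scheduledQueryRules",
        "apiVersion": "2021-08-01",
        "name": "Pod Restart Alert",
        "location": location,
        "properties": {
            "displayName": "Pod Restart Alert",
            "severity": 2,
            "enabled": True,
            "evaluationFrequency": "PT5M",
            "scopes": [workspace_id],
            "windowSize": "PT15M",
            "criteria": {"allOf": [{
                "query": query,
                "timeAggregation": "Count",
                "operator": "GreaterThan",
                "threshold": 0,
            }]},
            "actions": {"actionGroups": [action_group_id]},
        },
    }


def wrap_in_template(*resources: Dict[str, Any]) -> Dict[str, Any]:
    """Embed resources in a minimal ARM deployment template."""
    return {"$schema": TEMPLATE_SCHEMA, "contentVersion": "1.0.0.0", "resources": list(resources)}


def write_template(path: str, template: Dict[str, Any]) -> None:
    """Write the template as indented JSON, ready for az deployment group create."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(template, f, indent=2)
```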

Configuring Log Collection

ConfigMap for Log Collection Settings

apiVersion: v1
kind: ConfigMap
metadata:
  name: container-azm-ms-agentconfig
  namespace: kube-system
data:
  schema-version: v1
  config-version: ver1
  log-data-collection-settings: |
    [log_collection_settings]
      [log_collection_settings.stdout]
        enabled = true
        exclude_namespaces = ["kube-system"]
      [log_collection_settings.stderr]
        enabled = true
        exclude_namespaces = ["kube-system"]
      [log_collection_settings.env_var]
        enabled = true
  prometheus-data-collection-settings: |
    [prometheus_data_collection_settings.cluster]
      interval = "1m"
      monitor_kubernetes_pods = true

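Save the manifest as container-azm-ms-agentconfig.yaml and apply it:

kubectl apply -f container-azm-ms-agentconfig.yaml

The agent pods restart to pick up the new settings, so the change can take a few minutes to take effect.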
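If the set of excluded namespaces changes often, generating the log-data-collection-settings value is less error-prone than editing TOML by hand. A minimal sketch that renders the stdout, stderr, and env_var sections used in the ConfigMap above; the function name is hypothetical, and the output is meant to be pasted under log-data-collection-settings: | with its indentation preserved.

```python
import json
from typing import Iterable


def log_collection_settings(excluded: Iterable[str], env_var: bool = True) -> str:
    """Render the [log_collection_settings] TOML used by container-azm-ms-agentconfig."""
    # Namespace names are DNS labels, so a JSON list of them is also a valid TOML array.
    namespaces = json.dumps(sorted(set(excluded)))
    lines = ["[log_collection_settings]"]
    for stream in ("stdout", "stderr"):
        lines += [
            f"  [log_collection_settings.{stream}]",
            "    enabled = true",
            f"    exclude_namespaces = {namespaces}",
        ]
    lines += [
        "  [log_collection_settings.env_var]",
        f"    enabled = {'true' if env_var else 'false'}",
    ]
    return "\n".join(lines)
```

Sorting and de-duplicating the list keeps the rendered ConfigMap stable, so re-running the generator does not produce spurious diffs in source control.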

Workbook Visualizations

Container Insights includes built-in workbooks:

Cluster health
Node performance
Controller metrics
Container insights
Deployments and HPAs
Persistent volumes

Access them through: Azure Portal > AKS Cluster > Insights > Workbooks

Cost Optimization

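Container logs can produce a large volume of data, and Log Analytics bills by ingested volume. To control costs:

Filtering namespaces - Exclude noisy namespaces with exclude_namespaces in the agent ConfigMap
Collection frequency - Scrape Prometheus endpoints less often by raising interval in the ConfigMap
Retention policies - Keep data only as long as you need it (--retention-time on the workspace)
Data caps - Configure a daily ingestion cap; once it is reached, collection stops until the cap resets the next day

# Set a daily ingestion cap of 5 GB on the workspace
az monitor log-analytics workspace update \
    --resource-group myResourceGroup \
    --workspace-name myWorkspace \
    --quota 5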
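To pick a sensible --quota value, a rough estimate of daily log volume helps. The back-of-the-envelope sketch below multiplies an assumed log rate by an assumed average line size; both inputs are placeholders you would measure for your own workloads. It uses decimal gigabytes (10^9 bytes) and counts only raw log text, so the volume actually ingested, which adds columns such as TimeGenerated, Computer, and ContainerID to every record, will likely be higher. The helper names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_GB = 1_000_000_000  # decimal GB; an assumption for this estimate


def daily_log_gb(lines_per_second: float, avg_line_bytes: float) -> float:
    """Raw container-log volume per day, in GB."""
    return lines_per_second * avg_line_bytes * SECONDS_PER_DAY / BYTES_PER_GB


def hours_until_cap(daily_gb: float, cap_gb: float) -> float:
    """Hour of the day at which a steady ingestion rate reaches the cap (24 if it never does)."""
    if daily_gb <= cap_gb:
        return 24.0
    return 24.0 * cap_gb / daily_gb
```

For example, 100 lines per second at 200 bytes each is about 1.7 GB of raw text per day, comfortably under a 5 GB cap, whereas a steady 10 GB per day against the same cap would stop collection roughly 12 hours into each day.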

Conclusion

Container Insights provides the foundation for observability in AKS. Combined with custom queries, alerts, and workbooks, you can maintain deep visibility into your Kubernetes workloads.

Tomorrow, we’ll explore Prometheus metrics collection for even more detailed application monitoring.