January 5, 2022

AKS Cost Optimization: Strategies for Reducing Kubernetes Spending

Running Kubernetes in production gets expensive when clusters are not managed carefully. Let’s explore proven strategies for optimizing your AKS costs while maintaining performance and reliability.

Understanding AKS Costs

AKS costs come from several components:

Virtual Machine nodes
Storage (managed disks, Azure Files)
Networking (load balancers, bandwidth)
Container Registry
Log Analytics

Right-Sizing Your Nodes

Node sizing starts with knowing what your pods actually consume. Run the Kubernetes Vertical Pod Autoscaler (VPA) in recommendation mode to measure actual resource usage:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Just recommend, don't auto-apply
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi
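
Once the VPA has collected data, its recommendations appear in the object's status (for example via kubectl describe vpa my-app-vpa). The following is a minimal Python sketch of how one might compare current requests against a recommended target. The quantities are made-up examples rather than output from a real cluster, and the helper names are hypothetical:

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity ('250m' or '2') to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_memory(quantity: str) -> int:
    """Convert a Kubernetes memory quantity ('512Mi', '2Gi') to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[:-2]) * factor)
    return int(quantity)  # no suffix: plain bytes


def overprovision_ratio(requested: float, recommended: float) -> float:
    """How many times larger the current request is than the recommendation."""
    return requested / recommended


# Hypothetical values: current request vs. VPA target for one container.
current = {"cpu": "1", "memory": "2Gi"}
target = {"cpu": "250m", "memory": "512Mi"}

cpu_ratio = overprovision_ratio(parse_cpu(current["cpu"]), parse_cpu(target["cpu"]))
mem_ratio = overprovision_ratio(
    parse_memory(current["memory"]), parse_memory(target["memory"])
)
print(f"CPU over-provisioned {cpu_ratio:.1f}x, memory {mem_ratio:.1f}x")
```

A ratio well above 1 suggests the request can be lowered, which in turn lets more pods fit on each node.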

Cluster Autoscaler Configuration

Configure the cluster autoscaler for cost efficiency:

agentPoolProfiles: [
  {
    name: 'userpool'
    count: 2
    vmSize: 'Standard_D4s_v3'
    enableAutoScaling: true
    minCount: 1
    maxCount: 20
    scaleDownMode: 'Delete'  // delete nodes on scale-down instead of deallocating them
  }
]

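Set appropriate scale-down settings. The values shown here match the AKS defaults; shorten the delays or raise the utilization threshold to remove idle nodes sooner: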

az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --cluster-autoscaler-profile \
    scale-down-delay-after-add=10m \
    scale-down-unneeded-time=10m \
    scale-down-utilization-threshold=0.5
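
To build intuition for scale-down-utilization-threshold, here is a simplified Python sketch of the check. It assumes utilization is the larger of the CPU and memory request ratios on a node. The real autoscaler also requires the node to stay unneeded for scale-down-unneeded-time and its pods to be reschedulable elsewhere. The node figures are hypothetical:

```python
def node_utilization(
    cpu_requested: float,
    cpu_allocatable: float,
    mem_requested: float,
    mem_allocatable: float,
) -> float:
    """Utilization based on pod requests (not live usage), per resource."""
    return max(cpu_requested / cpu_allocatable, mem_requested / mem_allocatable)


def is_scale_down_candidate(utilization: float, threshold: float = 0.5) -> bool:
    """A node becomes a removal candidate when utilization is below the threshold."""
    return utilization < threshold


# Hypothetical node: 2.4 of 4 cores and 4 of 16 GiB requested.
utilization = node_utilization(2.4, 4.0, 4.0, 16.0)

print(is_scale_down_candidate(utilization, threshold=0.5))  # default threshold
print(is_scale_down_candidate(utilization, threshold=0.7))  # more aggressive
```

With the default threshold this node is kept. Raising the threshold to 0.7 makes it a candidate for removal, at the cost of more pod rescheduling.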

Using Spot Node Pools

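Spot VMs can reduce compute costs by up to 90% compared with pay-as-you-go prices. The trade-off is that Azure can evict them whenever it needs the capacity back, so keep them in a separate node pool: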

resource spotPool 'Microsoft.ContainerService/managedClusters/agentPools@2021-10-01' = {
  parent: aksCluster
  name: 'spotpool'
  properties: {
    count: 3
    vmSize: 'Standard_D4s_v3'
    scaleSetPriority: 'Spot'
    scaleSetEvictionPolicy: 'Delete'
    spotMaxPrice: -1  // -1 = pay up to the on-demand price; never evicted on price alone
    enableAutoScaling: true
    minCount: 0
    maxCount: 50
    nodeTaints: [
      'kubernetes.azure.com/scalesetpriority=spot:NoSchedule'
    ]
    nodeLabels: {
      'kubernetes.azure.com/scalesetpriority': 'spot'
    }
  }
}

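Schedule workloads on spot nodes. The toleration lets pods land on the tainted spot pool, and the node affinity keeps them off regular nodes: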

apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-job
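spec:
  replicas: 2
  selector:            # required; must match the pod template labels
    matchLabels:
      app: batch-job
  template:
    metadata:
      labels:
        app: batch-job
    spec: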
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.azure.com/scalesetpriority
                operator: In
                values:
                - spot
      tolerations:
      - key: kubernetes.azure.com/scalesetpriority
        operator: Equal
        value: spot
        effect: NoSchedule
      containers:
      - name: batch
        image: myregistry.azurecr.io/batch:latest
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2"
            memory: "2Gi"
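
To estimate what moving part of a cluster to spot nodes might save, a rough Python sketch follows. The hourly price, discount, and node counts are illustrative assumptions, not real Azure prices. Look up current rates for your region and VM size before relying on any figure:

```python
HOURS_PER_MONTH = 730  # common approximation for monthly billing


def monthly_cost(nodes: int, hourly_price: float, hours: int = HOURS_PER_MONTH) -> float:
    """Monthly compute cost for a number of identical nodes."""
    return nodes * hourly_price * hours


# Illustrative assumptions only.
on_demand_price = 0.20  # USD per hour per node (hypothetical)
spot_discount = 0.70    # spot assumed 70% cheaper than on-demand
spot_price = on_demand_price * (1 - spot_discount)

# Compare 10 on-demand nodes with 4 on-demand plus 6 spot nodes.
all_on_demand = monthly_cost(10, on_demand_price)
mixed = monthly_cost(4, on_demand_price) + monthly_cost(6, spot_price)
savings = 1 - mixed / all_on_demand

print(f"All on-demand: ${all_on_demand:,.2f}/month")
print(f"Mixed pools:   ${mixed:,.2f}/month ({savings:.0%} saved)")
```

The estimate ignores evictions. Workloads that lose work on eviction will see less than the headline saving.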

Reserved Instances for Base Capacity

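For predictable workloads, cover the always-on baseline with reserved instances. Treat the command below as a template and confirm the flags with az reservations reservation-order purchase --help for your installed CLI extension: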

# Calculate your base capacity needs first
# Then purchase reservations for consistent savings

az reservations reservation-order purchase \
  --sku Standard_D4s_v3 \
  --location eastus \
  --reserved-resource-type VirtualMachines \
  --billing-scope-id /subscriptions/xxx \
  --term P1Y \
  --quantity 5 \
  --applied-scope-type Single \
  --applied-scopes /subscriptions/xxx/resourceGroups/myResourceGroup
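
Whether a reservation pays off depends on how many hours the VM would otherwise run, because a reservation is billed whether or not it is used. A small Python sketch of the break-even arithmetic, using a hypothetical 40% reservation discount:

```python
def break_even_utilization(reservation_discount: float) -> float:
    """Fraction of hours a VM must run for a reservation to cost less.

    A reserved VM costs (1 - discount) of the pay-as-you-go rate for every
    hour of the term. A pay-as-you-go VM costs the full rate, but only for
    the hours it actually runs.
    """
    return 1 - reservation_discount


def cheaper_option(utilization: float, reservation_discount: float) -> str:
    """Pick the cheaper pricing model for a given fraction of hours used."""
    if utilization > break_even_utilization(reservation_discount):
        return "reserved"
    return "pay-as-you-go"


discount = 0.40  # hypothetical discount for a 1-year term

print(f"Break-even utilization: {break_even_utilization(discount):.0%}")
print("Always-on baseline node (95% of hours):", cheaper_option(0.95, discount))
print("Burst node (30% of hours):", cheaper_option(0.30, discount))
```

This is why reservations suit the minimum node count that never scales down, while autoscaled or spot capacity handles the rest.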

Implementing Pod Disruption Budgets

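Protect critical workloads while allowing cost optimization. A Pod Disruption Budget limits how many pods voluntary disruptions, such as autoscaler scale-down or node drains, can evict at once: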

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
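
A budget that is too strict can block scale-down entirely. The Python sketch below shows the arithmetic behind a minAvailable budget, assuming it is given as an absolute number as in the manifest above. The function name is hypothetical:

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Number of pods that may be evicted voluntarily right now."""
    return max(0, healthy_pods - min_available)


# With minAvailable: 2, the replica count decides whether nodes can be drained.
for replicas in (2, 3, 5):
    print(f"{replicas} healthy pods -> {allowed_disruptions(replicas, 2)} evictions allowed")
```

With only 2 replicas and minAvailable: 2, no eviction is ever allowed, so the autoscaler cannot remove the nodes hosting those pods. Running at least one replica more than minAvailable keeps scale-down possible.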

Cost Monitoring with Kubecost

Deploy Kubecost for visibility into costs per namespace and workload:

helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="xxx"

Summary

Key cost optimization strategies:

Right-size resources using VPA recommendations
Tune the cluster autoscaler to remove idle nodes promptly
Leverage Spot VMs for fault-tolerant workloads
Purchase reserved instances for baseline capacity
Monitor and analyze costs continuously
Implement resource quotas and limit ranges

Applied together, these strategies can reduce AKS costs by 50-70% without sacrificing performance, depending on your workload mix.