Designing AKS Node Pools for Production Workloads

Node pools in Azure Kubernetes Service (AKS) group nodes that share the same VM size and configuration, and a cluster can run several pools side by side. This flexibility is essential for running diverse workloads efficiently on a single cluster. Let’s explore how to design node pools for production environments.

System vs User Node Pools

AKS distinguishes between two types of node pools (a command to inspect each pool’s mode is shown after this summary):

System Node Pools:

  • Run critical system pods (CoreDNS, metrics-server, etc.)
  • Required for cluster operation
  • Should be highly available

User Node Pools:

  • Run your application workloads
  • Can be specialized for different needs
  • Can be scaled to zero
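
To see which mode each pool in an existing cluster uses, you can query the node pool list. A minimal sketch, using the same placeholder resource group and cluster names as the rest of this post:

# List node pools with their mode, node count, and VM size
az aks nodepool list \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --query "[].{Name:name, Mode:mode, Count:count, VmSize:vmSize}" \
    --output table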

Creating a System Node Pool

# Create a dedicated system node pool
az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name systempool \
    --node-count 3 \
    --node-vm-size Standard_DS2_v2 \
    --mode System \
    --zones 1 2 3
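
If you want to keep application pods off the system pool entirely, AKS supports the CriticalAddonsOnly taint on system node pools. A sketch, assuming a recent Azure CLI version that allows updating taints on an existing pool (otherwise, pass the same taint via --node-taints when the pool is created):

# Reserve the system pool for critical add-ons only
az aks nodepool update \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name systempool \
    --node-taints CriticalAddonsOnly=true:NoSchedule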

Creating Specialized User Node Pools

General Purpose Workloads

az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name generalpool \
    --node-count 3 \
    --node-vm-size Standard_D4s_v3 \
    --mode User \
    --zones 1 2 3 \
    --labels workload=general

Memory-Optimized for Databases

az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name memorypool \
    --node-count 2 \
    --node-vm-size Standard_E4s_v3 \
    --mode User \
    --zones 1 2 3 \
    --labels workload=memory-intensive \
    --node-taints workload=memory-intensive:NoSchedule

GPU-Enabled for ML Workloads

az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name gpupool \
    --node-count 1 \
    --node-vm-size Standard_NC6s_v3 \
    --mode User \
    --labels workload=gpu \
    --node-taints sku=gpu:NoSchedule
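
Before scheduling GPU workloads, it is worth confirming that the pool’s nodes advertise GPU capacity; nvidia.com/gpu only appears once the NVIDIA device plugin is running on the nodes. The node name below is a placeholder:

# Confirm the GPU nodes are labeled as expected
kubectl get nodes -l workload=gpu

# Inspect allocatable resources on one of the GPU nodes
kubectl describe node <gpu-node-name> | grep -A 5 "Allocatable"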

Using Node Selectors and Tolerations

To schedule pods on specific node pools, use node selectors and tolerations:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-training-job
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ml-training
  template:
    metadata:
      labels:
        app: ml-training
    spec:
      nodeSelector:
        workload: gpu
      tolerations:
      - key: "sku"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"
      containers:
      - name: training
        image: myregistry.azurecr.io/ml-training:v1
        resources:
          limits:
            nvidia.com/gpu: 1
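
Applying the manifest and checking where the pod lands is a quick way to confirm the node selector and toleration work together. This assumes the deployment above was saved as ml-training.yaml:

# Apply the GPU training deployment and confirm pod placement
kubectl apply -f ml-training.yaml
kubectl get pods -l app=ml-training -o wide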

Node Pool Autoscaling

Enable the cluster autoscaler so the node count adjusts with demand:

az aks nodepool update \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name generalpool \
    --enable-cluster-autoscaler \
    --min-count 2 \
    --max-count 10
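
The autoscaler’s behaviour can also be tuned cluster-wide through the autoscaler profile, for example how aggressively it scales down under-utilised nodes. A sketch; the profile keys shown are documented names, but availability and defaults can vary by CLI and AKS version:

# Tune cluster-wide autoscaler behaviour (applies to all node pools)
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --cluster-autoscaler-profile scan-interval=30s scale-down-unneeded-time=10m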

Scaling Node Pools to Zero

User node pools can scale to zero to save costs:

# Scale to zero manually
az aks nodepool scale \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name gpupool \
    --node-count 0

# Or enable autoscaler with min-count 0
az aks nodepool update \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name gpupool \
    --enable-cluster-autoscaler \
    --min-count 0 \
    --max-count 3
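
Scale-from-zero relies on pending pods that tolerate the pool’s taints and match its labels; the autoscaler infers node properties from the pool’s configuration. One way to watch it happen, again assuming the GPU deployment was saved as ml-training.yaml:

# Submit a GPU workload and watch the pod while the pool scales up
kubectl apply -f ml-training.yaml
kubectl get pods -l app=ml-training --watch

# In another terminal, check the GPU pool's current node count
az aks nodepool show \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name gpupool \
    --query count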

Node Pool Management with Terraform

For infrastructure as code, here’s a Terraform example:

resource "azurerm_kubernetes_cluster_node_pool" "general" {
  name                  = "general"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
  vm_size               = "Standard_D4s_v3"
  node_count             = 3

  enable_auto_scaling = true
  min_count           = 2
  max_count           = 10

  zones = ["1", "2", "3"]

  node_labels = {
    "workload" = "general"
  }

  tags = {
    Environment = "Production"
  }
}

resource "azurerm_kubernetes_cluster_node_pool" "memory" {
  name                  = "memory"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
  vm_size               = "Standard_E8s_v3"
  node_count             = 2

  enable_auto_scaling = true
  min_count           = 1
  max_count           = 5

  zones = ["1", "2", "3"]

  node_labels = {
    "workload" = "memory-intensive"
  }

  node_taints = [
    "workload=memory-intensive:NoSchedule"
  ]
}
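
A typical workflow is to review the plan for just the node pool resources before applying; -target narrows the plan to the resources you are changing (use it sparingly outside of reviews):

# Preview and apply only the node pool changes
terraform init
terraform plan -target=azurerm_kubernetes_cluster_node_pool.general \
               -target=azurerm_kubernetes_cluster_node_pool.memory
terraform apply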

Best Practices

  1. Separate system and user workloads - Keep system pods isolated for stability
  2. Use availability zones - Distribute nodes across zones for HA
  3. Right-size your VMs - Match VM sizes to workload requirements
  4. Implement autoscaling - Use cluster autoscaler for dynamic workloads
  5. Use taints and tolerations - Ensure pods run on appropriate nodes
  6. Plan for upgrades - Design node pools to allow rolling upgrades (see the surge and upgrade commands after this list)
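
For point 6, node pool upgrades replace nodes in a rolling fashion, and the surge setting controls how many extra nodes can be created during the upgrade. A sketch; the Kubernetes version is a placeholder, so pick one your cluster supports:

# Allow up to a third of the pool to surge during upgrades
az aks nodepool update \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name generalpool \
    --max-surge 33%

# Upgrade a single node pool to a newer Kubernetes version
az aks nodepool upgrade \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name generalpool \
    --kubernetes-version <version>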

Conclusion

Thoughtful node pool design is crucial for running efficient, cost-effective Kubernetes workloads. By separating workloads and matching resources to requirements, you can optimize both performance and cost.

Tomorrow, we’ll explore spot node pools for cost optimization with interruptible workloads.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.