Simplified Outbound Connectivity with Azure NAT Gateway

Introduction

Azure NAT Gateway provides outbound internet connectivity for resources in a virtual network subnet. It offers a scalable, highly available solution that eliminates the need for public IP addresses on individual VMs and avoids SNAT port exhaustion issues common with load balancers.

In this post, we will explore how to deploy and configure Azure NAT Gateway for reliable outbound connectivity.

Why NAT Gateway?

NAT Gateway solves several problems:

  • SNAT Port Exhaustion: Each public IP attached to a NAT Gateway provides 64,512 SNAT ports, allocated on demand to any VM in the subnet, and one gateway accepts up to 16 public IPs
  • Simplified Architecture: No need for public IPs on individual VMs
  • Predictable Outbound IPs: Known source IPs for firewall whitelisting
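  • High Availability: A fully managed, software-defined service with no VM to fail; the Standard SKU is zonal (pinned to one availability zone or deployed with no zone) rather than zone-redundant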
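The arithmetic behind the first bullet fits in a few lines. This is a minimal sketch assuming the Standard SKU figures of 64,512 SNAT ports per public IP and at most 16 public IPs per gateway; `snat_port_capacity` is a hypothetical helper, not part of any Azure SDK.

```python
# Sketch of NAT Gateway SNAT capacity (Standard SKU figures assumed).
PORTS_PER_PUBLIC_IP = 64_512  # SNAT ports contributed by each attached public IP
MAX_PUBLIC_IPS = 16           # upper limit of public IPs on one NAT Gateway


def snat_port_capacity(public_ip_count: int) -> int:
    """Total SNAT ports shared on demand by all VMs in the attached subnets."""
    if not 1 <= public_ip_count <= MAX_PUBLIC_IPS:
        raise ValueError(f"a NAT Gateway takes 1 to {MAX_PUBLIC_IPS} public IPs")
    return public_ip_count * PORTS_PER_PUBLIC_IP


print(snat_port_capacity(1))   # 64512
print(snat_port_capacity(16))  # 1032192
```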

Creating NAT Gateway

Deploy NAT Gateway using Azure CLI:

# Create public IP for NAT Gateway
az network public-ip create \
    --resource-group rg-networking \
    --name nat-gateway-pip \
    --sku Standard \
    --allocation-method Static \
    --zone 1 2 3

# Create public IP prefix for multiple outbound IPs
az network public-ip prefix create \
    --resource-group rg-networking \
    --name nat-gateway-pip-prefix \
    --length 29

# Create NAT Gateway
az network nat gateway create \
    --resource-group rg-networking \
    --name nat-gateway-main \
    --public-ip-addresses nat-gateway-pip \
    --public-ip-prefixes nat-gateway-pip-prefix \
    --idle-timeout 10

# Associate NAT Gateway with subnet
az network vnet subnet update \
    --resource-group rg-networking \
    --vnet-name vnet-main \
    --name subnet-backend \
    --nat-gateway nat-gateway-main
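The commands above attach one standalone public IP plus a /29 prefix. A quick sketch of what that adds up to, assuming a prefix of length n contributes 2^(32-n) addresses and the gateway limit is 16 public IPs; `prefix_size` and `attached_ip_count` are hypothetical helpers.

```python
# Sketch: outbound IP count and SNAT port pool for the CLI example.
PORTS_PER_PUBLIC_IP = 64_512
MAX_PUBLIC_IPS = 16


def prefix_size(prefix_length: int) -> int:
    """IPv4 addresses in a public IP prefix, e.g. /29 -> 8."""
    return 2 ** (32 - prefix_length)


def attached_ip_count(single_ips: int, prefix_lengths: list) -> int:
    """Standalone IPs plus every address in each attached prefix."""
    total = single_ips + sum(prefix_size(n) for n in prefix_lengths)
    if total > MAX_PUBLIC_IPS:
        raise ValueError(f"{total} IPs exceeds the {MAX_PUBLIC_IPS}-IP limit")
    return total


ips = attached_ip_count(single_ips=1, prefix_lengths=[29])
print(ips, ips * PORTS_PER_PUBLIC_IP)  # 9 580608
```

Swapping the /29 for a /28 (16 addresses) while keeping the standalone IP would bring the total to 17 and exceed the limit.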

Terraform Configuration

Complete NAT Gateway setup with Terraform:

# Public IP for NAT Gateway
resource "azurerm_public_ip" "nat" {
  name                = "nat-gateway-pip"
  resource_group_name = azurerm_resource_group.networking.name
  location            = azurerm_resource_group.networking.location
  allocation_method   = "Static"
  sku                 = "Standard"
  zones               = ["1", "2", "3"]

  tags = {
    Environment = "Production"
    Purpose     = "NAT Gateway"
  }
}

# Public IP Prefix for additional outbound IPs
resource "azurerm_public_ip_prefix" "nat" {
  name                = "nat-gateway-pip-prefix"
  resource_group_name = azurerm_resource_group.networking.name
  location            = azurerm_resource_group.networking.location
  prefix_length       = 29  # 8 IP addresses
  sku                 = "Standard"
  zones               = ["1", "2", "3"]

  tags = {
    Environment = "Production"
  }
}

# NAT Gateway
resource "azurerm_nat_gateway" "main" {
  name                    = "nat-gateway-main"
  resource_group_name     = azurerm_resource_group.networking.name
  location                = azurerm_resource_group.networking.location
  sku_name                = "Standard"
  idle_timeout_in_minutes = 10
  zones                   = ["1"]  # Standard NAT Gateway accepts at most one zone

  tags = {
    Environment = "Production"
  }
}

# Associate public IP with NAT Gateway
resource "azurerm_nat_gateway_public_ip_association" "main" {
  nat_gateway_id       = azurerm_nat_gateway.main.id
  public_ip_address_id = azurerm_public_ip.nat.id
}

# Associate public IP prefix with NAT Gateway
resource "azurerm_nat_gateway_public_ip_prefix_association" "main" {
  nat_gateway_id      = azurerm_nat_gateway.main.id
  public_ip_prefix_id = azurerm_public_ip_prefix.nat.id
}

# Associate NAT Gateway with subnets
resource "azurerm_subnet_nat_gateway_association" "backend" {
  subnet_id      = azurerm_subnet.backend.id
  nat_gateway_id = azurerm_nat_gateway.main.id
}

resource "azurerm_subnet_nat_gateway_association" "compute" {
  subnet_id      = azurerm_subnet.compute.id
  nat_gateway_id = azurerm_nat_gateway.main.id
}
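Because the Standard SKU accepts only one availability zone, a multi-zone `zones` list on `azurerm_nat_gateway` is rejected. One way to catch that before applying is to scan the plan JSON. This sketch assumes the output format of `terraform show -json tfplan` and inspects only root-module resources; `multi_zone_nat_gateways` is a hypothetical helper.

```python
import json


def multi_zone_nat_gateways(plan: dict) -> list:
    """Addresses of planned azurerm_nat_gateway resources listing more than one zone."""
    resources = plan.get("planned_values", {}).get("root_module", {}).get("resources", [])
    return [
        r["address"]
        for r in resources
        if r.get("type") == "azurerm_nat_gateway"
        and len(r.get("values", {}).get("zones") or []) > 1
    ]


# Trimmed-down stand-in for real plan JSON
sample_plan = json.loads("""
{"planned_values": {"root_module": {"resources": [
  {"address": "azurerm_nat_gateway.main", "type": "azurerm_nat_gateway",
   "values": {"zones": ["1", "2", "3"]}},
  {"address": "azurerm_nat_gateway.zonal", "type": "azurerm_nat_gateway",
   "values": {"zones": ["1"]}}
]}}}
""")
print(multi_zone_nat_gateways(sample_plan))  # ['azurerm_nat_gateway.main']
```

Against a real configuration, run `terraform show -json tfplan > plan.json` and pass the result of `json.load` on that file.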

Scaling Outbound Connections

Add more public IPs to scale SNAT ports:

import math
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient
from azure.mgmt.network.models import SubResource

subscription_id = os.environ["AZURE_SUBSCRIPTION_ID"]
credential = DefaultAzureCredential()
network_client = NetworkManagementClient(credential, subscription_id)

PORTS_PER_IP = 64512  # SNAT ports per public IP
MAX_IPS = 16          # public IPs per NAT Gateway

# Calculate required public IPs based on expected connections
def calculate_required_ips(expected_connections_per_second, connection_duration_seconds=60):
    """
    Each public IP provides 64,512 SNAT ports.
    Concurrent connections ~= arrival rate x how long each connection is held.
    """
    concurrent_connections = expected_connections_per_second * connection_duration_seconds
    # Round up: 64,512 connections need 1 IP, 64,513 need 2
    required_ips = max(1, math.ceil(concurrent_connections / PORTS_PER_IP))
    if required_ips > MAX_IPS:
        raise ValueError(f"{required_ips} IPs needed; a NAT Gateway supports at most {MAX_IPS}")
    return required_ips

# Example: 10,000 connections/sec held for 30 seconds -> 300,000 concurrent -> 5 IPs
required = calculate_required_ips(10000, 30)
print(f"Required public IPs: {required}")

# Add an additional public IP to an existing NAT Gateway
def add_public_ip_to_nat(resource_group, nat_gateway_name, pip_name):
    # Get current NAT Gateway first so the new IP lands in the same region
    nat_gateway = network_client.nat_gateways.get(resource_group, nat_gateway_name)

    # Create new public IP
    pip = network_client.public_ip_addresses.begin_create_or_update(
        resource_group,
        pip_name,
        {
            "location": nat_gateway.location,
            "sku": {"name": "Standard"},
            "public_ip_allocation_method": "Static",
            "zones": ["1", "2", "3"]
        }
    ).result()

    # Add new IP to the list as a SubResource reference
    if nat_gateway.public_ip_addresses is None:
        nat_gateway.public_ip_addresses = []
    nat_gateway.public_ip_addresses.append(SubResource(id=pip.id))

    # Update NAT Gateway
    return network_client.nat_gateways.begin_create_or_update(
        resource_group,
        nat_gateway_name,
        nat_gateway
    ).result()

# Add 4 more public IPs
for i in range(4):
    add_public_ip_to_nat("rg-networking", "nat-gateway-main", f"nat-pip-{i+2}")
    print(f"Added nat-pip-{i+2}")
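The per-IP calculation above treats every connection as consuming a port outright. NAT Gateway can reuse a SNAT port for connections to different destinations, but the documented ceiling is roughly 50,000 concurrent connections per public IP to the same destination endpoint, so traffic concentrated on one API can need more IPs than the port count suggests. A conservative sketch that checks both limits, assuming those two figures; `estimate_ips` is a hypothetical helper.

```python
import math

PORTS_PER_PUBLIC_IP = 64_512    # SNAT ports per public IP
PER_DESTINATION_LIMIT = 50_000  # concurrent connections per public IP to one destination endpoint


def estimate_ips(total_concurrent: int, peak_to_one_destination: int) -> int:
    """Public IPs needed to satisfy both the port pool and the per-destination ceiling."""
    by_ports = math.ceil(total_concurrent / PORTS_PER_PUBLIC_IP)
    by_destination = math.ceil(peak_to_one_destination / PER_DESTINATION_LIMIT)
    return max(1, by_ports, by_destination)


# 10,000 connections/sec held for 30 s, 120,000 of them to a single API endpoint
print(estimate_ips(10_000 * 30, 120_000))  # 5
# 60,000 concurrent connections, all to one endpoint: the destination limit dominates
print(estimate_ips(60_000, 60_000))        # 2
```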

NAT Gateway vs Load Balancer Outbound

Compare NAT Gateway with Load Balancer SNAT:

# Comparison helper
def compare_outbound_options():
    comparison = {
        "NAT Gateway": {
            "snat_ports_per_ip": 64512,
            "port_allocation": "Dynamic (on-demand)",
            "idle_timeout": "4-120 minutes configurable",
            "zone_redundancy": "Zonal (single zone) or no-zone; not zone-redundant",
            "cost": "Per hour + per GB processed",
            "use_case": "General outbound connectivity"
        },
        "Load Balancer": {
            "snat_ports_per_ip": "64512 per frontend IP, split across backend instances",
            "port_allocation": "Pre-allocated per backend",
            "idle_timeout": "4-30 minutes",
            "zone_redundancy": "Requires configuration",
            "cost": "Included with Standard LB",
            "use_case": "When already using LB for inbound"
        },
        "Public IP on VM": {
            "snat_ports_per_ip": "N/A (direct outbound)",
            "port_allocation": "N/A",
            "idle_timeout": "4 minutes default",
            "zone_redundancy": "N/A",
            "cost": "Per public IP",
            "use_case": "Simple single-VM scenarios"
        }
    }

    for option, details in comparison.items():
        print(f"\n{option}:")
        for key, value in details.items():
            print(f"  {key}: {value}")

compare_outbound_options()
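The "pre-allocated per backend" row is what makes load balancer outbound harder to size: each backend instance receives a fixed block of ports whether it uses them or not. A small sketch of the load balancer side, assuming 64,512 ports per frontend IP and a fixed per-instance allocation set by an outbound rule; `lb_max_backends` is a hypothetical helper.

```python
PORTS_PER_FRONTEND_IP = 64_512  # SNAT ports per load balancer frontend IP


def lb_max_backends(ports_per_instance: int, frontend_ips: int = 1) -> int:
    """Backend instances an outbound rule can serve when each gets a fixed port block."""
    return (PORTS_PER_FRONTEND_IP * frontend_ips) // ports_per_instance


print(lb_max_backends(1_024))     # 63
print(lb_max_backends(1_024, 2))  # 126
print(lb_max_backends(8_000))     # 8
```

NAT Gateway has no per-instance block to size: any VM in the subnet can draw from the whole pool, so one busy VM can use ports that idle VMs are not using.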

Monitoring NAT Gateway

Monitor SNAT usage and performance:

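import os
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

subscription_id = os.environ["AZURE_SUBSCRIPTION_ID"]
credential = DefaultAzureCredential()
monitor_client = MonitorManagementClient(credential, subscription_id)

# SNATConnectionCount = new SNAT connections per interval
# TotalConnectionCount = active SNAT connections
METRICS = "SNATConnectionCount,TotalConnectionCount,ByteCount,PacketCount,PacketDropCount,DatapathAvailability"

def get_nat_gateway_metrics(resource_group, nat_name):
    resource_uri = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}/providers/Microsoft.Network/natGateways/{nat_name}"

    # Query the last hour as an explicit ISO 8601 start/end interval
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=1)
    fmt = "%Y-%m-%dT%H:%M:%SZ"

    metrics = monitor_client.metrics.list(
        resource_uri=resource_uri,
        metricnames=METRICS,
        timespan=f"{start.strftime(fmt)}/{end.strftime(fmt)}",
        interval=timedelta(minutes=5),
        aggregation="Average,Total,Maximum"
    )

    for metric in metrics.value:
        print(f"\n{metric.name.value}:")
        for ts in metric.timeseries:
            for data in ts.data[-5:]:  # Last 5 data points
                print(f"  {data.time_stamp}:")
                if data.average is not None:
                    print(f"    Avg: {data.average}")
                if data.total is not None:
                    print(f"    Total: {data.total}")
                if data.maximum is not None:
                    print(f"    Max: {data.maximum}")

get_nat_gateway_metrics("rg-networking", "nat-gateway-main")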

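# Create alert for failed SNAT connections (a symptom of SNAT port exhaustion)
nat_gateway_id = f"/subscriptions/{subscription_id}/resourceGroups/rg-networking/providers/Microsoft.Network/natGateways/nat-gateway-main"

alert = {
    "location": "global",
    "properties": {
        "description": "Alert when NAT Gateway SNAT connections fail",
        "severity": 2,
        "enabled": True,
        "scopes": [nat_gateway_id],  # metric alerts must target at least one resource
        "evaluationFrequency": "PT5M",
        "windowSize": "PT15M",
        "criteria": {
            "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria",
            "allOf": [{
                "criterionType": "StaticThresholdCriterion",
                "name": "FailedSNATConnections",
                "metricName": "SNATConnectionCount",
                "dimensions": [{
                    "name": "ConnectionState",
                    "operator": "Include",
                    "values": ["Failed"]
                }],
                "operator": "GreaterThan",
                "threshold": 0,  # any failed SNAT connection is worth investigating
                "timeAggregation": "Total"
            }]
        },
        "actions": [{
            "actionGroupId": f"/subscriptions/{subscription_id}/resourceGroups/rg-monitoring/providers/Microsoft.Insights/actionGroups/network-alerts"
        }]
    }
}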
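Raw TotalConnectionCount values are easier to judge as a share of the port pool. A rough sketch, assuming TotalConnectionCount is read as active SNAT connections and each attached public IP contributes 64,512 ports; because NAT Gateway reuses ports across destinations, the ratio is an indicator rather than a hard limit, and `snat_utilization` is a hypothetical helper.

```python
PORTS_PER_PUBLIC_IP = 64_512


def snat_utilization(active_connections: float, public_ip_count: int) -> float:
    """Active connections as a fraction of the SNAT port pool (can exceed 1.0 with port reuse)."""
    return active_connections / (public_ip_count * PORTS_PER_PUBLIC_IP)


# e.g. a TotalConnectionCount reading of 50,000 on a gateway with 9 public IPs
print(f"{snat_utilization(50_000, 9):.1%}")  # 8.6%
```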

Diagnostic Logging

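NAT Gateway exports platform metrics but has no resource log categories, so its diagnostic setting can only send AllMetrics. The DDoS log categories belong to public IP addresses, so enable them on the gateway's public IP instead:

# Export NAT Gateway metrics to Log Analytics
az monitor diagnostic-settings create \
    --resource /subscriptions/$SUBSCRIPTION_ID/resourceGroups/rg-networking/providers/Microsoft.Network/natGateways/nat-gateway-main \
    --name nat-diagnostics \
    --workspace /subscriptions/$SUBSCRIPTION_ID/resourceGroups/rg-monitoring/providers/Microsoft.OperationalInsights/workspaces/log-analytics-ws \
    --metrics '[
        {
            "category": "AllMetrics",
            "enabled": true
        }
    ]'

# Enable DDoS logs on the NAT Gateway's public IP
az monitor diagnostic-settings create \
    --resource /subscriptions/$SUBSCRIPTION_ID/resourceGroups/rg-networking/providers/Microsoft.Network/publicIPAddresses/nat-gateway-pip \
    --name pip-diagnostics \
    --workspace /subscriptions/$SUBSCRIPTION_ID/resourceGroups/rg-monitoring/providers/Microsoft.OperationalInsights/workspaces/log-analytics-ws \
    --logs '[
        {
            "category": "DDoSProtectionNotifications",
            "enabled": true
        },
        {
            "category": "DDoSMitigationFlowLogs",
            "enabled": true
        },
        {
            "category": "DDoSMitigationReports",
            "enabled": true
        }
    ]'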
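Once metrics land in the workspace, they can be queried with KQL. This sketch only builds the query string; it assumes exported metrics go to the `AzureMetrics` table (the usual destination for diagnostic-setting metrics) and uses a placeholder subscription ID. `nat_metrics_query` is a hypothetical helper.

```python
def nat_metrics_query(resource_id: str, metric: str, hours: int = 1) -> str:
    """KQL over exported NAT Gateway metrics, bucketed into 5-minute bins."""
    return (
        "AzureMetrics\n"
        f"| where TimeGenerated > ago({hours}h)\n"
        # =~ compares case-insensitively, since ResourceId casing may differ
        f'| where ResourceId =~ "{resource_id}"\n'
        f'| where MetricName == "{metric}"\n'
        "| summarize SumTotal = sum(Total), Peak = max(Maximum) by bin(TimeGenerated, 5m)\n"
        "| order by TimeGenerated asc"
    )


# Placeholder subscription ID; substitute your own
subscription_id = "00000000-0000-0000-0000-000000000000"
resource_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/rg-networking"
    "/providers/Microsoft.Network/natGateways/nat-gateway-main"
)
print(nat_metrics_query(resource_id, "PacketDropCount"))
```

The resulting string can be pasted into Log Analytics or passed to `LogsQueryClient.query_workspace` from the `azure-monitor-query` package.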

Multiple NAT Gateways

Use multiple NAT Gateways for different subnets:

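# NAT Gateway per environment
locals {
  environments = {
    "production" = {
      subnets      = ["subnet-prod-app", "subnet-prod-data"]
      idle_timeout = 10
    }
    "staging" = {
      subnets      = ["subnet-staging"]
      idle_timeout = 4
    }
    "development" = {
      subnets      = ["subnet-dev"]
      idle_timeout = 4
    }
  }

  # Flatten to subnet name => environment, one entry per subnet association
  subnet_environments = merge([
    for env, cfg in local.environments : {
      for subnet in cfg.subnets : subnet => env
    }
  ]...)
}

resource "azurerm_public_ip" "env" {
  for_each = local.environments

  name                = "nat-pip-${each.key}"
  resource_group_name = azurerm_resource_group.networking.name
  location            = azurerm_resource_group.networking.location
  allocation_method   = "Static"
  sku                 = "Standard"
}

resource "azurerm_nat_gateway" "env" {
  for_each = local.environments

  name                    = "nat-gateway-${each.key}"
  resource_group_name     = azurerm_resource_group.networking.name
  location                = azurerm_resource_group.networking.location
  sku_name                = "Standard"
  idle_timeout_in_minutes = each.value.idle_timeout

  tags = {
    Environment = each.key
  }
}

resource "azurerm_nat_gateway_public_ip_association" "env" {
  for_each = local.environments

  nat_gateway_id       = azurerm_nat_gateway.env[each.key].id
  public_ip_address_id = azurerm_public_ip.env[each.key].id
}

# Look up each subnet (assumes they all live in vnet-main)
data "azurerm_subnet" "env" {
  for_each = local.subnet_environments

  name                 = each.key
  virtual_network_name = "vnet-main"
  resource_group_name  = azurerm_resource_group.networking.name
}

# Attach each subnet to its environment's NAT Gateway
resource "azurerm_subnet_nat_gateway_association" "env" {
  for_each = local.subnet_environments

  subnet_id      = data.azurerm_subnet.env[each.key].id
  nat_gateway_id = azurerm_nat_gateway.env[each.value].id
}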
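A subnet can be associated with only one NAT Gateway, so the per-environment subnet lists in `local.environments` must never overlap. The same structure can be checked in Python before it reaches Terraform; this is a sketch that mirrors the locals map above, and `subnet_to_gateway` is a hypothetical helper.

```python
def subnet_to_gateway(environments: dict) -> dict:
    """Map each subnet to its environment's NAT Gateway, rejecting overlaps."""
    mapping = {}
    for env, cfg in environments.items():
        for subnet in cfg["subnets"]:
            if subnet in mapping:
                raise ValueError(f"{subnet} is assigned to both {mapping[subnet]} and {env}")
            mapping[subnet] = env
    return mapping


# Mirrors local.environments from the Terraform above
environments = {
    "production": {"subnets": ["subnet-prod-app", "subnet-prod-data"], "idle_timeout": 10},
    "staging": {"subnets": ["subnet-staging"], "idle_timeout": 4},
    "development": {"subnets": ["subnet-dev"], "idle_timeout": 4},
}
for subnet, env in subnet_to_gateway(environments).items():
    print(f"{subnet} -> nat-gateway-{env}")
```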

Best Practices

Key recommendations for NAT Gateway:

# Best practices checklist
best_practices = {
    "sizing": [
        "Plan for peak connection counts",
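        "Each public IP = 64,512 SNAT ports; up to 16 IPs per gateway",
        "Use IP prefixes for easier management at scale"
    ],
    "availability": [
        "Standard NAT Gateway is zonal (one zone or no zone), not zone-redundant",
        "Use public IPs from the gateway's zone, or zone-redundant IPs",
        "For zone resilience, deploy one zonal NAT Gateway per zone, each serving that zone's subnets"
    ],
    "monitoring": [
        "Monitor TotalConnectionCount (active connections) vs capacity",
        "Alert on failed SNAT connections (SNATConnectionCount, ConnectionState = Failed)",
        "Track PacketDropCount for issues",
        "Alert on DatapathAvailability < 100%"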
    ],
    "security": [
        "NAT Gateway does not support inbound connections",
        "Use in combination with NSGs",
        "Outbound IPs are predictable for whitelisting"
    ],
    "cost_optimization": [
        "NAT Gateway has per-hour and per-GB costs",
        "Consolidate subnets where possible",
        "Consider if needed for all subnets"
    ]
}

for category, items in best_practices.items():
    print(f"\n{category.upper()}:")
    for item in items:
        print(f"  - {item}")
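The cost bullets come down to simple arithmetic: the hourly charge scales with the number of gateways, while the per-GB charge scales only with traffic. A sketch with made-up rates (not real Azure prices; check the pricing page for your region), assuming the 730-hour month used in Azure pricing estimates and ignoring public IP charges; `monthly_nat_cost` is a hypothetical helper.

```python
HOURS_PER_MONTH = 730  # monthly-hours convention used in Azure pricing estimates


def monthly_nat_cost(gateways: int, gb_processed: float,
                     hourly_rate: float, per_gb_rate: float) -> float:
    """Hourly charge per gateway plus a data-processing charge on total traffic."""
    return gateways * HOURS_PER_MONTH * hourly_rate + gb_processed * per_gb_rate


# Hypothetical rates for illustration only, not Azure prices
separate = monthly_nat_cost(3, 1_000, hourly_rate=0.05, per_gb_rate=0.05)
consolidated = monthly_nat_cost(1, 1_000, hourly_rate=0.05, per_gb_rate=0.05)
print(f"Saved by consolidating: {separate - consolidated:.2f}")  # Saved by consolidating: 73.00
```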

Conclusion

Azure NAT Gateway simplifies outbound connectivity while scaling further than the alternatives. With on-demand SNAT port allocation across up to 16 public IPs, it makes the port exhaustion that plagues high-throughput applications far less likely.

The key benefits include predictable outbound IPs for security whitelisting, zonal deployment for isolation, and no impact on inbound connectivity. For any production workload requiring reliable outbound internet access, NAT Gateway should be your first choice.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.