Azure Spot Instances Strategies: Maximizing Savings with Smart Architecture

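Azure Spot VMs are heavily discounted, but Azure can evict them with as little as 30 seconds of notice when it needs the capacity back, so building reliable systems on them requires careful architecture. This post covers strategies for getting the most out of Spot VMs while minimizing the impact of evictions.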

Multi-Region Spot Strategy

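Spot capacity and eviction rates vary by region, so spreading scale sets across several regions reduces the chance that a single capacity shortage evicts the whole fleet: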

var regions = [
  'eastus'
  'westus2'
  'northeurope'
  'westeurope'
]

resource vmss 'Microsoft.Compute/virtualMachineScaleSets@2021-11-01' = [for region in regions: {
  name: 'spot-vmss-${region}'
  location: region
  sku: {
    name: 'Standard_D4s_v3'
    capacity: 5
  }
  properties: {
    virtualMachineProfile: {
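      priority: 'Spot'
      evictionPolicy: 'Delete' // Evicted VMs and their disks are deleted, so no disk charges linger
      billingProfile: {
        maxPrice: -1 // -1 caps the price at the regular rate, so VMs are never evicted for price alone
      }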
      // ... other config
    }
  }
}]
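
When one region runs short of Spot capacity, the remaining regions have to absorb its share. The sketch below is a hypothetical illustration of that rebalancing logic and is not part of the Bicep template: `spread_capacity` and `rebalance_after_eviction` are names invented here, and the even split is an assumption (a real deployment might weight regions by price or eviction rate).

```python
from __future__ import annotations


def spread_capacity(total: int, regions: list[str]) -> dict[str, int]:
    """Split `total` instances as evenly as possible across `regions`."""
    if not regions:
        raise ValueError("at least one region is required")
    base, extra = divmod(total, len(regions))
    # The first `extra` regions each take one additional instance.
    return {r: base + (1 if i < extra else 0) for i, r in enumerate(regions)}


def rebalance_after_eviction(
    total: int, regions: list[str], unavailable: set[str]
) -> dict[str, int]:
    """Recompute the split using only regions that still have Spot capacity."""
    healthy = [r for r in regions if r not in unavailable]
    return spread_capacity(total, healthy)
```

With the four regions above and 20 instances in total, each region starts at 5; if `eastus` becomes unavailable, the other three would be asked for 7, 7, and 6.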

Mixed Priority Architecture

Run a small baseline on regular VMs, which are never evicted, and add Spot VMs on top for cheaper burst capacity:

// Both scale sets deploy to the same region
param location string = resourceGroup().location

// Base capacity with regular VMs
resource regularVMSS 'Microsoft.Compute/virtualMachineScaleSets@2021-11-01' = {
  name: 'regular-vmss'
  location: location
  sku: {
    name: 'Standard_D4s_v3'
    capacity: 3  // Minimum guaranteed capacity
  }
  properties: {
    virtualMachineProfile: {
      priority: 'Regular'
      // ... config
    }
  }
}

// Burst capacity with Spot VMs
resource spotVMSS 'Microsoft.Compute/virtualMachineScaleSets@2021-11-01' = {
  name: 'spot-vmss'
  location: location
  sku: {
    name: 'Standard_D4s_v3'
    capacity: 10  // Extra capacity at lower cost
  }
  properties: {
    virtualMachineProfile: {
      priority: 'Spot'
      evictionPolicy: 'Delete'
      billingProfile: {
        maxPrice: -1
      }
      // ... config
    }
  }
}
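
To reason about the trade-off in the split above, it can help to model the fleet as a guaranteed floor plus an evictable burst. The following is a minimal sketch under assumptions: `FleetPlan` and `plan_fleet` are hypothetical names, and the hourly rates passed in are placeholders to be replaced with real prices for the chosen VM size and region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FleetPlan:
    regular: int  # instances that are never evicted
    spot: int     # instances that can disappear at any time

    def hourly_cost(self, regular_rate: float, spot_rate: float) -> float:
        """Total hourly cost of the fleet at the given per-instance rates."""
        return self.regular * regular_rate + self.spot * spot_rate

    def capacity_if_all_spot_evicted(self) -> int:
        """Worst case: only the regular instances survive."""
        return self.regular


def plan_fleet(target: int, min_guaranteed: int) -> FleetPlan:
    """Cover the guaranteed floor with regular VMs and the rest with Spot."""
    if target < 0 or min_guaranteed < 0:
        raise ValueError("capacities must be non-negative")
    regular = min(min_guaranteed, target)
    return FleetPlan(regular=regular, spot=target - regular)
```

For example, `plan_fleet(13, 3)` reproduces the 3 regular plus 10 Spot layout shown above, and `capacity_if_all_spot_evicted()` makes explicit that the service must still function on 3 instances.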

Queue-Based Worker Pattern

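Have workers pull jobs from a queue and complete each message only after the work succeeds, so a job interrupted by an eviction returns to the queue and is retried on another instance: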

using System;
using System.Threading;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

public class SpotFriendlyWorker
{
    private readonly ServiceBusClient _client;
    private readonly ServiceBusProcessor _processor;
    private CancellationTokenSource _cts;

    public SpotFriendlyWorker(string connectionString)
    {
        _client = new ServiceBusClient(connectionString);
        _processor = _client.CreateProcessor("work-queue", new ServiceBusProcessorOptions
        {
            MaxConcurrentCalls = 5,
            AutoCompleteMessages = false,
            MaxAutoLockRenewalDuration = TimeSpan.FromMinutes(30)
        });

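        _processor.ProcessMessageAsync += ProcessMessageAsync;
        _processor.ProcessErrorAsync += ProcessErrorAsync;
    }

    // Start pulling messages; call StopAsync when an eviction notice arrives
    public Task StartAsync(CancellationToken ct = default) => _processor.StartProcessingAsync(ct);
    public Task StopAsync(CancellationToken ct = default) => _processor.StopProcessingAsync(ct);

    // The processor requires an error handler; log and let the message be retried
    private Task ProcessErrorAsync(ProcessErrorEventArgs args)
    {
        Console.Error.WriteLine($"Service Bus error ({args.ErrorSource}): {args.Exception.Message}");
        return Task.CompletedTask;
    }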

    private async Task ProcessMessageAsync(ProcessMessageEventArgs args)
    {
        var job = args.Message.Body.ToObjectFromJson<WorkItem>();

        try
        {
            // Process with checkpointing
            await ProcessWithCheckpoints(job, args.CancellationToken);

            // Complete only after successful processing
            await args.CompleteMessageAsync(args.Message);
        }
        catch (OperationCanceledException)
        {
            // Eviction happening - abandon message for retry
            await args.AbandonMessageAsync(args.Message);
            throw;
        }
    }

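    // LoadCheckpoint, ProcessStep, and SaveCheckpoint are application-specific and not shown.
    // Checkpoints must live in durable storage outside the VM so they survive an eviction.
    private async Task ProcessWithCheckpoints(WorkItem job, CancellationToken ct)
    {
        var checkpoint = await LoadCheckpoint(job.Id);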

        for (int i = checkpoint; i < job.TotalSteps; i++)
        {
            ct.ThrowIfCancellationRequested();

            await ProcessStep(job, i);
            await SaveCheckpoint(job.Id, i + 1);
        }
    }
}
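
The checkpointing loop is the part that makes retries cheap, so it is worth seeing in isolation. Below is a minimal sketch of the same resume-from-checkpoint idea, assuming an in-memory dictionary as the checkpoint store purely for illustration (a real worker would need durable storage). `run_job` and `Evicted` are hypothetical names, not part of any Azure SDK.

```python
class Evicted(Exception):
    """Raised to simulate the VM being reclaimed in the middle of a job."""


def run_job(job_id, total_steps, checkpoints, process_step, should_stop=lambda: False):
    """Run the remaining steps of a job, recording progress after each one.

    `checkpoints` maps job id -> index of the next step to run.
    `should_stop` is polled before every step, like a cancellation token.
    """
    start = checkpoints.get(job_id, 0)  # resume where the last attempt stopped
    for step in range(start, total_steps):
        if should_stop():
            raise Evicted(f"{job_id} interrupted before step {step}")
        process_step(step)
        checkpoints[job_id] = step + 1  # persist only after the step succeeds
    return checkpoints.get(job_id, 0)
```

Because progress is saved after each step rather than at the end, a second call with the same `checkpoints` store picks up at the first unfinished step instead of starting over. A step that was running when the eviction hit will run again, so steps should be idempotent.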

Scheduled Events Integration

Poll the Scheduled Events endpoint of the Azure Instance Metadata Service to detect a pending eviction (a Preempt event) before it happens:

import asyncio
import aiohttp
from datetime import datetime

class SpotInstanceMonitor:
    def __init__(self):
        self.running = True
        self.eviction_detected = False

    async def monitor_scheduled_events(self):
        url = "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
        headers = {"Metadata": "true"}

        async with aiohttp.ClientSession() as session:
            while self.running:
                try:
                    async with session.get(url, headers=headers) as response:
                        data = await response.json()

                        for event in data.get("Events", []):
                            if event["EventType"] == "Preempt":
                                self.eviction_detected = True
                                await self.handle_eviction(event)
                                return

                except Exception as e:
                    print(f"Monitor error: {e}")

                await asyncio.sleep(5)

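    async def handle_eviction(self, event):
        print(f"Eviction at {datetime.now()}: {event}")

        # Drain work first: a Spot eviction gives only about 30 seconds of notice
        await self.graceful_shutdown()

        # Acknowledge last. Approving an event tells Azure it may proceed
        # immediately, so it shortens the remaining time rather than extending it.
        await self.acknowledge_event(event["EventId"])

    async def graceful_shutdown(self):
        # Placeholder: stop accepting new work and flush checkpoints here
        self.running = False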

    async def acknowledge_event(self, event_id):
        url = "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
        headers = {"Metadata": "true"}
        body = {"StartRequests": [{"EventId": event_id}]}

        async with aiohttp.ClientSession() as session:
            await session.post(url, headers=headers, json=body)
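
A Scheduled Events response can list events for several VMs and of several types, so it is useful to filter for the Preempt events that affect the current instance and to work out how much time is left. The sketch below is a hedged illustration: `preempt_events_for` and `seconds_until` are hypothetical helpers, and `SAMPLE` is a hand-written payload in the documented shape with made-up resource names and IDs.

```python
from __future__ import annotations

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hand-written example payload; names and IDs are invented for illustration.
SAMPLE = {
    "DocumentIncarnation": 1,
    "Events": [
        {"EventId": "A1", "EventType": "Preempt", "ResourceType": "VirtualMachine",
         "Resources": ["spot-vmss_3"], "EventStatus": "Scheduled",
         "NotBefore": "Mon, 19 Sep 2016 18:29:47 GMT"},
        {"EventId": "B2", "EventType": "Reboot", "ResourceType": "VirtualMachine",
         "Resources": ["spot-vmss_3"], "EventStatus": "Scheduled",
         "NotBefore": "Mon, 19 Sep 2016 19:00:00 GMT"},
    ],
}


def preempt_events_for(payload: dict, vm_name: str) -> list[dict]:
    """Return the Preempt events whose Resources list includes `vm_name`."""
    return [
        e for e in payload.get("Events", [])
        if e.get("EventType") == "Preempt" and vm_name in e.get("Resources", [])
    ]


def seconds_until(event: dict, now: datetime) -> float:
    """Seconds left before the event may start; 0 if it already can."""
    not_before = event.get("NotBefore")
    if not not_before:  # blank once the event has started
        return 0.0
    deadline = parsedate_to_datetime(not_before)
    return max(0.0, (deadline - now).total_seconds())
```

Passing `now` in explicitly (as a timezone-aware UTC datetime) keeps the helper easy to test; in the monitor it would be `datetime.now(timezone.utc)`.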

Cost Comparison Dashboard

Track your Spot savings:

// Log Analytics query for Spot savings tracking
// Illustrative: assumes the records carry Duration (hours) and Priority columns
AzureDiagnostics
| where ResourceType == "VIRTUALMACHINESCALESETS"
| where Category == "Autoscale"
| summarize
    SpotHours = sumif(Duration, Priority == "Spot"),
    RegularHours = sumif(Duration, Priority == "Regular")
| extend
    SpotCost = SpotHours * 0.05,  // Example Spot rate
    RegularCost = RegularHours * 0.20,  // Regular rate
    Savings = SpotHours * (0.20 - 0.05)  // What the Spot hours would have cost at the regular rate, minus what they did cost
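
The savings figure is easiest to sanity-check as plain arithmetic: it is the hours actually run on Spot multiplied by the gap between the regular and Spot rates. A minimal sketch follows, assuming flat hourly rates (real Spot prices fluctuate); `spot_savings` and `savings_percent` are hypothetical helper names.

```python
def spot_savings(spot_hours: float, regular_rate: float, spot_rate: float) -> float:
    """Money saved by running `spot_hours` on Spot instead of regular VMs."""
    return spot_hours * (regular_rate - spot_rate)


def savings_percent(regular_rate: float, spot_rate: float) -> float:
    """Discount of the Spot rate relative to the regular rate, in percent."""
    if regular_rate <= 0:
        raise ValueError("regular_rate must be positive")
    return 100 * (regular_rate - spot_rate) / regular_rate
```

Hours run on regular VMs do not appear in either formula: they cost the same with or without a Spot strategy, so they contribute nothing to the savings.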

A well-designed Spot architecture lets you capture deep discounts (Azure advertises up to 90% off pay-as-you-go prices) while maintaining the reliability your applications need.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.