Azure Machine Learning Managed Endpoints: Deploying Models with Zero Downtime

Production ML deployments demand reliability. Azure Machine Learning managed online endpoints serve models behind a stable scoring URL, with built-in traffic splitting and autoscaling support, so a new model version can be rolled out without downtime. Here’s how to set up an endpoint and shift traffic safely between deployments.

Creating Managed Endpoints

Set up an endpoint that supports multiple model versions:

from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration,
    OnlineRequestSettings,
    ProbeSettings,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="your-sub",
    resource_group_name="your-rg",
    workspace_name="your-workspace"
)

# Create the endpoint (the stable URL that clients call)
endpoint = ManagedOnlineEndpoint(
    name="fraud-detection-endpoint",
    description="Real-time fraud detection model",
    auth_mode="key",
    tags={"team": "risk-analytics", "env": "production"}
)

ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Create initial deployment
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="fraud-detection-endpoint",
    model=Model(path="./models/fraud_model_v1"),
    environment=Environment(
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
        conda_file="./environments/inference_env.yml"
    ),
    code_configuration=CodeConfiguration(
        code="./src",
        scoring_script="score.py"
    ),
    instance_type="Standard_DS3_v2",
    instance_count=3,
    # Request and probe settings are typed SDK objects rather than plain dicts
    request_settings=OnlineRequestSettings(
        request_timeout_ms=3000,
        max_concurrent_requests_per_instance=100
    ),
    liveness_probe=ProbeSettings(initial_delay=30, period=10),
    readiness_probe=ProbeSettings(initial_delay=30, period=10)
)

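ml_client.online_deployments.begin_create_or_update(blue_deployment).result()

# A new deployment receives no traffic by default; route all traffic to blue
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()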

Blue-Green Deployment Pattern

Deploy new versions without downtime using traffic splitting:

# Deploy new version as green
green_deployment = ManagedOnlineDeployment(
    name="green",
    endpoint_name="fraud-detection-endpoint",
    model=Model(path="./models/fraud_model_v2"),
    # ... same configuration
)

ml_client.online_deployments.begin_create_or_update(green_deployment).result()

# Gradually shift traffic
endpoint.traffic = {"blue": 90, "green": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Monitor metrics, then complete rollout
endpoint.traffic = {"blue": 0, "green": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Clean up the old deployment once green is stable
# (after this, blue is no longer available as a rollback target)
ml_client.online_deployments.begin_delete(
    name="blue",
    endpoint_name="fraud-detection-endpoint"
).result()
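
The two-step shift above (10%, then 100%) can be stretched into a longer ramp. As a rough sketch, the helper below generates the sequence of traffic splits for a staged rollout. The function name traffic_steps and the default percentages are illustrative assumptions; only the shape of the traffic dictionary comes from the code above.

```python
def traffic_steps(stable="blue", canary="green", canary_percentages=(10, 25, 50, 100)):
    """Yield traffic dicts that move load from the stable to the canary deployment."""
    for pct in canary_percentages:
        if not 0 <= pct <= 100:
            raise ValueError(f"canary percentage must be between 0 and 100, got {pct}")
        # The two weights always sum to 100
        yield {stable: 100 - pct, canary: pct}


for split in traffic_steps():
    print(split)
    # In a real rollout, each split would be applied as in the code above:
    #   endpoint.traffic = split
    #   ml_client.online_endpoints.begin_create_or_update(endpoint).result()
    # followed by a pause to watch metrics before the next step.
```

Smaller steps limit how many requests a bad model version can affect before the problem shows up in the metrics.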

Monitoring and Rollback

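Configure Azure Monitor alerts on request latency and error rates for each deployment. When a threshold is breached during a rollout, roll back automatically by routing all traffic to the previous deployment, which is only possible while that deployment still exists.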
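
One way to express a rollback trigger is as a small decision function that takes observed metrics and returns the traffic split to apply. The sketch below is an assumption-heavy illustration: the DeploymentMetrics class, the function names, and the thresholds (500 ms p95 latency, 1% error rate) are hypothetical, and fetching the metrics from Azure Monitor is left out.

```python
from dataclasses import dataclass


@dataclass
class DeploymentMetrics:
    p95_latency_ms: float
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def should_roll_back(metrics, max_p95_latency_ms=500.0, max_error_rate=0.01):
    """Return True if either threshold is breached (threshold values are illustrative)."""
    return (
        metrics.p95_latency_ms > max_p95_latency_ms
        or metrics.error_rate > max_error_rate
    )


def traffic_after_check(metrics, current_traffic, stable="blue", canary="green"):
    """Send everything back to the stable deployment on a breach; otherwise keep the split."""
    if should_roll_back(metrics):
        return {stable: 100, canary: 0}
    return dict(current_traffic)


healthy = DeploymentMetrics(p95_latency_ms=180.0, error_rate=0.002)
degraded = DeploymentMetrics(p95_latency_ms=950.0, error_rate=0.002)
current = {"blue": 90, "green": 10}

print(traffic_after_check(healthy, current))   # {'blue': 90, 'green': 10}
print(traffic_after_check(degraded, current))  # {'blue': 100, 'green': 0}
```

The returned dictionary can be assigned to endpoint.traffic and applied with begin_create_or_update, the same way traffic is shifted during the rollout. For this to work, the blue deployment should be deleted only after green has run at 100% long enough to be trusted.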

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.