July 17, 2025

Azure Machine Learning Managed Endpoints: Deploying Models with Zero Downtime


Production ML deployments demand reliability. Azure Machine Learning managed online endpoints serve models with built-in traffic splitting, autoscaling, and zero-downtime rollouts. This post walks through creating an endpoint, deploying a model to it, and shifting traffic to a new version with a blue-green rollout.

Creating Managed Endpoints

Set up an endpoint that supports multiple model versions:

from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration,
    OnlineRequestSettings,
    ProbeSettings,
)
from azure.identity import DefaultAzureCredential

# Authenticate against the workspace that will host the endpoint
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="your-sub",
    resource_group_name="your-rg",
    workspace_name="your-workspace"
)

# Create the endpoint
endpoint = ManagedOnlineEndpoint(
    name="fraud-detection-endpoint",
    description="Real-time fraud detection model",
    auth_mode="key",
    tags={"team": "risk-analytics", "env": "production"}
)

ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Create initial deployment
blue_deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="fraud-detection-endpoint",
    model=Model(path="./models/fraud_model_v1"),
    environment=Environment(
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
        conda_file="./environments/inference_env.yml"
    ),
    code_configuration=CodeConfiguration(
        code="./src",
        scoring_script="score.py"
    ),
    instance_type="Standard_DS3_v2",
    instance_count=3,
    # Request and probe settings are SDK entities, not plain dicts
    request_settings=OnlineRequestSettings(
        request_timeout_ms=3000,
        max_concurrent_requests_per_instance=100
    ),
    liveness_probe=ProbeSettings(initial_delay=30, period=10),
    readiness_probe=ProbeSettings(initial_delay=30, period=10)
)

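ml_client.online_deployments.begin_create_or_update(blue_deployment).result()

# A new deployment receives no traffic until it is assigned some; route everything to blue
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()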
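
The deployment above points at score.py without showing it. Azure ML scoring scripts define init(), which runs once when the container starts, and run(), which runs for each request; the model files are mounted under the directory named by the AZUREML_MODEL_DIR environment variable. The following is a minimal sketch only. The model.json artifact, its linear-weights layout, and the {"data": [...]} request shape are all hypothetical stand-ins for whatever the real fraud model uses, and the exact subpath under AZUREML_MODEL_DIR depends on how the model was packaged.

```python
import json
import os

_model = None  # populated once by init()


def init():
    """Runs once at container start: load the model into memory."""
    global _model
    # AZUREML_MODEL_DIR is set by Azure ML; the subpath below is an assumption.
    model_dir = os.environ.get("AZUREML_MODEL_DIR", ".")
    # Hypothetical artifact: JSON with linear weights, a bias, and a threshold.
    with open(os.path.join(model_dir, "model.json")) as f:
        _model = json.load(f)


def run(raw_data):
    """Runs per request: raw_data is the JSON request body as a string."""
    rows = json.loads(raw_data)["data"]  # hypothetical request shape
    results = []
    for row in rows:
        # Linear score: bias plus the weighted sum of the known features.
        score = _model["bias"] + sum(
            weight * row.get(feature, 0.0)
            for feature, weight in _model["weights"].items()
        )
        results.append({"score": score, "is_fraud": score >= _model["threshold"]})
    return results
```

Loading the model in init() rather than run() keeps per-request latency low, which matters with the 3000 ms request timeout configured above.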

Blue-Green Deployment Pattern

Deploy new versions without downtime using traffic splitting:

# Deploy new version as green
green_deployment = ManagedOnlineDeployment(
    name="green",
    endpoint_name="fraud-detection-endpoint",
    model=Model(path="./models/fraud_model_v2"),
    # ... same configuration
)

ml_client.online_deployments.begin_create_or_update(green_deployment).result()

# Gradually shift traffic
endpoint.traffic = {"blue": 90, "green": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Monitor metrics, then complete rollout
endpoint.traffic = {"blue": 0, "green": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Clean up the old deployment once green has proven stable
# (after this, rolling back means redeploying blue)
ml_client.online_deployments.begin_delete(
    name="blue",
    endpoint_name="fraud-detection-endpoint"
).result()
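
The rollout above jumps from 10% straight to 100%. A more gradual ramp can be expressed as a sequence of traffic dictionaries, each applied to the endpoint in turn. This is a sketch under assumptions: traffic_steps is a hypothetical helper, not part of the Azure SDK, and the default percentages are illustrative.

```python
def traffic_steps(old, new, new_percentages=(10, 25, 50, 100)):
    """Yield traffic dicts that move load from `old` to `new` in stages."""
    for pct in new_percentages:
        if not 0 <= pct <= 100:
            raise ValueError(f"percentage out of range: {pct}")
        # The two shares always sum to 100, as the endpoint requires.
        yield {old: 100 - pct, new: pct}


steps = list(traffic_steps("blue", "green"))
for traffic in steps:
    # In a real rollout, assign this dict to endpoint.traffic, call
    # ml_client.online_endpoints.begin_create_or_update(endpoint).result(),
    # then watch metrics before moving to the next step.
    print(traffic)
```

Keeping the schedule as plain data makes it easy to review and to pause between steps while metrics are checked.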

Monitoring and Rollback

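Configure Azure Monitor alerts on request latency and error rates for the endpoint. When a threshold is breached during a rollout, roll back by shifting traffic back to the previous deployment. This only works while that deployment still exists, so delete blue only after green has run cleanly at 100%.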
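
The rollback decision itself can be kept as a small, testable function that is separate from however the metrics are collected. The sketch below assumes the caller has already obtained a p95 latency and an error rate, for example from Azure Monitor. The Thresholds class, the function names, and the threshold values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    max_p95_latency_ms: float = 500.0  # hypothetical limit
    max_error_rate: float = 0.01       # hypothetical limit (1% of requests)


def should_roll_back(p95_latency_ms, error_rate, thresholds=Thresholds()):
    """Return True when either metric exceeds its threshold."""
    return (
        p95_latency_ms > thresholds.max_p95_latency_ms
        or error_rate > thresholds.max_error_rate
    )


def next_traffic(current, p95_latency_ms, error_rate,
                 stable="blue", candidate="green"):
    """Return the traffic split to apply: all-stable on breach, else unchanged."""
    if should_roll_back(p95_latency_ms, error_rate):
        return {stable: 100, candidate: 0}
    return current


healthy = next_traffic({"blue": 90, "green": 10}, p95_latency_ms=120, error_rate=0.001)
breached = next_traffic({"blue": 90, "green": 10}, p95_latency_ms=900, error_rate=0.001)
```

The returned dictionary can then be assigned to endpoint.traffic and applied with begin_create_or_update, the same way the rollout steps are.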