
Azure AI Model Catalog: March 2024 Updates

The Azure AI Model Catalog continues to expand. This month's update brings new foundation models to the catalog and more flexible deployment options.

What’s New in March 2024

  • Mistral Large: Now available through Azure AI
  • Cohere Command R: Enhanced retrieval-augmented generation
  • Meta Llama 2 70B: Optimized for Azure infrastructure
  • New deployment tiers: More flexible scaling options

Exploring the Model Catalog

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Authenticate once; DefaultAzureCredential works locally and in Azure
credential = DefaultAzureCredential()

# Workspace-scoped client, used later for deployments
ml_client = MLClient(
    credential=credential,
    subscription_id="your-subscription-id",
    resource_group="your-resource-group",
    workspace_name="your-workspace"
)

# Registry-scoped client for browsing the shared "azureml" model catalog
registry_client = MLClient(credential=credential, registry_name="azureml")

# List available models in the catalog
models = registry_client.models.list()
for model in models:
    print(f"Model: {model.name}")
    print(f"  Version: {model.version}")
    print(f"  Description: {model.description}")
    print("---")

Deploying Models from the Catalog

Using Azure CLI

# List available foundation models
az ml model list --registry-name azureml

# Deploy a model
az ml online-deployment create \
    --name mistral-large-deployment \
    --endpoint-name my-endpoint \
    --model azureml://registries/azureml/models/Mistral-large/versions/1 \
    --instance-type Standard_NC24ads_A100_v4 \
    --instance-count 1
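
Note that the deployment above assumes my-endpoint already exists. A minimal sketch of creating it first and then sending a test request (sample-request.json is a placeholder for your own payload):

# Create the endpoint before adding a deployment
az ml online-endpoint create --name my-endpoint --auth-mode key

# Send a test request once the deployment is live
az ml online-endpoint invoke \
    --name my-endpoint \
    --deployment-name mistral-large-deployment \
    --request-file sample-request.json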

Using Python SDK

from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment
)

# Create endpoint
endpoint = ManagedOnlineEndpoint(
    name="foundation-model-endpoint",
    description="Endpoint for foundation models",
    auth_mode="key"
)

ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy model
deployment = ManagedOnlineDeployment(
    name="mistral-large",
    endpoint_name="foundation-model-endpoint",
    model="azureml://registries/azureml/models/Mistral-large/versions/1",
    instance_type="Standard_NC24ads_A100_v4",
    instance_count=1
)

ml_client.online_deployments.begin_create_or_update(deployment).result()
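
A new deployment receives no traffic until you route requests to it. A short sketch of sending 100% of endpoint traffic to the deployment:

# Route all endpoint traffic to the new deployment
endpoint.traffic = {"mistral-large": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()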

Model Comparison

Model              Parameters  Context  Best For
Mistral Large      70B+        32K      Complex reasoning
Llama 2 70B        70B         4K       General tasks
Cohere Command R   -           128K     RAG applications
Phi-2              2.7B        2K       Edge deployment

Serverless API Access

Many models are now available through serverless APIs:

import requests
import json

# Using serverless deployment
endpoint_url = "https://your-endpoint.inference.ai.azure.com/v1/chat/completions"
api_key = "your-api-key"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain machine learning in simple terms."}
    ],
    "max_tokens": 500,
    "temperature": 0.7
}

response = requests.post(endpoint_url, headers=headers, json=payload)
response.raise_for_status()  # surface HTTP errors early
result = response.json()
print(result["choices"][0]["message"]["content"])

Cost Optimization

from azure.ai.ml.entities import ServerlessEndpoint

# Serverless endpoints suit variable workloads: pay-per-token pricing,
# no infrastructure to manage, and automatic scaling
serverless_endpoint = ServerlessEndpoint(
    name="mistral-serverless",
    model_id="azureml://registries/azureml/models/Mistral-large/versions/1"
)

created = ml_client.serverless_endpoints.begin_create_or_update(
    serverless_endpoint
).result()
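
To call the endpoint you need its scoring URI and an API key. A minimal sketch, assuming a recent azure-ai-ml version where the serverless endpoint operations expose get_keys:

# Retrieve the scoring URI and key for the requests-based call shown earlier
keys = ml_client.serverless_endpoints.get_keys("mistral-serverless")
print(created.scoring_uri)  # base URL for the endpoint
print(keys.primary_key)     # use as the Bearer token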

Monitoring Model Performance

from azure.monitor.query import LogsQueryClient
from datetime import timedelta

logs_client = LogsQueryClient(credential)

query = """
AmlOnlineEndpointConsoleLog
| where TimeGenerated > ago(1h)
| where Message contains "latency"
| summarize avg(todouble(extract("latency=([0-9.]+)", 1, Message))) by bin(TimeGenerated, 5m)
"""

# Query the Log Analytics workspace connected to your endpoint's diagnostics
response = logs_client.query_workspace(
    workspace_id="your-log-analytics-workspace-id",
    query=query,
    timespan=timedelta(hours=1)
)

for row in response.tables[0].rows:
    print(f"Time: {row[0]}, Avg Latency: {row[1]}ms")

Best Practices

  1. Start with serverless: Use for experimentation and variable workloads
  2. Move to dedicated: When you have predictable, high-volume traffic
  3. Monitor costs: Set up alerts for unexpected usage (see the budget sketch after this list)
  4. Use model evaluation: Test before production deployment
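
For the cost alerts in point 3, here is a hedged sketch using the Azure CLI's consumption budgets; flag names can differ across CLI versions, so check az consumption budget create --help before relying on it:

# Hypothetical monthly budget; the name, amount, and dates are placeholders
az consumption budget create \
    --budget-name ml-endpoint-budget \
    --amount 500 \
    --category cost \
    --time-grain monthly \
    --start-date 2024-03-01 \
    --end-date 2025-03-01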

Conclusion

The Azure AI Model Catalog provides a comprehensive platform for accessing and deploying foundation models. The combination of serverless and dedicated options gives flexibility for any workload.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.