
Azure AI Model Catalog: March 2024 Updates

The Azure AI Model Catalog continues to expand. This month's update brings new foundation models to the catalog and more flexible deployment options.

What’s New in March 2024

  • Mistral Large: Now available through Azure AI
  • Cohere Command R: Enhanced retrieval-augmented generation
  • Meta Llama 2 70B: Optimized for Azure infrastructure
  • New deployment tiers: More flexible scaling options

Exploring the Model Catalog

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Authenticate once; DefaultAzureCredential works locally and in Azure
credential = DefaultAzureCredential()

# Workspace-scoped client, used later for deployments
ml_client = MLClient(
    credential=credential,
    subscription_id="your-subscription-id",
    resource_group="your-resource-group",
    workspace_name="your-workspace"
)

# Registry-scoped client for browsing the shared "azureml" model catalog
registry_client = MLClient(credential=credential, registry_name="azureml")

# List available models in the catalog
models = registry_client.models.list()
for model in models:
    print(f"Model: {model.name}")
    print(f"  Version: {model.version}")
    print(f"  Description: {model.description}")
    print("---")

Deploying Models from the Catalog

Using Azure CLI

# List available foundation models
az ml model list --registry-name azureml

# Deploy a model
az ml online-deployment create \
    --name mistral-large-deployment \
    --endpoint-name my-endpoint \
    --model azureml://registries/azureml/models/Mistral-large/versions/1 \
    --instance-type Standard_NC24ads_A100_v4 \
    --instance-count 1
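
Note that the deployment above assumes my-endpoint already exists. A minimal sketch of creating it first and then sending a test request (sample-request.json is a placeholder for your own payload):

# Create the endpoint before adding a deployment
az ml online-endpoint create --name my-endpoint --auth-mode key

# Send a test request once the deployment is live
az ml online-endpoint invoke \
    --name my-endpoint \
    --deployment-name mistral-large-deployment \
    --request-file sample-request.json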

Using Python SDK

from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment
)

# Create endpoint
endpoint = ManagedOnlineEndpoint(
    name="foundation-model-endpoint",
    description="Endpoint for foundation models",
    auth_mode="key"
)

ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy model
deployment = ManagedOnlineDeployment(
    name="mistral-large",
    endpoint_name="foundation-model-endpoint",
    model="azureml://registries/azureml/models/Mistral-large/versions/1",
    instance_type="Standard_NC24ads_A100_v4",
    instance_count=1
)

ml_client.online_deployments.begin_create_or_update(deployment).result()
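
A new deployment receives no traffic until you route requests to it. A short sketch of sending 100% of endpoint traffic to the deployment:

# Route all endpoint traffic to the new deployment
endpoint.traffic = {"mistral-large": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()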

Model Comparison

Model              Parameters  Context  Best For
Mistral Large      70B+        32K      Complex reasoning
Llama 2 70B        70B         4K       General tasks
Cohere Command R   -           128K     RAG applications
Phi-2              2.7B        2K       Edge deployment

Serverless API Access

Many models are now available through serverless APIs:

import requests
import json

# Using serverless deployment
endpoint_url = "https://your-endpoint.inference.ai.azure.com/v1/chat/completions"
api_key = "your-api-key"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain machine learning in simple terms."}
    ],
    "max_tokens": 500,
    "temperature": 0.7
}

response = requests.post(endpoint_url, headers=headers, json=payload)
response.raise_for_status()  # surface HTTP errors early
result = response.json()
print(result["choices"][0]["message"]["content"])

Cost Optimization

from azure.ai.ml.entities import ServerlessEndpoint

# Serverless endpoints suit variable workloads: pay-per-token pricing,
# no infrastructure to manage, and automatic scaling
serverless_endpoint = ServerlessEndpoint(
    name="mistral-serverless",
    model_id="azureml://registries/azureml/models/Mistral-large/versions/1"
)

created = ml_client.serverless_endpoints.begin_create_or_update(
    serverless_endpoint
).result()
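
To call the endpoint you need its scoring URI and an API key. A minimal sketch, assuming a recent azure-ai-ml version where the serverless endpoint operations expose get_keys:

# Retrieve the scoring URI and key for the requests-based call shown earlier
keys = ml_client.serverless_endpoints.get_keys("mistral-serverless")
print(created.scoring_uri)  # base URL for the endpoint
print(keys.primary_key)     # use as the Bearer token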

Monitoring Model Performance

from azure.monitor.query import LogsQueryClient
from datetime import timedelta

logs_client = LogsQueryClient(credential)

query = """
AmlOnlineEndpointConsoleLog
| where TimeGenerated > ago(1h)
| where Message contains "latency"
| summarize avg(todouble(extract("latency=([0-9.]+)", 1, Message))) by bin(TimeGenerated, 5m)
"""

# Query the Log Analytics workspace connected to your endpoint's diagnostics
response = logs_client.query_workspace(
    workspace_id="your-log-analytics-workspace-id",
    query=query,
    timespan=timedelta(hours=1)
)

for row in response.tables[0].rows:
    print(f"Time: {row[0]}, Avg Latency: {row[1]}ms")

Best Practices

  1. Start with serverless: Use for experimentation and variable workloads
  2. Move to dedicated: When you have predictable, high-volume traffic
  3. Monitor costs: Set up alerts for unexpected usage (see the budget sketch after this list)
  4. Use model evaluation: Test before production deployment
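
For the cost alerts in point 3, here is a hedged sketch using the Azure CLI's consumption budgets; flag names can differ across CLI versions, so check az consumption budget create --help before relying on it:

# Hypothetical monthly budget; the name, amount, and dates are placeholders
az consumption budget create \
    --budget-name ml-endpoint-budget \
    --amount 500 \
    --category cost \
    --time-grain monthly \
    --start-date 2024-03-01 \
    --end-date 2025-03-01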

Conclusion

The Azure AI Model Catalog provides a comprehensive platform for accessing and deploying foundation models. The combination of serverless and dedicated options gives flexibility for any workload.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.