# Azure AI Model Catalog: March 2024 Updates
The Azure AI Model Catalog continues to expand with new models and capabilities. This month's release brings significant updates, including new foundation models and more flexible deployment options.
## What's New in March 2024
- Mistral Large: Now available through Azure AI
- Cohere Command R: Enhanced retrieval-augmented generation
- Meta Llama 2 70B: Optimized for Azure infrastructure
- New deployment tiers: More flexible scaling options
## Exploring the Model Catalog
```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Initialize a workspace-scoped client
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential=credential,
    subscription_id="your-subscription-id",
    resource_group="your-resource-group",
    workspace_name="your-workspace",
)

# List models registered in the workspace
models = ml_client.models.list()
for model in models:
    print(f"Model: {model.name}")
    print(f"  Version: {model.version}")
    print(f"  Description: {model.description}")
    print("---")
```
## Deploying Models from the Catalog

### Using Azure CLI
```bash
# List available foundation models
az ml model list --registry-name azureml

# Deploy a model
az ml online-deployment create \
  --name mistral-large-deployment \
  --endpoint-name my-endpoint \
  --model azureml://registries/azureml/models/Mistral-large/versions/1 \
  --instance-type Standard_NC24ads_A100_v4 \
  --instance-count 1
```
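The deployment targets an existing endpoint, so create one first if you haven't already; a minimal sketch, assuming key-based auth:

```bash
# Create the target endpoint before adding deployments to it
az ml online-endpoint create --name my-endpoint --auth-mode key
```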
### Using Python SDK
```python
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
)

# Create the endpoint
endpoint = ManagedOnlineEndpoint(
    name="foundation-model-endpoint",
    description="Endpoint for foundation models",
    auth_mode="key",
)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy the model to the endpoint
deployment = ManagedOnlineDeployment(
    name="mistral-large",
    endpoint_name="foundation-model-endpoint",
    model="azureml://registries/azureml/models/Mistral-large/versions/1",
    instance_type="Standard_NC24ads_A100_v4",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```
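Once the deployment is live, you can route traffic to it and smoke-test the endpoint. A short sketch; the request file name is a placeholder for a JSON payload you supply:

```python
# Send all traffic to the new deployment
endpoint.traffic = {"mistral-large": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Invoke the endpoint with a sample request (file name is hypothetical)
response = ml_client.online_endpoints.invoke(
    endpoint_name="foundation-model-endpoint",
    request_file="sample_request.json",
)
print(response)
```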
## Model Comparison

| Model | Parameters | Context Window (tokens) | Best For |
|---|---|---|---|
| Mistral Large | 70B+ | 32K | Complex reasoning |
| Llama 2 70B | 70B | 4K | General tasks |
| Cohere Command R | - | 128K | RAG applications |
| Phi-2 | 2.7B | 2K | Edge deployment |
## Serverless API Access
Many models are now available through serverless APIs:
```python
import requests

# Call a serverless deployment's chat completions endpoint
endpoint_url = "https://your-endpoint.inference.ai.azure.com/v1/chat/completions"
api_key = "your-api-key"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain machine learning in simple terms."},
    ],
    "max_tokens": 500,
    "temperature": 0.7,
}

response = requests.post(endpoint_url, headers=headers, json=payload)
response.raise_for_status()  # surface HTTP errors before parsing
result = response.json()
print(result["choices"][0]["message"]["content"])
```
## Cost Optimization
```python
from azure.ai.ml.entities import ServerlessEndpoint

# Use serverless for variable workloads:
# pay-per-token pricing, no infrastructure management, automatic scaling
serverless_endpoint = ServerlessEndpoint(
    name="mistral-serverless",
    model_id="azureml://registries/azureml/models/Mistral-large/versions/1",
)
```
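Provisioning follows the same create-and-wait pattern as managed endpoints; a minimal sketch, assuming the `serverless_endpoints` operations available in recent `azure-ai-ml` releases:

```python
# Provision the serverless endpoint and block until it is ready
ml_client.serverless_endpoints.begin_create_or_update(serverless_endpoint).result()

# Fetch the endpoint's API keys for use with the REST example above
keys = ml_client.serverless_endpoints.get_keys("mistral-serverless")
print(keys.primary_key)
```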
## Monitoring Model Performance
```python
from datetime import timedelta

from azure.monitor.query import LogsQueryClient

logs_client = LogsQueryClient(credential)

# Average request latency in 5-minute bins, parsed from endpoint console logs
query = """
AmlOnlineEndpointConsoleLog
| where TimeGenerated > ago(1h)
| where Message contains "latency"
| summarize avg(todouble(extract("latency=([0-9.]+)", 1, Message))) by bin(TimeGenerated, 5m)
"""

response = logs_client.query_workspace(
    workspace_id="your-workspace-id",
    query=query,
    timespan=timedelta(hours=1),
)
for row in response.tables[0].rows:
    print(f"Time: {row[0]}, Avg Latency: {row[1]}ms")
```
## Best Practices
- Start with serverless: Use for experimentation and variable workloads
- Move to dedicated: When you have predictable, high-volume traffic
- Monitor costs: Set up alerts for unexpected usage (see the metrics sketch after this list)
- Use model evaluation: Test before production deployment
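For cost monitoring specifically, request-volume metrics are a useful alert signal. A sketch using `azure-monitor-query`; the resource ID components and the metric name are assumptions to adapt to your environment:

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

metrics_client = MetricsQueryClient(DefaultAzureCredential())

# ARM resource ID of the managed online endpoint (all values are placeholders)
endpoint_resource_id = (
    "/subscriptions/your-subscription-id"
    "/resourceGroups/your-resource-group"
    "/providers/Microsoft.MachineLearningServices"
    "/workspaces/your-workspace/onlineEndpoints/foundation-model-endpoint"
)

# Query request volume over the last day (metric name assumed)
result = metrics_client.query_resource(
    endpoint_resource_id,
    metric_names=["RequestsPerMinute"],
    timespan=timedelta(days=1),
)
for metric in result.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(point.timestamp, point.total)
```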
## Conclusion
The Azure AI Model Catalog provides a comprehensive platform for accessing and deploying foundation models. The combination of serverless and dedicated options offers the flexibility to fit any workload.