# Mistral Large on Azure: Getting Started Guide
Mistral Large is now available on Azure AI, bringing one of Europe’s most capable AI models to the Azure ecosystem. This guide covers deployment, usage, and best practices.
## Why Mistral Large?
Mistral Large offers:
- **Strong multilingual support:** excellent performance in European languages
- **32K context window:** process longer documents
- **Cost-effective:** competitive pricing for enterprise workloads
- **Function calling:** native tool-use capabilities
## Deployment Options

### Option 1: Serverless API (Recommended for Getting Started)
```bash
# Using Azure CLI
az ml serverless-endpoint create \
  --name mistral-large-endpoint \
  --model-id azureml://registries/azureml-mistral/models/Mistral-large/versions/1 \
  --resource-group your-rg \
  --workspace-name your-workspace
```
### Option 2: Managed Compute
```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
)
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
ml_client = MLClient(
    credential=credential,
    subscription_id="your-sub-id",
    resource_group="your-rg",
    workspace_name="your-workspace",
)

# Create endpoint
endpoint = ManagedOnlineEndpoint(
    name="mistral-large-managed",
    auth_mode="key",
)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Create deployment
deployment = ManagedOnlineDeployment(
    name="mistral-deployment",
    endpoint_name="mistral-large-managed",
    model="azureml://registries/azureml-mistral/models/Mistral-large/versions/1",
    instance_type="Standard_NC24ads_A100_v4",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```
## Using the API

### Basic Completion
```python
import requests

endpoint_url = "https://your-endpoint.inference.ai.azure.com"
api_key = "your-api-key"

def chat_with_mistral(messages: list, max_tokens: int = 1024) -> str:
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    payload = {
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }
    response = requests.post(
        f"{endpoint_url}/v1/chat/completions",
        headers=headers,
        json=payload,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

# Example usage
response = chat_with_mistral([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain cloud computing in French."},
])
print(response)
```
### Function Calling
```python
def mistral_function_call(messages: list, tools: list):
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    payload = {
        "messages": messages,
        "tools": tools,
        "tool_choice": "auto",
        "max_tokens": 1024,
    }
    response = requests.post(
        f"{endpoint_url}/v1/chat/completions",
        headers=headers,
        json=payload,
    )
    response.raise_for_status()
    return response.json()

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                    },
                },
                "required": ["city"],
            },
        },
    }
]

messages = [
    {"role": "user", "content": "What's the weather like in Paris?"}
]

result = mistral_function_call(messages, tools)
print(result)
```
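When the model decides to use the tool, the response message carries a `tool_calls` entry instead of plain text. You then execute the requested function yourself and send the result back for a final answer. A minimal sketch of that round trip, assuming the OpenAI-compatible response schema shown above (`get_weather` here is a hypothetical stub, not a real weather API):

```python
import json

def get_weather(city: str, unit: str = "celsius") -> str:
    # Hypothetical stub -- replace with a real weather API call.
    return json.dumps({"city": city, "temperature": 18, "unit": unit})

def handle_tool_calls(result: dict, messages: list) -> list:
    """Execute any requested tool calls and append their results to the conversation."""
    message = result["choices"][0]["message"]
    if not message.get("tool_calls"):
        return messages  # The model answered directly; nothing to execute.
    messages.append(message)  # Keep the assistant's tool-call turn in the history.
    for call in message["tool_calls"]:
        args = json.loads(call["function"]["arguments"])
        if call["function"]["name"] == "get_weather":
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": get_weather(**args),
            })
    return messages

# After appending the tool results, send `messages` back through
# mistral_function_call(messages, tools) to get the model's final answer.
```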
### Streaming Responses
```python
import requests
import json

def stream_mistral_response(messages: list):
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    payload = {
        "messages": messages,
        "max_tokens": 1024,
        "stream": True,
    }
    with requests.post(
        f"{endpoint_url}/v1/chat/completions",
        headers=headers,
        json=payload,
        stream=True,
    ) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if line:
                line_text = line.decode("utf-8")
                if line_text.startswith("data: "):
                    data = line_text[6:]
                    if data != "[DONE]":
                        chunk = json.loads(data)
                        content = chunk["choices"][0]["delta"].get("content", "")
                        if content:
                            print(content, end="", flush=True)

# Usage
stream_mistral_response([
    {"role": "user", "content": "Write a poem about Azure cloud."}
])
```
## Integration with LangChain
```python
from langchain_community.chat_models.azureml_endpoint import (
    AzureMLChatOnlineEndpoint,
    AzureMLEndpointApiType,
    CustomOpenAIChatContentFormatter,
)
from langchain_core.messages import HumanMessage, SystemMessage

chat = AzureMLChatOnlineEndpoint(
    endpoint_url="https://your-endpoint.inference.ai.azure.com/v1/chat/completions",
    endpoint_api_type=AzureMLEndpointApiType.serverless,
    endpoint_api_key="your-api-key",
    content_formatter=CustomOpenAIChatContentFormatter(),
)

messages = [
    SystemMessage(content="You are a technical writer."),
    HumanMessage(content="Write documentation for a REST API."),
]

response = chat.invoke(messages)
print(response.content)
```
## Cost Comparison
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Mistral Large | $4.00 | $12.00 |
| GPT-4 Turbo | $10.00 | $30.00 |
| Claude 3 Sonnet | $3.00 | $15.00 |
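The per-million-token rates in the table make budgeting straightforward. A small helper using the Mistral Large figures above (list prices shown in this post; check the Azure Marketplace for current rates):

```python
# Mistral Large rates from the table above (USD per 1M tokens).
INPUT_PER_M = 4.00
OUTPUT_PER_M = 12.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate serverless API cost in USD for a given token volume."""
    return (
        input_tokens / 1_000_000 * INPUT_PER_M
        + output_tokens / 1_000_000 * OUTPUT_PER_M
    )

# Example: a month with 10M input tokens and 2M output tokens
print(f"${estimate_cost(10_000_000, 2_000_000):.2f}")  # $64.00
```

Because output tokens cost three times as much as input tokens, trimming `max_tokens` and verbose completions usually moves the bill more than trimming prompts.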
## Best Practices
- **Use it for multilingual tasks:** Mistral excels at European languages
- **Leverage function calling:** great for agentic applications
- **Monitor latency:** track p99 latency in production
- **Set up alerts:** use Azure Monitor for anomaly detection
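For the latency point, you don't need full APM tooling to get a first read on p99 during development: record per-request wall-clock times and compute the percentile locally. A minimal sketch (the actual request function is whatever client wrapper you use, such as the `chat_with_mistral` helper above):

```python
import time
import statistics

latencies: list[float] = []

def timed_call(fn, *args, **kwargs):
    """Run fn, recording its wall-clock latency in seconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    latencies.append(time.perf_counter() - start)
    return result

def p99(samples: list[float]) -> float:
    """99th-percentile latency from recorded samples (needs >= 2 samples)."""
    # quantiles with n=100 returns the 1st..99th percentile cut points.
    return statistics.quantiles(samples, n=100)[98]
```

In production, prefer emitting these timings as metrics to Azure Monitor so the alerting bullet above can act on them.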
## Conclusion
Mistral Large on Azure provides a powerful, cost-effective option for enterprise AI workloads. The serverless deployment option makes it easy to get started without infrastructure management.