Azure AI Studio Updates at Build 2024
Azure AI Studio received significant updates at Build 2024. Let’s explore what’s new and how to leverage these capabilities.
What’s New in Azure AI Studio
1. Unified Experience
Azure AI Studio now provides a single pane of glass for:
- Model catalog browsing
- Prompt engineering
- Fine-tuning
- Evaluation
- Deployment
- Monitoring
2. Model Catalog Expansion
Available Models:
├── OpenAI (GPT-4o, GPT-4, GPT-3.5)
├── Microsoft (Phi-3 family)
├── Meta (Llama 3)
├── Mistral (Mistral, Mixtral)
├── Cohere (Command R+)
└── Stability AI (SDXL)
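Models deployed from the catalog, whether as serverless APIs or managed endpoints, can be called through the common Azure AI model inference API. Here's a minimal sketch using the azure-ai-inference package; the endpoint URL and key below are placeholders you'd copy from your own deployment:
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key from a serverless deployment in AI Studio
chat = ChatCompletionsClient(
    endpoint="https://my-llama3-endpoint.eastus2.inference.ai.azure.com",
    credential=AzureKeyCredential("your-api-key")
)

response = chat.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Summarize what Azure AI Studio does in one sentence.")
    ]
)
print(response.choices[0].message.content)
Because every catalog model speaks the same API, you can swap Llama 3 for Mistral or Phi-3 by changing only the endpoint.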
Getting Started with AI Studio
Project Setup
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
# Connect to AI Studio project
credential = DefaultAzureCredential()
client = AIProjectClient(
    credential=credential,
    subscription_id="your-subscription-id",
    resource_group_name="your-resource-group",
    project_name="your-ai-project"
)

# List available models
models = client.models.list()
for model in models:
    print(f"{model.name}: {model.version}")
Model Deployment
from azure.ai.projects.models import ModelDeployment
# Deploy a model
deployment = client.deployments.create(
    name="gpt4o-production",
    model="gpt-4o",
    sku="Standard",
    capacity=10  # TPM in thousands
)
print(f"Deployment created: {deployment.name}")
print(f"Endpoint: {deployment.endpoint}")
Prompt Flow Updates
Visual Flow Builder
# prompt_flow.yaml
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
inputs:
  query:
    type: string
    description: User question
  context:
    type: string
    description: Retrieved documents
nodes:
- name: search
  type: python
  source:
    type: code
    path: search.py
  inputs:
    query: ${inputs.query}
- name: generate
  type: llm
  source:
    type: code
    path: generate.prompty
  inputs:
    context: ${search.output}
    query: ${inputs.query}
outputs:
  answer:
    type: string
    reference: ${generate.output}
Prompty Files
# generate.prompty
---
name: Answer Generator
description: Generate answers based on context
model:
  api: chat
  configuration:
    type: azure_openai
    azure_deployment: gpt4o-production
  parameters:
    temperature: 0.7
    max_tokens: 500
inputs:
  context:
    type: string
  query:
    type: string
---
system:
You are a helpful assistant. Use the provided context to answer questions accurately.

user:
Context: {{context}}
Question: {{query}}
Answer:
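Prompty files can also be loaded and called directly from Python, which is handy for quick local testing before wiring them into a flow. A minimal sketch, assuming the file above is saved as generate.prompty and your Azure OpenAI credentials are configured locally:
from promptflow.core import Prompty

# Load the prompty definition and call it like a function
answer_fn = Prompty.load(source="generate.prompty")
answer = answer_fn(
    context="Azure AI Studio provides tools for AI development...",
    query="What is Azure AI Studio?"
)
print(answer)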
Running Prompt Flows
from promptflow import PFClient
pf = PFClient()
# Run locally
result = pf.run(
    flow="./my_flow",
    data="./test_data.jsonl",
    column_mapping={
        "query": "${data.question}",
        "context": "${data.context}"
    }
)
# View results
details = pf.get_details(result)
print(details)
Evaluation Framework
Built-in Evaluators
from azure.ai.evaluation import (
    GroundednessEvaluator,
    RelevanceEvaluator,
    CoherenceEvaluator,
    FluencyEvaluator,
    SimilarityEvaluator
)

# Initialize evaluators (model_config holds your Azure OpenAI endpoint/deployment details)
groundedness = GroundednessEvaluator(model_config)
relevance = RelevanceEvaluator(model_config)
coherence = CoherenceEvaluator(model_config)

# Evaluate a single response
result = groundedness(
    question="What is Azure AI Studio?",
    answer="Azure AI Studio is a platform for building AI applications.",
    context="Azure AI Studio provides tools for AI development..."
)
print(f"Groundedness score: {result['groundedness']}")
Custom Evaluators
# Custom evaluators are plain callables: any class with a __call__ method
# (or a function) returning a dict of scores works alongside the built-in evaluators.
class CustomEvaluator:
    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def __call__(self, *, question: str, answer: str, **kwargs) -> dict:
        # Custom evaluation logic
        score = self._calculate_score(question, answer)
        return {
            "custom_score": score,
            "passed": score >= self.threshold
        }

    def _calculate_score(self, question: str, answer: str) -> float:
        # Your scoring logic
        return 0.8

# Call it directly, or pass it to evaluate() (shown below) for batch evaluation
evaluator = CustomEvaluator(threshold=0.7)
result = evaluator(
    question="What is Azure AI Studio?",
    answer="Azure AI Studio is a platform for building AI applications."
)
print(result)
Batch Evaluation
from azure.ai.evaluation import evaluate
# Run evaluation on dataset
results = evaluate(
    data="./eval_data.jsonl",
    evaluators={
        "groundedness": GroundednessEvaluator(model_config),
        "relevance": RelevanceEvaluator(model_config),
        "coherence": CoherenceEvaluator(model_config)
    },
    evaluator_config={
        "groundedness": {
            "question": "${data.question}",
            "answer": "${data.answer}",
            "context": "${data.context}"
        }
    }
)

# Analyze results (evaluate() returns aggregate metrics plus per-row results)
print(f"Average groundedness: {results['metrics']['groundedness.mean']:.2f}")
print(f"Average relevance: {results['metrics']['relevance.mean']:.2f}")
Fine-Tuning Models
import json
import time

# Prepare training data (chat-format examples)
training_data = [
    {"messages": [
        {"role": "system", "content": "You are a technical assistant."},
        {"role": "user", "content": "Explain APIs"},
        {"role": "assistant", "content": "APIs are..."}
    ]},
    # More examples...
]

# Fine-tuning expects a JSONL file, so write the examples out before uploading
with open("training_data.jsonl", "w") as f:
    for example in training_data:
        f.write(json.dumps(example) + "\n")

# Upload training file
file = client.files.upload(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune"
)

# Create fine-tuning job
job = client.fine_tuning.create(
    model="gpt-4o-mini",
    training_file=file.id,
    hyperparameters={
        "n_epochs": 3,
        "batch_size": 4,
        "learning_rate_multiplier": 0.1
    }
)

# Monitor progress
while job.status not in ["succeeded", "failed"]:
    job = client.fine_tuning.get(job.id)
    print(f"Status: {job.status}")
    time.sleep(60)
Tracing and Debugging
import os

from azure.ai.inference.tracing import AIInferenceInstrumentor
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from azure.monitor.opentelemetry.exporter import AzureMonitorTraceExporter

# Set up tracing
trace.set_tracer_provider(TracerProvider())
tracer_provider = trace.get_tracer_provider()

# Export to Azure Monitor
exporter = AzureMonitorTraceExporter(
    connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"]
)
tracer_provider.add_span_processor(BatchSpanProcessor(exporter))

# Enable AI inference tracing
AIInferenceInstrumentor().instrument()

# Now all calls made through the azure-ai-inference client are automatically traced
chat_client = client.inference.get_chat_completions_client()
response = chat_client.complete(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
# Trace data sent to Application Insights
Safety Evaluations
from azure.ai.evaluation import (
    ContentSafetyEvaluator,
    ViolenceEvaluator,
    SexualEvaluator,
    HateUnfairnessEvaluator,
    SelfHarmEvaluator
)

# Evaluate content safety
safety_evaluator = ContentSafetyEvaluator(
    azure_ai_project=project_config
)

result = safety_evaluator(
    question="User input...",
    answer="Model response..."
)
print(f"Violence score: {result['violence']}")
print(f"Sexual score: {result['sexual']}")
print(f"Hate score: {result['hate_unfairness']}")
print(f"Self-harm score: {result['self_harm']}")
CI/CD Integration
# azure-pipelines.yml
trigger:
- main

stages:
- stage: Evaluate
  jobs:
  - job: RunEvaluation
    steps:
    - task: UsePythonVersion@0
      inputs:
        versionSpec: '3.11'

    - script: |
        pip install azure-ai-evaluation promptflow
      displayName: Install dependencies

    - script: |
        python -m promptflow run \
          --flow ./flows/rag_flow \
          --data ./eval/test_data.jsonl \
          --output ./results
      displayName: Run prompt flow

    - script: |
        python evaluate.py \
          --results ./results \
          --threshold 0.7
      displayName: Run evaluation

    - task: PublishTestResults@2
      inputs:
        testResultsFiles: '**/eval_results.xml'
        testRunTitle: 'AI Evaluation Results'

- stage: Deploy
  dependsOn: Evaluate
  condition: succeeded()
  jobs:
  - deployment: DeployModel
    environment: production
    strategy:
      runOnce:
        deploy:
          steps:
          - script: |
              az ml online-deployment create \
                --name production \
                --endpoint my-endpoint \
                --file deployment.yml
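The evaluate.py quality gate in the pipeline above isn't shown elsewhere in this post, so here's one way it could look. This is a hypothetical sketch that assumes the flow run wrote per-row metrics to ./results/metrics.jsonl with a groundedness field; the key idea is that a non-zero exit code fails the Evaluate stage and blocks Deploy:
# evaluate.py - hypothetical quality gate; assumes ./results/metrics.jsonl
# contains one JSON object per evaluated row with a "groundedness" field.
import argparse
import json
import sys
from pathlib import Path

def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--results", required=True, help="Directory with evaluation output")
    parser.add_argument("--threshold", type=float, default=0.7)
    args = parser.parse_args()

    rows = [
        json.loads(line)
        for line in (Path(args.results) / "metrics.jsonl").read_text().splitlines()
        if line.strip()
    ]
    scores = [row["groundedness"] for row in rows if "groundedness" in row]
    average = sum(scores) / len(scores) if scores else 0.0

    print(f"Average groundedness: {average:.2f} (threshold {args.threshold})")
    # Non-zero exit code fails the pipeline stage
    return 0 if average >= args.threshold else 1

if __name__ == "__main__":
    sys.exit(main())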
Best Practices
- Version your prompts - Use Prompty files in source control
- Evaluate continuously - Run evaluations on every change
- Enable tracing - Debug issues in production
- Monitor safety - Use content safety evaluators
- Automate deployments - CI/CD for AI applications
What’s Next
Tomorrow I’ll dive deeper into Prompt Flow improvements.