
Azure AI Studio Updates at Build 2024

Azure AI Studio received significant updates at Build 2024. Let’s explore what’s new and how to leverage these capabilities.

What’s New in Azure AI Studio

1. Unified Experience

Azure AI Studio now provides a single pane of glass for:

  • Model catalog browsing
  • Prompt engineering
  • Fine-tuning
  • Evaluation
  • Deployment
  • Monitoring

2. Model Catalog Expansion

Available Models:
├── OpenAI (GPT-4o, GPT-4, GPT-3.5)
├── Microsoft (Phi-3 family)
├── Meta (Llama 3)
├── Mistral (Mistral, Mixtral)
├── Cohere (Command R+)
└── Stability AI (SDXL)
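
Many of these catalog models can be called through the unified azure-ai-inference chat client once deployed (serverless or managed). A minimal sketch, assuming you already have an endpoint URL and API key for one of them (both values below are placeholders):

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key for a deployed catalog model (e.g. Phi-3 or Llama 3)
chat_client = ChatCompletionsClient(
    endpoint="https://your-model-endpoint.inference.ai.azure.com",
    credential=AzureKeyCredential("your-api-key")
)

response = chat_client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Summarise what Azure AI Studio does.")
    ],
    temperature=0.7,
    max_tokens=256
)
print(response.choices[0].message.content)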

Getting Started with AI Studio

Project Setup

from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Connect to AI Studio project
credential = DefaultAzureCredential()
client = AIProjectClient(
    credential=credential,
    subscription_id="your-subscription-id",
    resource_group_name="your-resource-group",
    project_name="your-ai-project"
)

# List available models
models = client.models.list()
for model in models:
    print(f"{model.name}: {model.version}")

Model Deployment

# Deploy a model
deployment = client.deployments.create(
    name="gpt4o-production",
    model="gpt-4o",
    sku="Standard",
    capacity=10  # capacity in thousands of tokens per minute (TPM)
)

print(f"Deployment created: {deployment.name}")
print(f"Endpoint: {deployment.endpoint}")

Prompt Flow Updates

Visual Flow Builder

# prompt_flow.yaml
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
inputs:
  query:
    type: string
    description: User question
  context:
    type: string
    description: Retrieved documents

nodes:
  - name: search
    type: python
    source:
      type: code
      path: search.py
    inputs:
      query: ${inputs.query}

  - name: generate
    type: llm
    source:
      type: code
      path: generate.prompty
    inputs:
      context: ${search.output}
      query: ${inputs.query}

outputs:
  answer:
    type: string
    reference: ${generate.output}
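
The search node above points at search.py, which is just a promptflow Python tool. A minimal sketch of what that file could look like, with the actual retrieval stubbed out (a real flow would call Azure AI Search or a vector store here):

# search.py
from promptflow.core import tool

@tool
def search(query: str) -> str:
    # Stubbed retrieval: replace with a call to Azure AI Search or a vector store
    documents = [f"Retrieved document relevant to: {query}"]
    return "\n".join(documents)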

Prompty Files

# generate.prompty
---
name: Answer Generator
description: Generate answers based on context
model:
  api: chat
  configuration:
    type: azure_openai
    deployment: gpt4o-production
  parameters:
    temperature: 0.7
    max_tokens: 500
inputs:
  context:
    type: string
  query:
    type: string
---
system:
You are a helpful assistant. Use the provided context to answer questions accurately.

user:
Context: {{context}}

Question: {{query}}

Answer:
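
Prompty files can also be executed on their own, which makes iterating on the prompt quick before wiring it into a flow. A sketch, assuming the promptflow Prompty class and a working connection for the deployment named in the front matter:

from promptflow.core import Prompty

# Load the prompty; model settings come from the file's front matter
answer = Prompty.load(source="generate.prompty")

result = answer(
    context="Azure AI Studio provides tools for AI development...",
    query="What is Azure AI Studio?"
)
print(result)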

Running Prompt Flows

from promptflow import PFClient

pf = PFClient()

# Run locally
result = pf.run(
    flow="./my_flow",
    data="./test_data.jsonl",
    column_mapping={
        "query": "${data.question}",
        "context": "${data.context}"
    }
)

# View results
details = pf.get_details(result)
print(details)
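
The column mapping above assumes each line of test_data.jsonl supplies question and context fields, for example:

{"question": "What is Azure AI Studio?", "context": "Azure AI Studio provides tools for AI development..."}
{"question": "Which models are in the catalog?", "context": "The catalog now includes GPT-4o, Phi-3, Llama 3, Mistral, Command R+ and SDXL..."}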

Evaluation Framework

Built-in Evaluators

from azure.ai.evaluation import (
    AzureOpenAIModelConfiguration,
    GroundednessEvaluator,
    RelevanceEvaluator,
    CoherenceEvaluator,
    FluencyEvaluator,
    SimilarityEvaluator
)

# AI-assisted evaluators need a judge model; the values below are placeholders
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint="https://your-openai-resource.openai.azure.com",
    api_key="your-api-key",
    azure_deployment="gpt4o-production",
    api_version="2024-02-01"
)

# Initialize evaluators
groundedness = GroundednessEvaluator(model_config)
relevance = RelevanceEvaluator(model_config)
coherence = CoherenceEvaluator(model_config)

# Evaluate a response
result = groundedness(
    question="What is Azure AI Studio?",
    answer="Azure AI Studio is a platform for building AI applications.",
    context="Azure AI Studio provides tools for AI development..."
)

print(f"Groundedness score: {result['groundedness']}")

Custom Evaluators

# Custom evaluators are plain callables: implement __call__ and return a dict of scores
class CustomEvaluator:
    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def __call__(self, *, question: str, answer: str, **kwargs) -> dict:
        # Custom evaluation logic
        score = self._calculate_score(question, answer)

        return {
            "custom_score": score,
            "passed": score >= self.threshold
        }

    def _calculate_score(self, question: str, answer: str) -> float:
        # Your scoring logic
        return 0.8

# Call it directly, or pass it to evaluate() alongside the built-in evaluators (next section)
evaluator = CustomEvaluator(threshold=0.7)
result = evaluator(question="What is Azure AI Studio?", answer="A platform for building AI apps.")
print(result)

Batch Evaluation

from azure.ai.evaluation import evaluate

# Run evaluation on dataset
results = evaluate(
    data="./eval_data.jsonl",
    evaluators={
        "groundedness": GroundednessEvaluator(model_config),
        "relevance": RelevanceEvaluator(model_config),
        "coherence": CoherenceEvaluator(model_config)
    },
    evaluator_config={
        "groundedness": {
            "question": "${data.question}",
            "answer": "${target.answer}",
            "context": "${data.context}"
        }
    }
)

# Analyze results
print(f"Average groundedness: {results.metrics['groundedness.mean']:.2f}")
print(f"Average relevance: {results.metrics['relevance.mean']:.2f}")

Fine-Tuning Models

import json
import time

# Prepare training data (chat-format examples)
training_data = [
    {"messages": [
        {"role": "system", "content": "You are a technical assistant."},
        {"role": "user", "content": "Explain APIs"},
        {"role": "assistant", "content": "APIs are..."}
    ]},
    # More examples...
]

# Write the examples to a JSONL file, which is what the service expects
with open("training_data.jsonl", "w") as f:
    for example in training_data:
        f.write(json.dumps(example) + "\n")

# Upload training file
with open("training_data.jsonl", "rb") as f:
    file = client.files.upload(
        file=f,
        purpose="fine-tune"
    )

# Create fine-tuning job
job = client.fine_tuning.create(
    model="gpt-4o-mini",
    training_file=file.id,
    hyperparameters={
        "n_epochs": 3,
        "batch_size": 4,
        "learning_rate_multiplier": 0.1
    }
)

# Monitor progress
while job.status not in ["succeeded", "failed"]:
    job = client.fine_tuning.get(job.id)
    print(f"Status: {job.status}")
    time.sleep(60)
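
Once the job reports succeeded, the tuned model can be rolled out with the same deployment call used earlier. A sketch, assuming the completed job exposes the resulting model name as fine_tuned_model (as in the OpenAI-style fine-tuning API):

# Deploy the fine-tuned model alongside the base deployment
if job.status == "succeeded":
    ft_deployment = client.deployments.create(
        name="gpt4o-mini-finetuned",
        model=job.fine_tuned_model,  # assumed attribute on the completed job
        sku="Standard",
        capacity=5
    )
    print(f"Fine-tuned deployment ready: {ft_deployment.name}")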

Tracing and Debugging

import os

from azure.ai.inference.tracing import AIInferenceInstrumentor
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from azure.monitor.opentelemetry.exporter import AzureMonitorTraceExporter

# Setup tracing
trace.set_tracer_provider(TracerProvider())
tracer_provider = trace.get_tracer_provider()

# Export to Azure Monitor
exporter = AzureMonitorTraceExporter(
    connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"]
)
tracer_provider.add_span_processor(BatchSpanProcessor(exporter))

# Enable AI inference tracing
AIInferenceInstrumentor().instrument()

# Now all AI calls are automatically traced
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
# Trace data sent to Application Insights
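
Because this is standard OpenTelemetry, you can also wrap your own application steps in spans so they appear alongside the automatically captured inference calls. A small sketch:

tracer = trace.get_tracer(__name__)

# Custom span around an application-level step; instrumented model calls
# made inside it show up as child spans in Application Insights
with tracer.start_as_current_span("rag_pipeline") as span:
    span.set_attribute("user.query", "Hello")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}]
    )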

Safety Evaluations

from azure.ai.evaluation import (
    ContentSafetyEvaluator,
    ViolenceEvaluator,
    SexualEvaluator,
    HateUnfairnessEvaluator,
    SelfHarmEvaluator
)

# Evaluate content safety
safety_evaluator = ContentSafetyEvaluator(
    azure_ai_project=project_config
)

result = safety_evaluator(
    question="User input...",
    answer="Model response..."
)

print(f"Violence score: {result['violence']}")
print(f"Sexual score: {result['sexual']}")
print(f"Hate score: {result['hate_unfairness']}")
print(f"Self-harm score: {result['self_harm']}")

CI/CD Integration

# azure-pipelines.yml
trigger:
  - main

stages:
  - stage: Evaluate
    jobs:
      - job: RunEvaluation
        steps:
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '3.11'

          - script: |
              pip install azure-ai-evaluation promptflow
            displayName: Install dependencies

          - script: |
              python -m promptflow run \
                --flow ./flows/rag_flow \
                --data ./eval/test_data.jsonl \
                --output ./results
            displayName: Run prompt flow

          - script: |
              python evaluate.py \
                --results ./results \
                --threshold 0.7
            displayName: Run evaluation

          - task: PublishTestResults@2
            inputs:
              testResultsFiles: '**/eval_results.xml'
              testRunTitle: 'AI Evaluation Results'

  - stage: Deploy
    dependsOn: Evaluate
    condition: succeeded()
    jobs:
      - deployment: DeployModel
        environment: production
        strategy:
          runOnce:
            deploy:
              steps:
                - script: |
                    az ml online-deployment create \
                      --name production \
                      --endpoint-name my-endpoint \
                      --file deployment.yml
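
The evaluate.py step in the Evaluate stage is your own gate script. A hypothetical sketch that fails the build when any aggregate metric drops below the threshold (it assumes the flow run wrote a metrics.json summary into the results folder):

# evaluate.py (hypothetical CI gate; adjust to however your run writes results)
import argparse
import json
import sys
from pathlib import Path

parser = argparse.ArgumentParser()
parser.add_argument("--results", required=True, help="Folder containing run output")
parser.add_argument("--threshold", type=float, default=0.7)
args = parser.parse_args()

metrics = json.loads((Path(args.results) / "metrics.json").read_text())

failing = {name: value for name, value in metrics.items() if value < args.threshold}
if failing:
    print(f"Evaluation gate failed: {failing}")
    sys.exit(1)

print("All evaluation metrics passed the threshold.")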

Best Practices

  1. Version your prompts - Use Prompty files in source control
  2. Evaluate continuously - Run evaluations on every change
  3. Enable tracing - Debug issues in production
  4. Monitor safety - Use content safety evaluators
  5. Automate deployments - CI/CD for AI applications

What’s Next

Tomorrow I’ll dive deeper into Prompt Flow improvements.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.