Prompt Flow General Availability: Building Production LLM Applications

Prompt Flow has reached general availability in Azure Machine Learning, providing a robust framework for building, testing, and deploying LLM-powered applications. Today, I will dive deep into production patterns with Prompt Flow.

Why Prompt Flow?

Prompt Flow addresses key challenges in LLM application development:

  • Prompt Engineering: Iterative prompt development with versioning
  • Orchestration: Chain multiple LLM calls and tools
  • Evaluation: Systematic testing of LLM outputs
  • Deployment: Production-ready endpoints with monitoring

┌────────────────────────────────────────────────────┐
│             LLM Application Lifecycle              │
├────────────────────────────────────────────────────┤
│                                                    │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐      │
│  │  Build   │──▶ │  Test    │──▶ │  Deploy  │      │
│  │  (Flow)  │    │  (Eval)  │    │(Endpoint)│      │
│  └──────────┘    └──────────┘    └──────────┘      │
│       │               │               │            │
│       ▼               ▼               ▼            │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐      │
│  │ Prompt   │    │ Metrics  │    │ Monitor  │      │
│  │ Variants │    │ Compare  │    │ & Scale  │      │
│  └──────────┘    └──────────┘    └──────────┘      │
│                                                    │
└────────────────────────────────────────────────────┘

Building a RAG Application

Project Structure

my-rag-flow/
├── flow.dag.yaml         # Flow definition
├── requirements.txt      # Python dependencies
├── embed_query.py        # Embedding node
├── search_index.py       # Search node
├── generate_answer.py    # LLM node
├── chat_prompt.jinja2    # Prompt template
├── connections/
│   └── connection.yaml   # Connection configs
├── data/
│   ├── test.jsonl        # Test dataset
│   └── eval.jsonl        # Evaluation dataset
└── evaluation/
    └── flow.dag.yaml     # Evaluation flow (Prompt Flow expects this filename)
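
The Python nodes below import promptflow, openai, and azure-search-documents, so requirements.txt needs at least those packages. A minimal sketch (version pins are illustrative, not prescriptive):

# requirements.txt
promptflow
promptflow-tools
openai>=1.0
azure-search-documents>=11.4.0
azure-core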

Flow Definition

# flow.dag.yaml
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json

environment:
  python_requirements_txt: requirements.txt

inputs:
  chat_history:
    type: list
    default: []
  question:
    type: string

outputs:
  answer:
    type: string
    reference: ${generate_answer.output}
  context:
    type: string
    reference: ${search_index.output}

nodes:
  - name: embed_query
    type: python
    source:
      type: code
      path: embed_query.py
    inputs:
      question: ${inputs.question}
      connection: azure_openai

  - name: search_index
    type: python
    source:
      type: code
      path: search_index.py
    inputs:
      query_embedding: ${embed_query.output}
      top_k: 5
      connection: cognitive_search

  - name: generate_answer
    type: llm
    source:
      type: code
      path: chat_prompt.jinja2
    inputs:
      deployment_name: gpt-4
      temperature: 0.7
      max_tokens: 500
      context: ${search_index.output}
      chat_history: ${inputs.chat_history}
      question: ${inputs.question}
    connection: azure_openai
    api: chat
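
The azure_openai and cognitive_search connections referenced above must exist before the flow runs. A sketch of connections/connection.yaml for the Azure OpenAI side (endpoint and key are placeholders; the search connection follows the same pattern with type: cognitive_search):

# connections/connection.yaml
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/AzureOpenAIConnection.schema.json
name: azure_openai
type: azure_open_ai
api_key: "<your-api-key>"
api_base: "https://<your-resource>.openai.azure.com/"
api_type: azure
api_version: "2023-05-15"

Register it locally with pf connection create --file connections/connection.yaml before testing the flow.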

Node Implementations

# embed_query.py
from promptflow import tool
from promptflow.connections import AzureOpenAIConnection

@tool
def embed_query(question: str, connection: AzureOpenAIConnection) -> list:
    """Generate embedding for the query"""
    from openai import AzureOpenAI

    client = AzureOpenAI(
        api_key=connection.api_key,
        api_version="2023-05-15",
        azure_endpoint=connection.api_base
    )

    response = client.embeddings.create(
        model="text-embedding-ada-002",
        input=question
    )

    return response.data[0].embedding

# search_index.py
from promptflow import tool
from promptflow.connections import CognitiveSearchConnection
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from azure.core.credentials import AzureKeyCredential

@tool
def search_index(
    query_embedding: list,
    top_k: int,
    connection: CognitiveSearchConnection
) -> str:
    """Search for relevant documents"""

    search_client = SearchClient(
        endpoint=connection.api_base,
        index_name="documents",
        credential=AzureKeyCredential(connection.api_key)
    )

    results = search_client.search(
        search_text=None,
        vector_queries=[VectorizedQuery(
            vector=query_embedding,
            k_nearest_neighbors=top_k,
            fields="embedding"
        )],
        select=["title", "content", "source"]
    )

    documents = []
    for result in results:
        documents.append(
            f"Source: {result['source']}\n"
            f"Title: {result['title']}\n"
            f"Content: {result['content']}"
        )

    return "\n\n---\n\n".join(documents)

{# chat_prompt.jinja2 #}
system:
You are a helpful AI assistant that answers questions based on the provided context.

Instructions:
- Only use information from the provided context
- If the answer is not in the context, say "I don't have information about that"
- Cite sources when possible
- Be concise but thorough

Context:
{{context}}

{% for message in chat_history %}
{{message.role}}:
{{message.content}}
{% endfor %}

user:
{{question}}

Running Flows Locally

from promptflow import PFClient

pf = PFClient()

# Test single input
result = pf.test(
    flow="./my-rag-flow",
    inputs={
        "question": "What is Azure AI Studio?",
        "chat_history": []
    }
)
print(result["answer"])

# Run batch
run = pf.run(
    flow="./my-rag-flow",
    data="./data/test.jsonl",
    name="test-run-001"
)

# Get results
details = pf.get_details(run)
print(details)

Evaluation Flow

# evaluation/flow.dag.yaml
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json

inputs:
  question:
    type: string
  answer:
    type: string
  context:
    type: string
  ground_truth:
    type: string

outputs:
  relevance_score:
    type: string
    reference: ${relevance_eval.output}
  groundedness_score:
    type: string
    reference: ${groundedness_eval.output}

nodes:
  - name: relevance_eval
    type: llm
    source:
      type: code
      path: relevance_prompt.jinja2
    inputs:
      deployment_name: gpt-4
      question: ${inputs.question}
      answer: ${inputs.answer}
    connection: azure_openai
    api: chat

  - name: groundedness_eval
    type: llm
    source:
      type: code
      path: groundedness_prompt.jinja2
    inputs:
      deployment_name: gpt-4
      context: ${inputs.context}
      answer: ${inputs.answer}
    connection: azure_openai
    api: chat

{# relevance_prompt.jinja2 #}
system:
You are an AI assistant that evaluates the relevance of an answer to a question.
Score the answer from 1-5 where:
1 = Completely irrelevant
2 = Mostly irrelevant
3 = Partially relevant
4 = Mostly relevant
5 = Highly relevant

Output only the numeric score.

user:
Question: {{question}}
Answer: {{answer}}

Score:
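
groundedness_prompt.jinja2 follows the same pattern, scoring whether the answer is supported by the retrieved context rather than whether it addresses the question. A sketch:

{# groundedness_prompt.jinja2 #}
system:
You are an AI assistant that evaluates whether an answer is grounded in the provided context.
Score the answer from 1-5 where:
1 = Entirely unsupported by the context
3 = Partially supported by the context
5 = Fully supported by the context

Output only the numeric score.

user:
Context: {{context}}
Answer: {{answer}}

Score: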

Running Evaluations

# Run evaluation
eval_run = pf.run(
    flow="./evaluation/eval_flow",
    data="./data/eval.jsonl",
    run=run,  # Reference the main flow run
    column_mapping={
        "question": "${data.question}",
        "answer": "${run.outputs.answer}",
        "context": "${run.outputs.context}",
        "ground_truth": "${data.ground_truth}"
    },
    name="eval-run-001"
)

# Analyze metrics
metrics = pf.get_metrics(eval_run)
print(f"Average Relevance: {metrics['relevance_score']:.2f}")
print(f"Average Groundedness: {metrics['groundedness_score']:.2f}")

Deployment

from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    Model
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="your-sub",
    resource_group_name="your-rg",
    workspace_name="your-workspace"
)

# Create endpoint
endpoint = ManagedOnlineEndpoint(
    name="rag-endpoint",
    auth_mode="key"
)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Register flow as model
model = Model(
    path="./my-rag-flow",
    name="rag-model",
    type="promptflow"
)
registered_model = ml_client.models.create_or_update(model)

# Create deployment
deployment = ManagedOnlineDeployment(
    name="rag-deployment",
    endpoint_name="rag-endpoint",
    model=registered_model,
    instance_type="Standard_DS3_v2",
    instance_count=1
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Set traffic
endpoint.traffic = {"rag-deployment": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
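
With traffic routed, one way to retrieve the endpoint key is via get_keys; this is the value the next section passes as api_key:

# Get the endpoint key for client authentication
keys = ml_client.online_endpoints.get_keys(name="rag-endpoint")
api_key = keys.primary_key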

Calling Deployed Flow

import urllib.request
import json

def call_rag_endpoint(question: str, api_key: str, chat_history: list = None):
    url = "https://rag-endpoint.eastus.inference.ml.azure.com/score"

    data = {
        "question": question,
        "chat_history": chat_history or []
    }

    body = json.dumps(data).encode("utf-8")

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }

    req = urllib.request.Request(url, body, headers)

    with urllib.request.urlopen(req) as response:
        result = json.loads(response.read())

    return result

# Usage (api_key retrieved via get_keys above)
response = call_rag_endpoint("What is Azure AI Studio?", api_key)
print(response["answer"])

Prompt Flow provides the foundation for building production-grade LLM applications. Tomorrow, I will cover Azure Machine Learning updates from Build 2023.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.