
Cost Optimization Strategies for Azure OpenAI Workloads

AI workloads can quickly become expensive without proper cost management. Azure OpenAI pricing is based on tokens, so optimizing token usage directly reduces costs. Here are proven strategies for keeping AI costs under control.

Implement Prompt Caching

Caching avoids redundant API calls for repeated queries. The example below uses an exact-match hash of the prompt as the cache key; true semantic caching, which matches similar (not just identical) prompts via embeddings, is a further refinement on the same idea.

import hashlib
import json

from azure.cosmos import exceptions
from openai import AzureOpenAI

class CachedAIClient:
    def __init__(self, openai_client: AzureOpenAI, cosmos_container):
        self.ai = openai_client
        self.cache = cosmos_container  # container must have TTL enabled for per-item ttl to apply
        self.cache_ttl_hours = 24

    def _hash_prompt(self, messages: list, model: str) -> str:
        # Include the model in the key so responses from different
        # models never collide, and serialize deterministically so
        # dict key order doesn't change the hash
        content = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(content.encode()).hexdigest()

    def chat_completion(self, messages: list, model: str = "gpt-4o") -> dict:
        cache_key = self._hash_prompt(messages, model)

        # Check cache first; a miss raises CosmosResourceNotFoundError
        try:
            cached = self.cache.read_item(item=cache_key, partition_key=cache_key)
            return cached["response"]
        except exceptions.CosmosResourceNotFoundError:
            pass

        # Call the API and cache the serialized result
        response = self.ai.chat.completions.create(
            model=model,
            messages=messages
        )
        response_dict = response.model_dump()

        self.cache.upsert_item({
            "id": cache_key,
            "response": response_dict,
            "ttl": self.cache_ttl_hours * 3600  # per-item TTL in seconds
        })

        # Return a dict on both hits and misses for a consistent type
        return response_dict

Model Selection Strategy

Use the right model for each task:

  • GPT-4o-mini: Simple classification, extraction, formatting
  • GPT-4o: Complex reasoning, creative tasks, nuanced responses
  • o1-mini: Mathematical reasoning, code analysis
  • o1-preview: Complex multi-step problems
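The tiering above can be encoded as a simple routing table, so callers never hard-code a model name. The task categories and mapping below are an illustrative sketch, not an official taxonomy:

```python
# Illustrative mapping of task categories to model tiers
MODEL_TIERS = {
    "classification": "gpt-4o-mini",
    "extraction": "gpt-4o-mini",
    "formatting": "gpt-4o-mini",
    "reasoning": "gpt-4o",
    "creative": "gpt-4o",
    "math": "o1-mini",
    "code_analysis": "o1-mini",
    "multi_step": "o1-preview",
}

def pick_model(task: str, default: str = "gpt-4o") -> str:
    # Fall back to the general-purpose tier for unknown task types
    return MODEL_TIERS.get(task, default)
```

Routing cheap tasks to GPT-4o-mini first, and escalating only when needed, is where most of the savings come from.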

Additional Cost Tactics

  • Reduce prompt length by removing unnecessary context.
  • Use the Batch API for non-time-sensitive workloads at a 50% cost reduction.
  • Implement token budgets per user or application.
  • Monitor costs daily with Azure Cost Management alerts.
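A per-user token budget can be sketched as a small in-memory tracker. The daily limit and storage are assumptions for illustration; a production version would persist counts and reset them on a schedule:

```python
from collections import defaultdict

class TokenBudget:
    """Tracks tokens consumed per user against a daily cap."""

    def __init__(self, daily_limit: int = 100_000):
        self.daily_limit = daily_limit
        self.used = defaultdict(int)  # user_id -> tokens spent today

    def try_spend(self, user_id: str, tokens: int) -> bool:
        # Reject the request if it would push the user over budget
        if self.used[user_id] + tokens > self.daily_limit:
            return False
        self.used[user_id] += tokens
        return True

budget = TokenBudget(daily_limit=1000)
assert budget.try_spend("alice", 800)      # within budget
assert not budget.try_spend("alice", 300)  # would exceed the cap
```

Checking the budget before each API call (and recording actual usage from the response afterwards) turns runaway spend into a predictable ceiling.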

The combination of caching, model tiering, and monitoring typically reduces Azure OpenAI costs by 40-60% without sacrificing quality.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.