Azure API Management for AI APIs: Rate Limiting and Cost Control

AI APIs need careful management to control costs and allocate capacity fairly. Azure API Management provides a policy framework for rate limiting, quota management, and usage tracking on AI endpoints.

Configuring AI-Specific Policies

Create policies that account for token-based billing:

<policies>
    <inbound>
        <base />

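        <!-- Estimate prompt tokens from the request body (about 4 characters per token) -->
        <set-variable name="requestTokens" value="@{
            // preserveContent keeps the body available for forwarding to the backend
            var body = context.Request.Body?.As<JObject>(preserveContent: true);
            var messages = body?["messages"] as JArray;
            int tokens = 0;
            if (messages != null) {
                foreach (var msg in messages) {
                    tokens += (msg["content"]?.ToString() ?? string.Empty).Length / 4; // Rough estimate
                }
            }
            return tokens;
        }" />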

        <!-- Token-based rate limiting: each request adds its estimated tokens to a budget of 1000 per 60 seconds -->
        <rate-limit-by-key
            calls="1000"
            renewal-period="60"
            counter-key="@(context.Subscription.Id)"
            increment-condition="@(true)"
            increment-count="@((int)context.Variables["requestTokens"])" />

        <!-- Daily quota per subscription: 100,000 calls per 86,400 seconds -->
        <quota-by-key
            calls="100000"
            renewal-period="86400"
            counter-key="@(context.Subscription.Id)" />

        <!-- Cost tracking header (the variable holds an int, so convert rather than cast) -->
        <set-header name="X-Token-Estimate" exists-action="override">
            <value>@(context.Variables.GetValueOrDefault<int>("requestTokens").ToString())</value>
        </set-header>
    </inbound>

    <outbound>
        <base />

        <!-- Track actual usage from response -->
        <set-variable name="responseTokens" value="@{
            // preserveContent keeps the body intact for the caller
            var body = context.Response.Body?.As<JObject>(preserveContent: true);
            return body?["usage"]?["total_tokens"]?.Value<int>() ?? 0;
        }" />

        <!-- Log to Event Hub for billing -->
        <log-to-eventhub logger-id="ai-usage-logger">@{
            return new JObject(
                new JProperty("subscriptionId", context.Subscription.Id),
                new JProperty("operation", context.Operation.Id),
                new JProperty("tokensUsed", context.Variables["responseTokens"]),
                new JProperty("timestamp", DateTime.UtcNow)
            ).ToString();
        }</log-to-eventhub>
    </outbound>
</policies>
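
The events logged to Event Hub still have to be turned into billing figures downstream. The following is a minimal Python sketch of that step, assuming each event arrives as a JSON string with the `subscriptionId` and `tokensUsed` fields emitted by the policy above. The function names and the price constant are hypothetical, and reading from Event Hub itself is left out.

```python
import json
from collections import defaultdict

# Hypothetical price per 1,000 tokens; substitute the rate for your model.
PRICE_PER_1K_TOKENS = 0.002


def summarize_usage(events):
    """Sum tokensUsed per subscriptionId from an iterable of JSON strings."""
    totals = defaultdict(int)
    for raw in events:
        record = json.loads(raw)
        # A missing or null tokensUsed counts as zero.
        totals[record["subscriptionId"]] += int(record.get("tokensUsed") or 0)
    return dict(totals)


def estimate_cost(total_tokens, price_per_1k=PRICE_PER_1K_TOKENS):
    """Convert a token total into an estimated charge."""
    return total_tokens / 1000 * price_per_1k
```

Passing the per-subscription totals from `summarize_usage` through `estimate_cost` gives a rough charge per customer that can be reconciled against the provider's invoice.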

Implementing Tiered Access

Assign each subscription tier its own rate limit by branching on the product name:

<choose>
    <!-- context.Product can be null (for example, with an all-APIs subscription), hence the ?. -->
    <when condition="@(context.Product?.Name == "Enterprise")">
        <rate-limit-by-key calls="10000" renewal-period="60"
            counter-key="@(context.Subscription.Id)" />
    </when>
    <when condition="@(context.Product?.Name == "Professional")">
        <rate-limit-by-key calls="1000" renewal-period="60"
            counter-key="@(context.Subscription.Id)" />
    </when>
    <otherwise>
        <rate-limit-by-key calls="100" renewal-period="60"
            counter-key="@(context.Subscription.Id)" />
    </otherwise>
</choose>
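
Callers on the lower tiers will reach these limits sooner, so client code should expect throttling. When a rate limit is exceeded, API Management responds with `429 Too Many Requests` and, by default, a `Retry-After` header in seconds. Below is a minimal Python sketch of a client-side retry loop under that assumption. `call_with_retry` and the `send` callable are hypothetical names, and `send` is assumed to return a `(status, headers, body)` tuple from whatever HTTP library you use.

```python
import time


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    Waits for the number of seconds given in Retry-After between attempts,
    defaulting to 1 second when the header is absent.
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            sleep(float(headers.get("Retry-After", 1)))
    return status, body
```

Injecting `sleep` keeps the loop easy to test and makes it simple to swap in jittered or capped delays later.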

Monitoring and Alerts

Configure Azure Monitor alerts when subscriptions approach quota limits. Provide self-service usage dashboards so customers can track their consumption and plan accordingly.
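
As a rough illustration of the "approaching quota" check behind such an alert, here is a minimal Python sketch. It assumes you already have today's call counts per subscription (for example, derived from the logged events). The function name and the 80% threshold are hypothetical. The default quota mirrors the `quota-by-key` value used earlier.

```python
DAILY_QUOTA = 100_000  # calls per day, matching the quota-by-key policy


def subscriptions_near_quota(usage, quota=DAILY_QUOTA, threshold=0.8):
    """Return the subscription IDs whose usage is at or above threshold * quota."""
    return sorted(sub for sub, used in usage.items() if used / quota >= threshold)
```

The same ratio, `used / quota`, is a natural figure to show on a customer-facing usage dashboard.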

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.