
Cosmos DB Integrated Cache: Reducing RU Costs at Scale

Cosmos DB has introduced an integrated cache that sits between your application and the database. This can dramatically reduce RU consumption and latency for read-heavy workloads, without managing a separate caching layer.

The Problem It Solves

Before integrated cache, the pattern for reducing Cosmos DB costs was:

  1. Deploy Redis or another cache
  2. Implement cache-aside pattern in application code
  3. Handle cache invalidation
  4. Manage another service

This works but adds complexity. The integrated cache handles all of this at the database level.
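
The bookkeeping that cache-aside requires can be sketched in a few lines (a minimal in-memory illustration, not production code; `fetch_from_db` stands in for a Cosmos DB point read):

```python
import time

# Minimal cache-aside sketch: the bookkeeping the integrated cache removes.
class CacheAside:
    def __init__(self, ttl_seconds, fetch_from_db):
        self.ttl = ttl_seconds
        self.fetch = fetch_from_db
        self.store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]                      # cache hit
        value = self.fetch(key)                  # cache miss: read the database
        self.store[key] = (value, time.time() + self.ttl)
        return value

    def invalidate(self, key):
        # Must be called on every write path, or readers see stale data
        self.store.pop(key, None)

db = {"product-123": {"id": "product-123", "price": 9.99}}
cache = CacheAside(ttl_seconds=300, fetch_from_db=lambda k: db[k])
item = cache.get("product-123")  # miss: reads the backing store
item = cache.get("product-123")  # hit: served from memory
```

Every one of these responsibilities (TTLs, eviction, invalidation on writes) moves into the gateway.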

How It Works

The integrated cache is a dedicated gateway that:

  • Caches frequently accessed items
  • Serves reads from cache when possible
  • Automatically invalidates on writes
  • Integrates with existing SDK

Application -> Dedicated Gateway (Cache) -> Cosmos DB

Your application logic stays the same; you only point the client at the gateway endpoint and switch to gateway connection mode.

Enabling Integrated Cache

First, create a dedicated gateway for your Cosmos account:

az cosmosdb create \
    --name mycosmosaccount \
    --resource-group myresourcegroup \
    --default-consistency-level Session

# Provision the dedicated gateway on the account (new or existing)
az cosmosdb service create \
    --resource-group myresourcegroup \
    --account-name mycosmosaccount \
    --name "SqlDedicatedGateway" \
    --kind "SqlDedicatedGateway" \
    --count 1 \
    --size "Cosmos.D4s"

Connecting Through the Gateway

Update your connection to use gateway mode:

// C# SDK example
using Microsoft.Azure.Cosmos;

var options = new CosmosClientOptions
{
    ConnectionMode = ConnectionMode.Gateway,
    // Optionally pin the regions the client prefers
    ApplicationPreferredRegions = new List<string> { "East US" }
};

// Use the dedicated gateway connection string (note the sqlx subdomain)
var client = new CosmosClient(
    "https://myaccount.sqlx.cosmos.azure.com/",
    "<your-key>",
    options
);

// Configure item request options for caching
var requestOptions = new ItemRequestOptions
{
    DedicatedGatewayRequestOptions = new DedicatedGatewayRequestOptions
    {
        MaxIntegratedCacheStaleness = TimeSpan.FromMinutes(5)
    }
};

// This read can be served from cache
var response = await container.ReadItemAsync<Product>(
    id: "product-123",
    partitionKey: new PartitionKey("electronics"),
    requestOptions: requestOptions
);

// The diagnostics string includes cache-hit information
Console.WriteLine(response.Diagnostics.ToString());

Python SDK Usage

from azure.cosmos import CosmosClient

# Synchronous client, pointed at the dedicated gateway (sqlx subdomain)
client = CosmosClient(
    url="https://myaccount.sqlx.cosmos.azure.com/",
    credential="<your-key>"
)

database = client.get_database_client("mydb")
container = database.get_container_client("products")

# Read with cache settings
response = container.read_item(
    item="product-123",
    partition_key="electronics",
    max_integrated_cache_staleness_in_ms=300000  # 5 minutes
)

# Query with caching
query = "SELECT * FROM c WHERE c.category = @category"
parameters = [{"name": "@category", "value": "electronics"}]

items = container.query_items(
    query=query,
    parameters=parameters,
    partition_key="electronics",
    max_integrated_cache_staleness_in_ms=300000
)

for item in items:
    print(item)

Cache Staleness Configuration

The MaxIntegratedCacheStaleness setting is the maximum age of cached data a request will accept; entries older than this are refetched from the database and re-cached:

// Different staleness for different operations
public class CacheConfiguration
{
    // Product catalog - can be stale for longer
    public static readonly TimeSpan ProductCacheDuration = TimeSpan.FromMinutes(15);

    // Inventory levels - need fresher data
    public static readonly TimeSpan InventoryCacheDuration = TimeSpan.FromSeconds(30);

    // Pricing - real-time required
    public static readonly TimeSpan PricingCacheDuration = TimeSpan.Zero;
}

// Apply per request
var options = new ItemRequestOptions
{
    DedicatedGatewayRequestOptions = new DedicatedGatewayRequestOptions
    {
        MaxIntegratedCacheStaleness = CacheConfiguration.ProductCacheDuration
    }
};
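
The same per-data-type policy translates directly to the Python SDK's staleness parameter. A small sketch, with illustrative category names that are not part of any SDK:

```python
from datetime import timedelta

# Per-category staleness policy, mirroring the C# CacheConfiguration above.
STALENESS_POLICY = {
    "catalog":   timedelta(minutes=15),  # can tolerate stale reads
    "inventory": timedelta(seconds=30),  # needs fresher data
    "pricing":   timedelta(0),           # effectively bypasses the cache
}

def staleness_ms(category):
    """max_integrated_cache_staleness_in_ms value for a data category."""
    return int(STALENESS_POLICY[category].total_seconds() * 1000)
```

Pass the result straight into `read_item` or `query_items` as `max_integrated_cache_staleness_in_ms`.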

Query Caching

Queries are also cached:

// This query result will be cached
var queryDefinition = new QueryDefinition(
    "SELECT * FROM c WHERE c.category = @category ORDER BY c.name")
    .WithParameter("@category", "electronics");

var queryOptions = new QueryRequestOptions
{
    PartitionKey = new PartitionKey("electronics"),
    DedicatedGatewayRequestOptions = new DedicatedGatewayRequestOptions
    {
        MaxIntegratedCacheStaleness = TimeSpan.FromMinutes(5)
    }
};

using var resultSet = container.GetItemQueryIterator<Product>(
    queryDefinition,
    requestOptions: queryOptions
);

while (resultSet.HasMoreResults)
{
    var response = await resultSet.ReadNextAsync();
    // First request hits DB, subsequent requests (within staleness window) hit cache
}
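
Conceptually, a cached query result is specific to the query text, its parameter values, and the partition key, so any change produces a fresh cache entry. The gateway's real keying is internal; this is a simplified illustration of the idea:

```python
import hashlib
import json

# Illustrative only: any change to query text, parameter values, or partition
# key yields a different key, hence a separate cached result.
def query_cache_key(query, parameters, partition_key):
    payload = json.dumps(
        {"q": query,
         "p": sorted(parameters, key=lambda p: p["name"]),
         "pk": partition_key},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

k_electronics = query_cache_key(
    "SELECT * FROM c WHERE c.category = @category",
    [{"name": "@category", "value": "electronics"}],
    "electronics",
)
k_books = query_cache_key(
    "SELECT * FROM c WHERE c.category = @category",
    [{"name": "@category", "value": "books"}],
    "books",
)
```

The practical consequence: parameterized queries with many distinct parameter values each occupy their own slice of the cache.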

Sizing the Dedicated Gateway

Choose gateway size based on your workload:

Size   vCPUs   Memory   Cache Size   Price/hour
D4s    4       16 GB    ~8 GB        ~$0.35
D8s    8       32 GB    ~16 GB       ~$0.70
D16s   16      64 GB    ~32 GB       ~$1.40

Estimate cache size needs:

# Calculate approximate cache requirements
average_item_size_kb = 2  # KB per document
unique_items_accessed = 100000  # Items accessed in cache window
query_result_overhead = 1.5  # Queries store more than raw items

cache_needed_gb = (average_item_size_kb * unique_items_accessed * query_result_overhead) / (1024 * 1024)
print(f"Estimated cache needed: {cache_needed_gb:.2f} GB")

Cache Invalidation

Writes that go through the dedicated gateway automatically invalidate the cached copy of that item:

// Write operation automatically invalidates cache
await container.UpsertItemAsync(updatedProduct, new PartitionKey(updatedProduct.Category));

// Next read will get fresh data from database
var fresh = await container.ReadItemAsync<Product>(
    updatedProduct.Id,
    new PartitionKey(updatedProduct.Category)
);

Writes that bypass the gateway (direct-mode clients, or writes in other regions) are not picked up immediately; readers see them only once the staleness window expires, so choose MaxIntegratedCacheStaleness to match the staleness your readers can tolerate.
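
How MaxIntegratedCacheStaleness interacts with writes can be sketched as a toy model (pure Python, no SDK, and a deliberate simplification of the real gateway behavior):

```python
# Toy model of the staleness contract: a read is served from cache only while
# the cached copy is younger than the request's max staleness. Writes through
# the same gateway evict the entry; writes that bypass it are only seen once
# the window expires.
class GatewayCacheModel:
    def __init__(self):
        self.cache = {}  # key -> (value, cached_at)

    def read(self, key, now, max_staleness, db):
        entry = self.cache.get(key)
        if entry is not None and now - entry[1] <= max_staleness:
            return entry[0], "cache"
        self.cache[key] = (db[key], now)
        return db[key], "db"

    def write_through_gateway(self, key, value, db):
        db[key] = value
        self.cache.pop(key, None)  # local invalidation

db = {"p1": "v1"}
gw = GatewayCacheModel()
value, source = gw.read("p1", now=0, max_staleness=300, db=db)    # miss -> db
value, source = gw.read("p1", now=100, max_staleness=300, db=db)  # within window -> cache
db["p1"] = "v2"                                                   # write bypassing the gateway
value, source = gw.read("p1", now=200, max_staleness=300, db=db)  # still the stale "v1"
value, source = gw.read("p1", now=400, max_staleness=300, db=db)  # window expired -> "v2"
```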

Monitoring Cache Performance

Track cache effectiveness with diagnostics:

public class CacheMetrics
{
    public int CacheHits { get; set; }
    public int CacheMisses { get; set; }
    public double HitRatio => CacheHits + CacheMisses > 0
        ? (double)CacheHits / (CacheHits + CacheMisses)
        : 0;
}

public async Task<T> ReadWithMetrics<T>(Container container, string id, string partitionKey, CacheMetrics metrics)
{
    var response = await container.ReadItemAsync<T>(
        id,
        new PartitionKey(partitionKey),
        new ItemRequestOptions
        {
            DedicatedGatewayRequestOptions = new DedicatedGatewayRequestOptions
            {
                MaxIntegratedCacheStaleness = TimeSpan.FromMinutes(5)
            }
        }
    );

    var diagnostics = response.Diagnostics.ToString();
    if (diagnostics.Contains("\"CacheHit\":true"))
    {
        metrics.CacheHits++;
    }
    else
    {
        metrics.CacheMisses++;
    }

    return response.Resource;
}
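
The same bookkeeping in Python, fed by whatever cache-hit signal your SDK surfaces. The `"CacheHit"` substring check mirrors the C# example above; the exact diagnostics format is SDK- and version-specific, so verify it against your client before relying on it:

```python
# Hit/miss bookkeeping driven by a diagnostics string.
class CacheMetrics:
    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, diagnostics):
        # Assumes the diagnostics payload contains a "CacheHit" flag
        if '"CacheHit":true' in diagnostics.replace(" ", ""):
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

metrics = CacheMetrics()
metrics.record('{"CacheHit": true}')
metrics.record('{"CacheHit": false}')
```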

When to Use Integrated Cache

Good fit:

  • Read-heavy workloads (>80% reads)
  • Hot data patterns (same items read repeatedly)
  • Point reads and simple queries
  • Cost-sensitive applications

Not ideal for:

  • Write-heavy workloads
  • Unique reads (every read is different)
  • Real-time requirements (can’t tolerate staleness)
  • Cross-partition queries
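
These criteria can be turned into a quick back-of-the-envelope check. The thresholds below are illustrative, not official guidance:

```python
# Rough fitness check based on the criteria above.
def integrated_cache_fit(read_fraction, repeat_read_fraction, staleness_tolerance_s):
    reasons = []
    if read_fraction < 0.8:
        reasons.append("workload is not read-heavy")
    if repeat_read_fraction < 0.5:
        reasons.append("few repeated reads, so the hit rate will be low")
    if staleness_tolerance_s <= 0:
        reasons.append("cannot tolerate any staleness")
    return len(reasons) == 0, reasons

ok, _ = integrated_cache_fit(0.9, 0.8, 300)    # good fit
bad, why = integrated_cache_fit(0.5, 0.1, 0)   # poor fit, three reasons
```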

Cost Analysis

Compare total costs with and without the cache. The figures below are illustrative; actual RU pricing depends on region and provisioning model:

Without cache:
- 1M reads/day at 5 RU each = 5M RU/day
- Provisioned throughput to cover this: ~$35/day

With cache (90% hit rate):
- 100K reads hit the DB = 500K RU/day = ~$3.50/day
- Gateway cost: D4s at ~$0.35/hour = ~$8.40/day
- Total: ~$11.90/day

Savings: ~$23/day = ~$700/month

The break-even depends on your read patterns and hit rate.
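
The arithmetic above generalizes to a one-line break-even formula (dollar figures are illustrative):

```python
# Break-even sketch: cache hits scale the residual database read cost by the
# miss rate, while the dedicated gateway adds a flat hourly cost.
def daily_savings(db_read_cost_per_day, hit_rate, gateway_cost_per_hour):
    residual_db_cost = db_read_cost_per_day * (1 - hit_rate)
    gateway_cost = gateway_cost_per_hour * 24
    return db_read_cost_per_day - (residual_db_cost + gateway_cost)

savings = daily_savings(db_read_cost_per_day=35.0, hit_rate=0.9,
                        gateway_cost_per_hour=0.35)
```

Note that at a low hit rate the gateway's flat cost can exceed the RU savings, so the function can go negative.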

Migration Strategy

Adopting integrated cache is low-risk:

  1. Enable dedicated gateway on your account
  2. Update connection strings in non-production
  3. Test and measure cache hit rates
  4. Adjust staleness settings based on requirements
  5. Roll out to production
  6. Monitor and optimize

If issues arise, simply revert to direct connection mode; no data changes are required.


Michael John Peña


Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.