Cosmos DB Integrated Cache: Reducing RU Costs at Scale
Azure Cosmos DB offers an integrated cache that runs inside a dedicated gateway between your application and the database. For read-heavy workloads it can substantially reduce RU consumption and latency, because reads served from the cache are charged 0 RU. You also avoid running a separate caching layer.
The Problem It Solves
Before the integrated cache, reducing Cosmos DB read costs typically meant you had to:
- Deploy Redis or another cache
- Implement the cache-aside pattern in application code
- Handle cache invalidation on every write path
- Operate and pay for another service
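This works, but it adds moving parts and code paths that can drift out of sync. The integrated cache moves caching into the Cosmos DB service itself. There is no separate cache to operate and no cache-aside code to maintain, although each request still decides how much staleness it can tolerate.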
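For comparison, here is roughly the application code that the cache-aside pattern above requires. This is a minimal sketch: a dict stands in for Redis, caller-supplied functions stand in for the database, and `CacheAside` with its methods is a hypothetical name. Note the explicit invalidation in `update`, which every write path has to remember.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Hypothetical cache-aside wrapper; a plain dict stands in for Redis."""

    def __init__(self, load: Callable[[str], Any], store: Callable[[str, Any], None],
                 ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._load = load            # reads from the database on a miss
        self._store = store          # writes to the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Any]] = {}
        self.db_reads = 0

    def get(self, key: str) -> Any:
        entry = self._cache.get(key)
        if entry is not None and self._clock() - entry[0] < self._ttl:
            return entry[1]                          # cache hit
        value = self._load(key)                      # miss: go to the database
        self.db_reads += 1
        self._cache[key] = (self._clock(), value)    # populate for later reads
        return value

    def update(self, key: str, value: Any) -> None:
        self._store(key, value)
        self._cache.pop(key, None)                   # manual invalidation
```

Forgetting the `pop` in any write path, or writing to the database from a second service that does not share this cache, is exactly the class of bug the integrated cache is meant to remove.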
How It Works
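The integrated cache is an in-memory cache built into the dedicated gateway. It:
- Caches point reads (item cache) and query results (query cache), evicting the least recently used entries
- Serves reads from cache when the cached data is within the staleness the request allows
- Is write-through: writes routed through a gateway node update that node's item cache
- Works with the existing SDKs in gateway connection mode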
Application -> Dedicated Gateway (Cache) -> Cosmos DB
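Application code changes are small: use gateway connection mode, point the client at the dedicated gateway endpoint, and read with session or eventual consistency. Requests sent in direct mode, or at stronger consistency levels, bypass the cache.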
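To make the read-through/write-through behaviour concrete, here is a toy, single-node model of the item cache. It is not the service's implementation: `ItemCacheNode`, a capacity measured in items, and a fixed RU charge per backend read are all assumptions for illustration. It does capture the behaviours described above: LRU eviction, a per-request maximum staleness (a staleness of 0 always goes to the backend), 0 RU for a hit, and writes through the node refreshing that node's copy.

```python
from collections import OrderedDict
from typing import Any, Dict, Tuple


class ItemCacheNode:
    """Toy model of one gateway node's item cache: LRU, read-through, write-through."""

    def __init__(self, backend: Dict[str, Any], capacity: int, read_ru: float = 1.0):
        self.backend = backend      # stands in for the Cosmos DB backend
        self.capacity = capacity    # items kept before least-recently-used eviction
        self.read_ru = read_ru      # assumed RU charge of a backend point read
        self._cache: OrderedDict = OrderedDict()  # item_id -> (cached_at, value)

    def _put(self, item_id: str, value: Any, now: float) -> None:
        self._cache[item_id] = (now, value)
        self._cache.move_to_end(item_id)
        while len(self._cache) > self.capacity:
            self._cache.popitem(last=False)   # evict the least recently used item

    def read(self, item_id: str, max_staleness_s: float, now: float) -> Tuple[Any, float]:
        """Return (value, RU charge); a cache hit is charged 0 RU."""
        entry = self._cache.get(item_id)
        if entry is not None and now - entry[0] < max_staleness_s:
            self._cache.move_to_end(item_id)
            return entry[1], 0.0
        # Miss, or cached copy older than this request allows: read through
        value = self.backend[item_id]
        self._put(item_id, value, now)
        return value, self.read_ru

    def write(self, item_id: str, value: Any, now: float) -> None:
        self.backend[item_id] = value       # the write is persisted...
        self._put(item_id, value, now)      # ...and written through to this node's cache
```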
Enabling Integrated Cache
First, provision a dedicated gateway for your Cosmos DB account. The gateway is a separate service resource on the account, created with `az cosmosdb service create`:
```bash
# Create an account (or skip this step and use an existing account)
az cosmosdb create \
  --name mycosmosaccount \
  --resource-group myresourcegroup \
  --default-consistency-level Session

# Add a dedicated gateway with one D4s node
az cosmosdb service create \
  --account-name mycosmosaccount \
  --resource-group myresourcegroup \
  --name SqlDedicatedGateway \
  --kind SqlDedicatedGateway \
  --count 1 \
  --size Cosmos.D4s
```
Connecting Through the Gateway
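Point the client at the dedicated gateway endpoint (note the `sqlx` host name) and use gateway connection mode. Direct mode talks to the backend replicas and therefore skips the gateway and its cache: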
```csharp
// C# SDK example (Microsoft.Azure.Cosmos v3)
using System;
using System.Collections.Generic;
using Microsoft.Azure.Cosmos;

var options = new CosmosClientOptions
{
    // The integrated cache is only used in gateway mode
    ConnectionMode = ConnectionMode.Gateway,
    ApplicationPreferredRegions = new List<string> { "East US" }
};

// Dedicated gateway endpoint: https://<account>.sqlx.cosmos.azure.com/
var client = new CosmosClient(
    "https://myaccount.sqlx.cosmos.azure.com/",
    "<your-key>",
    options
);
var container = client.GetContainer("mydb", "products");

// Configure item request options for caching.
// Reads must use session or eventual consistency to be eligible for the cache.
var requestOptions = new ItemRequestOptions
{
    DedicatedGatewayRequestOptions = new DedicatedGatewayRequestOptions
    {
        MaxIntegratedCacheStaleness = TimeSpan.FromMinutes(5)
    }
};

// This read can be served from cache
var response = await container.ReadItemAsync<Product>(
    id: "product-123",
    partitionKey: new PartitionKey("electronics"),
    requestOptions: requestOptions
);

// Cache hits are charged 0 RU, so the request charge is a simple hit indicator
Console.WriteLine($"Cache hit: {response.RequestCharge == 0}");

// Minimal model type assumed by these examples
public record Product(string Id, string Category, string Name);
```
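The dedicated gateway has its own endpoint, `https://<account>.sqlx.cosmos.azure.com/`, alongside the standard `https://<account>.documents.azure.com:443/` endpoint. If your configuration already holds the standard connection string, a small helper can derive the gateway one. This is a sketch that assumes the usual `AccountEndpoint=...;AccountKey=...;` format and public Azure cloud host names; `to_dedicated_gateway` is a hypothetical helper, and the dedicated gateway connection string shown in the Azure portal remains the authoritative value.

```python
from urllib.parse import urlparse


def to_dedicated_gateway(connection_string: str) -> str:
    """Rewrite a standard Cosmos DB connection string to target the dedicated gateway.

    Assumes the public-cloud endpoint format https://<account>.documents.azure.com:443/.
    """
    # split("=", 1) keeps base64 padding ("==") in the account key intact
    parts = dict(
        segment.split("=", 1)
        for segment in connection_string.strip().strip(";").split(";")
        if segment
    )
    host = urlparse(parts["AccountEndpoint"]).hostname or ""
    suffix = ".documents.azure.com"
    if not host.endswith(suffix):
        raise ValueError(f"unexpected endpoint host: {host}")
    account = host[: -len(suffix)]
    parts["AccountEndpoint"] = f"https://{account}.sqlx.cosmos.azure.com/"
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"
```

Both the .NET `CosmosClient` and the Python `CosmosClient.from_connection_string` accept a connection string, so keeping the gateway variant in configuration makes switching back and forth a config change rather than a code change.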
Python SDK Usage
```python
from azure.cosmos import CosmosClient

# Synchronous client pointed at the dedicated gateway endpoint
client = CosmosClient(
    url="https://myaccount.sqlx.cosmos.azure.com/",
    credential="<your-key>"
)
database = client.get_database_client("mydb")
container = database.get_container_client("products")

# Point read that may be served from the integrated cache
response = container.read_item(
    item="product-123",
    partition_key="electronics",
    max_integrated_cache_staleness_in_ms=300000  # 5 minutes
)

# Single-partition query that may be served from the query cache
query = "SELECT * FROM c WHERE c.category = @category"
parameters = [{"name": "@category", "value": "electronics"}]
items = container.query_items(
    query=query,
    parameters=parameters,
    partition_key="electronics",
    max_integrated_cache_staleness_in_ms=300000
)
for item in items:
    print(item)
```
Cache Staleness Configuration
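MaxIntegratedCacheStaleness is set per request and caps how old cached data may be when it is returned; an entry older than that is refreshed from the database. The default is 5 minutes, and each operation can choose its own value: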
```csharp
// Different staleness for different operations
public static class CacheConfiguration
{
    // Product catalog - can be stale for longer
    public static readonly TimeSpan ProductCacheDuration = TimeSpan.FromMinutes(15);

    // Inventory levels - need fresher data
    public static readonly TimeSpan InventoryCacheDuration = TimeSpan.FromSeconds(30);

    // Pricing - must be current; zero staleness means cached data is never returned
    public static readonly TimeSpan PricingCacheDuration = TimeSpan.Zero;
}

// Apply per request
var options = new ItemRequestOptions
{
    DedicatedGatewayRequestOptions = new DedicatedGatewayRequestOptions
    {
        MaxIntegratedCacheStaleness = CacheConfiguration.ProductCacheDuration
    }
};
```
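The same per-category policy carries over to the Python SDK, which takes the staleness in milliseconds. A minimal sketch: the category names and the `STALENESS` table mirror `CacheConfiguration` above and are hypothetical, and `cache_kwargs` simply builds the `max_integrated_cache_staleness_in_ms` keyword argument used in the Python example earlier.

```python
from datetime import timedelta
from typing import Dict

# Hypothetical per-category staleness budgets, mirroring the C# CacheConfiguration.
STALENESS: Dict[str, timedelta] = {
    "product": timedelta(minutes=15),
    "inventory": timedelta(seconds=30),
    "pricing": timedelta(0),   # 0 ms: cached data is never returned
}


def cache_kwargs(category: str) -> Dict[str, int]:
    """Keyword arguments for read_item/query_items in the azure-cosmos Python SDK."""
    ms = int(STALENESS[category] / timedelta(milliseconds=1))
    return {"max_integrated_cache_staleness_in_ms": ms}
```

Usage would look like `container.read_item(item="product-123", partition_key="electronics", **cache_kwargs("product"))`. Keeping the budgets in one table makes it easy to review which data is allowed to be stale, and by how much.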
Query Caching
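Query results are cached too, in a separate query cache. Unlike the item cache, the query cache is not updated by writes: a cached result set is reused until it is older than the request's MaxIntegratedCacheStaleness.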
```csharp
// This query result can be cached
var queryDefinition = new QueryDefinition(
    "SELECT * FROM c WHERE c.category = @category ORDER BY c.name")
    .WithParameter("@category", "electronics");

var queryOptions = new QueryRequestOptions
{
    PartitionKey = new PartitionKey("electronics"),
    DedicatedGatewayRequestOptions = new DedicatedGatewayRequestOptions
    {
        MaxIntegratedCacheStaleness = TimeSpan.FromMinutes(5)
    }
};

using var resultSet = container.GetItemQueryIterator<Product>(
    queryDefinition,
    requestOptions: queryOptions
);

while (resultSet.HasMoreResults)
{
    var page = await resultSet.ReadNextAsync();
    // The first execution runs on the database; repeats of the same query within
    // the staleness window are served from cache and charged 0 RU
    Console.WriteLine($"Page charge: {page.RequestCharge} RU");
}
```
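A toy model makes the difference from the item cache visible. In this sketch, which is an illustration and not the service's implementation, results are keyed by query text plus parameters, a write to the underlying data does not touch the cached result, and the result is refreshed only once it exceeds the request's staleness allowance. `QueryCacheModel` and its fixed RU figure are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Params = Tuple[Tuple[str, Any], ...]


class QueryCacheModel:
    """Toy query cache: results keyed by (query text, parameters); writes don't invalidate."""

    def __init__(self, run_query: Callable[[str, Params], List[Any]], query_ru: float = 3.0):
        self._run_query = run_query   # stands in for executing the query on the backend
        self._query_ru = query_ru     # assumed RU charge when the backend runs the query
        self._cache: Dict[Tuple[str, Params], Tuple[float, List[Any]]] = {}

    def query(self, text: str, params: Dict[str, Any], max_staleness_s: float,
              now: float) -> Tuple[List[Any], float]:
        key = (text, tuple(sorted(params.items())))
        entry = self._cache.get(key)
        if entry is not None and now - entry[0] < max_staleness_s:
            return entry[1], 0.0          # served from cache at 0 RU, possibly stale
        results = self._run_query(text, key[1])
        self._cache[key] = (now, results)
        return results, self._query_ru
```

In the test below, an item added after the first execution only shows up once the cached result is older than the 300-second allowance.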
Sizing the Dedicated Gateway
Choose gateway size based on your workload:
| Size | vCPUs | Memory | Cache size (approx.) | Price/hour per node (USD, approx., varies by region) |
|---|---|---|---|---|
| Cosmos.D4s | 4 | 16 GB | ~8 GB | ~$0.35 |
| Cosmos.D8s | 8 | 32 GB | ~16 GB | ~$0.70 |
| Cosmos.D16s | 16 | 64 GB | ~32 GB | ~$1.40 |
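Memory, cache size, and price are per node, and every node keeps its own independent cache. Adding nodes therefore adds request-handling capacity, not a larger shared cache. To estimate how much cache a workload needs: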
```python
# Calculate approximate cache requirements
average_item_size_kb = 2 # KB per document
unique_items_accessed = 100000 # Items accessed in cache window
query_result_overhead = 1.5 # Queries store more than raw items
cache_needed_gb = (average_item_size_kb * unique_items_accessed * query_result_overhead) / (1024 * 1024)
print(f"Estimated cache needed: {cache_needed_gb:.2f} GB")
```
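The estimate can then be matched against the cache sizes in the table. A small sketch, assuming the approximate per-node cache sizes listed above and an arbitrary 30% headroom; both are assumptions to tune, the size names are illustrative, and `pick_gateway_size` is a hypothetical helper. Because each node keeps its own cache, the comparison is against one node's capacity rather than the sum across nodes.

```python
from typing import Optional

# Approximate per-node cache capacity in GB, taken from the sizing table.
CACHE_GB = {"Cosmos.D4s": 8, "Cosmos.D8s": 16, "Cosmos.D16s": 32}


def pick_gateway_size(cache_needed_gb: float, headroom: float = 0.3) -> Optional[str]:
    """Smallest size whose approximate cache fits the estimate plus headroom, else None."""
    target = cache_needed_gb * (1 + headroom)
    for size, capacity_gb in sorted(CACHE_GB.items(), key=lambda kv: kv[1]):
        if capacity_gb >= target:
            return size
    return None  # working set too large for one node's cache
```

A `None` result suggests the working set will not fit in any single node's cache, so expect a lower hit rate, or narrow what you route through the cache.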
Cache Invalidation
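The item cache is write-through rather than invalidated: a create, upsert, replace, or delete routed through a dedicated gateway node updates that node's item cache. Other nodes, and the query cache, keep serving their existing entries until those entries exceed the request's staleness allowance.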
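```csharp
// The write goes through the dedicated gateway and updates that node's item cache
await container.UpsertItemAsync(updatedProduct, new PartitionKey(updatedProduct.Category));

// A read routed to the same node returns the updated item from cache; a read routed
// to another node may return its older cached copy until that copy exceeds the staleness window
var afterWrite = await container.ReadItemAsync<Product>(
    updatedProduct.Id,
    new PartitionKey(updatedProduct.Category)
);
```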
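The same applies across regions: each region's gateway nodes maintain their own caches, so MaxIntegratedCacheStaleness, not cross-region invalidation, bounds how stale a cached read can be.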
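Because each dedicated gateway node has its own cache, a write only refreshes the copy on the node that handled it. The sketch below models two nodes with plain dictionaries. This is a deliberate simplification with no eviction, and `GatewayNode` is hypothetical. It shows how a read routed to the other node can return the old version until that node's entry exceeds the request's staleness allowance.

```python
from typing import Any, Dict, Tuple


class GatewayNode:
    """Hypothetical gateway node with its own independent item cache."""

    def __init__(self, backend: Dict[str, Any]):
        self.backend = backend                         # shared database
        self.cache: Dict[str, Tuple[float, Any]] = {}  # item_id -> (cached_at, value)

    def read(self, item_id: str, max_staleness_s: float, now: float) -> Any:
        entry = self.cache.get(item_id)
        if entry is not None and now - entry[0] < max_staleness_s:
            return entry[1]                            # possibly stale, by design
        value = self.backend[item_id]
        self.cache[item_id] = (now, value)
        return value

    def write(self, item_id: str, value: Any, now: float) -> None:
        self.backend[item_id] = value
        self.cache[item_id] = (now, value)             # only this node is refreshed
```

If a reader must see its own writes immediately, read with a staleness of zero for that request, or accept that the staleness window is the freshness guarantee.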
Monitoring Cache Performance
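Track cache effectiveness per request. Reads served from the integrated cache are charged 0 RU, so the request charge doubles as a hit/miss signal: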
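```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public class CacheMetrics
{
    private int _cacheHits;
    private int _cacheMisses;

    public int CacheHits => Volatile.Read(ref _cacheHits);
    public int CacheMisses => Volatile.Read(ref _cacheMisses);

    public double HitRatio
    {
        get
        {
            int hits = CacheHits, misses = CacheMisses;
            return hits + misses > 0 ? (double)hits / (hits + misses) : 0;
        }
    }

    // Interlocked keeps the counters correct when many reads complete concurrently
    public void Record(double requestCharge)
    {
        if (requestCharge == 0) Interlocked.Increment(ref _cacheHits);
        else Interlocked.Increment(ref _cacheMisses);
    }
}

public static class CachedReads
{
    public static async Task<T> ReadWithMetrics<T>(
        Container container, string id, string partitionKey, CacheMetrics metrics)
    {
        var response = await container.ReadItemAsync<T>(
            id,
            new PartitionKey(partitionKey),
            new ItemRequestOptions
            {
                DedicatedGatewayRequestOptions = new DedicatedGatewayRequestOptions
                {
                    MaxIntegratedCacheStaleness = TimeSpan.FromMinutes(5)
                }
            });

        // Cache hits are charged 0 RU
        metrics.Record(response.RequestCharge);
        return response.Resource;
    }
}
```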
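From Python, a hit/miss counter can key off the request charge in the same way, since cache hits are charged 0 RU. A minimal, thread-safe sketch; `CacheMetrics` here is a hypothetical class, and it reads the standard `x-ms-request-charge` response header.

```python
import threading
from typing import Mapping


class CacheMetrics:
    """Thread-safe hit/miss counter driven by Cosmos DB request charges."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.hits = 0
        self.misses = 0

    def record(self, request_charge: float) -> None:
        # A charge of exactly 0 RU means the read was served by the integrated cache
        with self._lock:
            if request_charge == 0:
                self.hits += 1
            else:
                self.misses += 1

    def record_headers(self, headers: Mapping[str, str]) -> None:
        """Record from a response-header mapping that includes x-ms-request-charge."""
        self.record(float(headers["x-ms-request-charge"]))

    @property
    def hit_ratio(self) -> float:
        with self._lock:
            total = self.hits + self.misses
            return self.hits / total if total else 0.0
```

After `container.read_item(...)`, the headers can be passed via `metrics.record_headers(container.client_connection.last_response_headers)`. That attribute is common in azure-cosmos samples but is not a formally documented API, so treat it as an assumption for your SDK version.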
When to Use Integrated Cache
Good fit:
- Read-heavy workloads (>80% reads)
- Hot data patterns (same items read repeatedly)
- Point reads and simple queries
- Cost-sensitive applications
Not ideal for:
- Write-heavy workloads
- Unique reads (every read is different)
- Real-time requirements (can’t tolerate staleness)
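- Cross-partition queries
- Reads that need strong or bounded staleness consistency (only session and eventual consistency reads use the cache)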
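The gap between "hot data patterns" and "unique reads" is easy to see in a quick simulation. The sketch below replays a skewed (Zipf-like) access pattern and a uniform one against the same small LRU cache. The item count, cache size, and skew are arbitrary assumptions chosen for illustration, not measurements of the service.

```python
import random
from collections import OrderedDict
from typing import List


def lru_hit_rate(accesses: List[int], capacity: int) -> float:
    """Fraction of accesses served by an LRU cache holding `capacity` items."""
    cache: OrderedDict = OrderedDict()
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used item
    return hits / len(accesses)


def simulate(n_items: int = 10_000, n_reads: int = 50_000, capacity: int = 500,
             seed: int = 42) -> dict:
    rng = random.Random(seed)
    items = list(range(n_items))
    zipf_weights = [1 / (rank + 1) for rank in items]          # a few hot items dominate
    skewed = rng.choices(items, weights=zipf_weights, k=n_reads)
    uniform = [rng.randrange(n_items) for _ in range(n_reads)]  # no hot set at all
    return {
        "hot_data": lru_hit_rate(skewed, capacity),
        "uniform": lru_hit_rate(uniform, capacity),
    }


if __name__ == "__main__":
    print(simulate())
```

With the cache holding 5% of the items, the uniform pattern hits only about as often as that fraction, while the skewed pattern hits far more often. Replaying a sample of your own read log through `lru_hit_rate` is a cheap way to estimate hit rate before provisioning anything.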
Cost Analysis
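Compare total costs with and without the cache. Provisioned throughput is billed per provisioned RU/s per hour, so daily RU consumption first has to be converted to a rate. The prices below are illustrative assumptions; check current pricing for your region and billing model.
Without cache:
- 1M reads/day at 5 RU each = 5M RU/day, or about 58 RU/s on average
- At an assumed $0.10 per 100 RU/s per hour: 0.58 × $0.10 × 24 ≈ $1.39/day
With cache (90% hit rate; cache hits are charged 0 RU):
- 100K reads reach the database = 500K RU/day, or about 5.8 RU/s ≈ $0.14/day
- Gateway cost: one D4s node at ~$0.35/hour = ~$8.40/day
- Total: ~$0.14 + $8.40 ≈ $8.54/day
At this volume the gateway costs about $7/day more than it saves. Under the same assumptions, RU savings match the gateway cost at roughly 6.7M reads/day (about 78 reads/s sustained), and each read beyond that is net savings. Provisioned RU/s must cover peak rather than average load, so spiky workloads save more than this average-based estimate suggests, and savings only materialize if you actually lower provisioned throughput. The break-even ultimately depends on your read patterns, RU cost per read, and hit rate.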
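To check the cost arithmetic for your own workload, it helps to compute it directly. This sketch uses illustrative defaults (5 RU per uncached read, a 90% hit rate, $0.10 per 100 RU/s per hour, ~$0.35 per gateway node-hour) and, optimistically, treats average RU/s as the provisioned RU/s. Substitute current prices and your peak throughput before relying on the result. `daily_costs` and `break_even_reads_per_day` are hypothetical helpers.

```python
SECONDS_PER_DAY = 86_400
HOURS_PER_DAY = 24


def ru_cost_per_day(ru_per_day: float, usd_per_100_rus_hour: float) -> float:
    """Cost of provisioning the *average* RU/s implied by a daily RU total."""
    avg_ru_per_s = ru_per_day / SECONDS_PER_DAY
    return avg_ru_per_s / 100 * usd_per_100_rus_hour * HOURS_PER_DAY


def daily_costs(reads_per_day: float, ru_per_read: float = 5.0, hit_rate: float = 0.9,
                usd_per_100_rus_hour: float = 0.10, gateway_usd_per_hour: float = 0.35,
                gateway_nodes: int = 1) -> dict:
    without = ru_cost_per_day(reads_per_day * ru_per_read, usd_per_100_rus_hour)
    # Cache hits are charged 0 RU, so only misses consume throughput
    miss_ru = reads_per_day * (1 - hit_rate) * ru_per_read
    gateway = gateway_usd_per_hour * HOURS_PER_DAY * gateway_nodes
    with_cache = ru_cost_per_day(miss_ru, usd_per_100_rus_hour) + gateway
    return {"without_cache": without, "with_cache": with_cache}


def break_even_reads_per_day(ru_per_read: float = 5.0, hit_rate: float = 0.9,
                             usd_per_100_rus_hour: float = 0.10,
                             gateway_usd_per_hour: float = 0.35,
                             gateway_nodes: int = 1) -> float:
    """Daily reads at which RU savings equal the gateway cost."""
    saved_per_read = ru_cost_per_day(ru_per_read * hit_rate, usd_per_100_rus_hour)
    gateway = gateway_usd_per_hour * HOURS_PER_DAY * gateway_nodes
    return gateway / saved_per_read
```

With the defaults, `daily_costs(1_000_000)` gives about $1.39/day without the cache and $8.54/day with it, and `break_even_reads_per_day()` is about 6.72M reads/day. Adding gateway nodes raises the break-even proportionally.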
Migration Strategy
Adopting the integrated cache is low-risk:
- Provision a dedicated gateway on your account
- Switch non-production clients to the dedicated gateway connection string and gateway mode
- Test and measure cache hit rates
- Adjust staleness settings based on requirements
- Roll out to production
- Monitor and optimize
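If issues arise, point clients back at the standard endpoint (direct or gateway mode); no data changes are required. Deprovision the dedicated gateway once you no longer need it, since it is billed for as long as it is provisioned.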