Cosmos DB Vector Search: Implementing Semantic Search at Global Scale

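Cosmos DB’s native vector search lets you build globally distributed semantic search applications. Multi-region replication is built in, and the service’s latency SLA targets single-digit-millisecond point reads and writes, which makes it a good fit for real-time AI applications that need global reach. Vector queries do more work than a point read, so their latency depends on the index type, the vector dimensions, and the amount of data scanned.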

Configuring Vector Indexing

Set up your container with vector indexing policies that balance performance and cost:

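import os

from azure.cosmos import CosmosClient, PartitionKey
from azure.cosmos.documents import IndexingMode

# Account endpoint and key come from the environment (variable names are illustrative)
endpoint = os.environ["COSMOS_ENDPOINT"]
credential = os.environ["COSMOS_KEY"]

client = CosmosClient(endpoint, credential)
database = client.get_database_client("semantic-search")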

# Create container with vector indexing
container = database.create_container_if_not_exists(
    id="documents",
    partition_key=PartitionKey(path="/category"),
    indexing_policy={
        "indexingMode": IndexingMode.Consistent,
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": "/embedding/*"}],
        # A vector index entry takes a path and an index type; data type,
        # dimensions and distance function belong in the vector embedding policy
        "vectorIndexes": [
            {
                "path": "/embedding",
                "type": "quantizedFlat"
            }
        ]
    },
    vector_embedding_policy={
        "vectorEmbeddings": [
            {
                "path": "/embedding",
                "dataType": "float32",
                "dimensions": 1536,
                "distanceFunction": "cosine"
            }
        ]
    },
    offer_throughput=10000
)
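
Items written to this container need a vector at /embedding whose length matches the 1536 dimensions declared in the policy. A minimal sketch of a pre-write check follows; `build_document` and `EXPECTED_DIMENSIONS` are hypothetical names introduced here, and the `title` and `content` fields are assumed from the query shown later in the post.

```python
from typing import Any

# Assumption: must equal "dimensions" in the container's vector embedding policy
EXPECTED_DIMENSIONS = 1536


def build_document(doc_id: str, category: str, title: str, content: str,
                   embedding: list[float]) -> dict[str, Any]:
    """Return an item shaped for the container, or raise if the vector length is wrong."""
    if len(embedding) != EXPECTED_DIMENSIONS:
        raise ValueError(
            f"expected {EXPECTED_DIMENSIONS} dimensions, got {len(embedding)}"
        )
    return {
        "id": doc_id,
        "category": category,  # partition key path is /category
        "title": title,
        "content": content,
        "embedding": [float(x) for x in embedding],  # stored at /embedding
    }
```

The returned dictionary can then be passed to `container.upsert_item(...)`. Checking the length on the client side surfaces a mismatched embedding model early, rather than at write or query time.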

Executing Vector Queries

Combine vector similarity with SQL filtering for powerful hybrid queries:

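def semantic_search(query_embedding: list[float], category: str, limit: int = 10) -> list[dict]:
    """Return the `limit` documents in `category` closest to `query_embedding`."""
    query = """
    SELECT TOP @limit c.id, c.title, c.content,
           VectorDistance(c.embedding, @embedding) AS similarity
    FROM c
    WHERE c.category = @category
    ORDER BY VectorDistance(c.embedding, @embedding)
    """

    # `container` is the synchronous client created above, so the result is a
    # regular iterable; `async for` would require the client from azure.cosmos.aio
    results = container.query_items(
        query=query,
        parameters=[
            {"name": "@limit", "value": limit},
            {"name": "@embedding", "value": query_embedding},
            {"name": "@category", "value": category}
        ],
        # /category is the partition key, so scope the query to that partition
        partition_key=category
    )

    return list(results)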
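
To make the `similarity` column concrete, here is a small local sketch of cosine similarity and a brute-force top-k ranking. It is not how the service evaluates the query; it is a reference you could run over a small sample to sanity-check returned scores, assuming the cosine distance function configured above and that higher scores mean closer vectors. The function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: 1.0 same direction, -1.0 opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def brute_force_top_k(query: list[float], docs: list[dict], k: int) -> list[dict]:
    """Rank docs (each with an "embedding" list) by similarity to query, best first."""
    scored = [
        {**doc, "similarity": cosine_similarity(query, doc["embedding"])}
        for doc in docs
    ]
    scored.sort(key=lambda d: d["similarity"], reverse=True)
    return scored[:k]
```

Because a quantized index trades some accuracy for speed, the service's ranking on near-ties may differ slightly from this exact computation.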

Global Distribution Strategy

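Configure your Cosmos account with write regions near your users and read replicas in the other regions you serve. Reads, including vector searches, go to the regions the client is configured to prefer, so give each deployment a preferred-locations list that starts with its own region. Measure vector query latency per region rather than assuming the point-read figure, since it varies with index type, vector dimensions, and result size.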
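
One way to apply this per deployment is to order the account's regions so the local one comes first. The helper below is a hypothetical sketch, and the region names in the usage note are placeholders rather than a recommendation.

```python
def preferred_locations_for(local_region: str, account_regions: list[str]) -> list[str]:
    """Return account_regions with local_region moved to the front.

    If the local region is not part of the account, the original order is kept.
    """
    if local_region not in account_regions:
        return list(account_regions)
    return [local_region] + [r for r in account_regions if r != local_region]
```

For example, `preferred_locations_for("Australia East", ["West US", "Australia East", "North Europe"])` yields a list that starts with "Australia East". The result can be passed to the client through the `preferred_locations` keyword argument, as in `CosmosClient(endpoint, credential, preferred_locations=regions)`.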

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.