
Implementing Hybrid Search in Azure AI Search

Hybrid search combines traditional keyword search with vector similarity search, delivering better retrieval results than either approach alone. Azure AI Search runs both in a single query request, so the implementation is short.

Keyword search excels at exact matches and rare terms. Vector search captures semantic similarity. Combining them leverages both strengths while mitigating individual weaknesses.
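To make the "combining" step concrete, here is a minimal sketch of Reciprocal Rank Fusion (RRF), the method Azure AI Search documents for merging keyword and vector rankings. The function name, the document ids, and the constant k=60 are illustrative assumptions; the service performs this merge for you, so this only shows the idea.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the two retrievers
keyword_hits = ["doc-a", "doc-b", "doc-c"]
vector_hits = ["doc-b", "doc-d", "doc-a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Documents that appear in both lists (doc-a and doc-b here) outrank documents found by only one retriever, which is how hybrid search offsets the blind spots of each method.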

Setting Up the Index

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex, SearchField, SearchFieldDataType,
    VectorSearch, HnswAlgorithmConfiguration, VectorSearchProfile,
    SemanticConfiguration, SemanticSearch, SemanticPrioritizedFields,
    SemanticField
)

# Placeholders (assumptions): replace with your own service endpoint and admin key
endpoint = "https://<your-service>.search.windows.net"
credential = AzureKeyCredential("<your-admin-key>")

# Define index with vector and keyword fields
index = SearchIndex(
    name="hybrid-knowledge-base",
    fields=[
        SearchField(name="id", type=SearchFieldDataType.String, key=True),
        SearchField(name="title", type=SearchFieldDataType.String, searchable=True),
        SearchField(name="content", type=SearchFieldDataType.String, searchable=True),
        SearchField(
            name="content_vector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            # Must match the output size of the embedding model
            vector_search_dimensions=1536,
            vector_search_profile_name="vector-profile"
        )
    ],
    # The profile links the vector field to the HNSW algorithm configuration
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="hnsw-config")],
        profiles=[VectorSearchProfile(name="vector-profile", algorithm_configuration_name="hnsw-config")]
    ),
    # The semantic configuration tells the reranker which fields to read
    semantic_search=SemanticSearch(
        configurations=[SemanticConfiguration(
            name="semantic-config",
            prioritized_fields=SemanticPrioritizedFields(
                content_fields=[SemanticField(field_name="content")]
            )
        )]
    )
)

# Create the index on the service, or update it if it already exists
index_client = SearchIndexClient(endpoint, credential)
index_client.create_or_update_index(index)
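
Documents uploaded to this index need the same four fields, and the embedding length has to match vector_search_dimensions. The sketch below shows one way to shape and check a document before upload; build_document and EMBEDDING_DIMENSIONS are hypothetical names, and the all-zero vector is a stand-in for a real embedding.

```python
EMBEDDING_DIMENSIONS = 1536  # must match vector_search_dimensions in the index


def build_document(doc_id: str, title: str, content: str, embedding: list[float]) -> dict:
    """Return a document dict whose keys match the index field names."""
    if len(embedding) != EMBEDDING_DIMENSIONS:
        raise ValueError(
            f"expected {EMBEDDING_DIMENSIONS} dimensions, got {len(embedding)}"
        )
    return {
        "id": doc_id,
        "title": title,
        "content": content,
        "content_vector": embedding,
    }


# Stand-in embedding; in practice this comes from your embedding model
doc = build_document(
    "1",
    "Hybrid search",
    "Keyword plus vector retrieval.",
    [0.0] * EMBEDDING_DIMENSIONS,
)
print(sorted(doc.keys()))
```

A list of such dicts can then be passed to SearchClient.upload_documents. Checking the dimension locally catches a mismatched embedding model before the request is sent.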

Executing Hybrid Queries

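from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# Placeholders (assumptions): replace with your own service endpoint and query key
endpoint = "https://<your-service>.search.windows.net"
index_name = "hybrid-knowledge-base"
credential = AzureKeyCredential("<your-query-key>")

# Create the client once and reuse it across queries
search_client = SearchClient(endpoint, index_name, credential)

def hybrid_search(query: str, query_embedding: list[float], top_k: int = 5):
    # query_embedding must come from the same model used to embed the documents
    vector_query = VectorizedQuery(
        vector=query_embedding,
        k_nearest_neighbors=top_k,
        fields="content_vector"
    )

    results = search_client.search(
        search_text=query,  # Keyword search
        vector_queries=[vector_query],  # Vector search
        query_type="semantic",  # Rerank the fused results with the semantic ranker
        semantic_configuration_name="semantic-config",
        top=top_k
    )

    return list(results)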
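
Each result behaves like a dictionary holding the index fields plus score metadata. The sketch below pulls out the two scores that matter for hybrid queries: "@search.score" (the fused score) and "@search.reranker_score" (set when semantic ranking is applied). summarize_results is a hypothetical helper, and the sample rows are made-up values used so the snippet runs without a service.

```python
from typing import Any, Iterable, Mapping


def summarize_results(results: Iterable[Mapping[str, Any]]) -> list[dict]:
    """Keep the id, title, and both relevance scores from each result."""
    rows = []
    for result in results:
        rows.append({
            "id": result["id"],
            "title": result.get("title"),
            "score": result.get("@search.score"),
            # None when the semantic ranker did not score this result
            "reranker_score": result.get("@search.reranker_score"),
        })
    return rows


# Made-up rows standing in for the output of hybrid_search(...)
sample = [
    {"id": "1", "title": "Hybrid search", "@search.score": 0.0325, "@search.reranker_score": 2.9},
    {"id": "2", "title": "Vector search", "@search.score": 0.0161},
]

summary = summarize_results(sample)
for row in summary:
    print(row)
```

Logging both scores side by side makes it easier to see when the semantic ranker reorders what the fused keyword and vector ranking returned.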

Hybrid search with semantic ranking typically improves retrieval relevance by 15-30% compared to vector-only approaches, making it the recommended pattern for production RAG systems.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.