August 28, 2025

Embedding Models: Choosing Between OpenAI, Azure, and Open Source

Embeddings Vector Search OpenAI RAG Machine Learning

Embedding models convert text into dense vectors so that semantically similar texts map to nearby points, enabling semantic search and similarity comparisons. The model you choose affects both the retrieval quality and the cost of your AI applications.
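Similarity between two embeddings is most commonly measured with cosine similarity, the dot product divided by the product of the vector lengths. A minimal pure-Python sketch (cosine_similarity is an illustrative helper, not a library function):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # ≈ 0.7071
```

When both vectors are already unit length, the denominator is 1 and cosine similarity reduces to a plain dot product, which is why many pipelines normalize embeddings once at indexing time.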

Azure OpenAI Embeddings

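Azure OpenAI offers text-embedding-ada-002 and the newer text-embedding-3-small and text-embedding-3-large models, all of which produce high-quality embeddings with minimal setup. Only the text-embedding-3 models accept the dimensions parameter for shorter vectors; ada-002 always returns 1536 dimensions. On Azure, the model argument is the name of your deployment rather than the underlying model name.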

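import os

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-08-01-preview",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
)

def get_embeddings(
    texts: list[str],
    model: str = "text-embedding-3-small",  # On Azure: your deployment name
    dimensions: int | None = None,  # Only supported by text-embedding-3 models
) -> list[list[float]]:
    """Get embeddings for a list of texts."""
    # Omit dimensions unless requested; text-embedding-ada-002 rejects the parameter
    extra = {"dimensions": dimensions} if dimensions is not None else {}
    response = client.embeddings.create(model=model, input=texts, **extra)

    # Sort by index so the output order matches the input order
    return [item.embedding for item in sorted(response.data, key=lambda d: d.index)]

# Usage
texts = ["How to implement RAG", "Building search applications"]
embeddings = get_embeddings(texts)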
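Passing dimensions shortens text-embedding-3 vectors on the server. OpenAI's documentation also describes shortening a full-size text-embedding-3 vector after the fact: keep the first k values, then re-normalize to unit length. A minimal sketch, assuming the input is a full-length text-embedding-3 vector (the technique does not apply to text-embedding-ada-002); shorten_embedding is a hypothetical helper name:

```python
import math


def shorten_embedding(embedding: list[float], k: int) -> list[float]:
    """Keep the first k values of a text-embedding-3 vector and rescale to unit length.

    Assumes the vector comes from a text-embedding-3 model, whose leading
    dimensions carry the most information; do not use with ada-002.
    """
    if not 0 < k <= len(embedding):
        raise ValueError("k must be between 1 and len(embedding)")
    head = embedding[:k]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


print(shorten_embedding([3.0, 4.0, 12.0], 2))  # [0.6, 0.8]
```

Re-normalizing matters because truncation shrinks the vector's length, and any downstream code that treats a dot product as cosine similarity assumes unit-length vectors.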

Open Source Alternatives

The Sentence Transformers library runs embedding models locally, with no API calls; model weights are downloaded from the Hugging Face Hub on first use.

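from sentence_transformers import SentenceTransformer

# Load a high-quality open source model (weights download on first use)
model = SentenceTransformer('BAAI/bge-large-en-v1.5')

def get_local_embeddings(texts: list[str]) -> list[list[float]]:
    """Generate embeddings locally."""
    # normalize_embeddings=True returns unit-length vectors, so dot product equals cosine similarity.
    # For retrieval, the BGE model card recommends prefixing short queries (not passages) with
    # "Represent this sentence for searching relevant passages: "
    embeddings = model.encode(texts, normalize_embeddings=True)
    return embeddings.tolist()

# For multilingual support
multilingual_model = SentenceTransformer('sentence-transformers/paraphrase-multilingual-mpnet-base-v2')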
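Because get_local_embeddings returns unit-length vectors, ranking documents by dot product is equivalent to ranking by cosine similarity. A minimal brute-force top-k search over such vectors, in plain Python; top_k_search is a hypothetical helper, and the toy 2-D vectors stand in for real encoder output:

```python
import heapq


def top_k_search(
    query: list[float], corpus: list[list[float]], k: int = 3
) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the k corpus vectors with the highest dot product.

    Assumes the query and corpus vectors are L2-normalized, so the dot
    product equals cosine similarity.
    """
    scores = (
        (i, sum(q * d for q, d in zip(query, doc))) for i, doc in enumerate(corpus)
    )
    # nlargest keeps only k candidates in a heap instead of sorting every score
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])


# Toy unit vectors standing in for model.encode(..., normalize_embeddings=True) output
corpus = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
print(top_k_search([0.8, 0.6], corpus, k=2))
```

Here document 2 scores highest (about 0.96), followed by document 0 (0.8). A linear scan like this is fine for small corpora; larger collections typically move to a vector index.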

Comparison Table

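Model	Dimensions	API cost	Latency	Quality
text-embedding-3-small	Up to 1536 (adjustable)	Low	Low	Good
text-embedding-3-large	Up to 3072 (adjustable)	Medium	Low	Excellent
BGE-large-en-v1.5	1024	Free (you pay for compute)	Medium (hardware-dependent)	Excellent
E5-large-v2	1024	Free (you pay for compute)	Medium (hardware-dependent)	Very Good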
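The Dimensions column also drives storage cost. Assuming vectors are stored uncompressed as 32-bit floats (4 bytes per value) and ignoring index and metadata overhead, raw size is num_vectors × dimensions × 4 bytes. A quick sketch (raw_vector_storage_gb is a hypothetical helper):

```python
def raw_vector_storage_gb(
    num_vectors: int, dimensions: int, bytes_per_value: int = 4
) -> float:
    """Raw vector size in gigabytes (10**9 bytes), excluding index and metadata overhead."""
    return num_vectors * dimensions * bytes_per_value / 10**9


models = [
    ("text-embedding-3-large", 3072),
    ("text-embedding-3-small", 1536),
    ("bge-large-en-v1.5", 1024),
]
for name, dims in models:
    print(f"{name}: {raw_vector_storage_gb(1_000_000, dims):.3f} GB per million vectors")
```

For one million vectors this works out to about 12.3 GB at 3072 dimensions, 6.1 GB at 1536, and 4.1 GB at 1024, which is one reason the adjustable dimensions of the text-embedding-3 models can matter at scale.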

Selection Criteria

Use Azure OpenAI embeddings when you want simplicity and consistent quality and per-token API costs fit your budget. Choose open source models when you need to control costs at scale, must run offline, or want to fine-tune on domain-specific data.