Back to Blog
1 min read

Optimizing AI Cost Per Query

Reduced our AI cost per query by 70%. Here’s how.

The Baseline

Started at $0.08 per query. Too high for our user volume.

Optimization 1: Model Selection

Switched simple queries from GPT-4o to GPT-4o-mini.

Savings: 40%

Optimization 2: Aggressive Caching

from functools import lru_cache
import hashlib

@lru_cache(maxsize=1000)
def cached_embedding(text):
    return get_embedding(text)

def cache_key(query):
    return hashlib.sha256(query.encode()).hexdigest()

Savings: 20%

Optimization 3: Context Compression

Send only relevant context, not entire documents.

Savings: 15%

Optimization 4: Batch Processing

Group similar queries when possible.

Savings: 10%

Final Cost

$0.024 per query. 70% reduction.

The Lesson

Most AI costs come from waste. Cut the waste first.

Michael John Peña

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.