1 min read
Optimizing AI Cost Per Query
Reduced our AI cost per query by 70%. Here’s how.
The Baseline
Started at $0.08 per query. Too high for our user volume.
Optimization 1: Model Selection
Switched simple queries from GPT-4o to GPT-4o-mini.
Savings: 40%
Optimization 2: Aggressive Caching
from functools import lru_cache
import hashlib
@lru_cache(maxsize=1000)
def cached_embedding(text):
return get_embedding(text)
def cache_key(query):
return hashlib.sha256(query.encode()).hexdigest()
Savings: 20%
Optimization 3: Context Compression
Send only relevant context, not entire documents.
Savings: 15%
Optimization 4: Batch Processing
Group similar queries when possible.
Savings: 10%
Final Cost
$0.024 per query. 70% reduction.
The Lesson
Most AI costs come from waste. Cut the waste first.