
LLM Observability: What Actually Matters

Everyone talks about observability. Few teams do it well. Here’s what matters in production.

What to Track

Latency: P50, P95, P99 response times (computed in the sketch after this list)

Cost: Per query, per user, per day

Quality: Response relevance, accuracy

Errors: Rate limits, failures, timeouts
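
For the latency percentiles, here is a minimal sketch, assuming you keep a rolling window of per-call latencies in memory and have numpy available (neither is prescribed by any particular tool):

import numpy as np

# Rolling window of per-call latencies, in seconds (illustrative values)
latencies = [0.8, 1.2, 0.9, 4.7, 1.1, 0.7, 2.3]

# One call computes all three percentiles at once
p50, p95, p99 = np.percentile(latencies, [50, 95, 99])
print(f"P50={p50:.2f}s  P95={p95:.2f}s  P99={p99:.2f}s")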

How to Track It

import time
import functools

import structlog

logger = structlog.get_logger()

def track_llm_call(func):
    # Decorator for async LLM client calls: emits one structured log event per call
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        start = time.time()

        try:
            result = await func(*args, **kwargs)

            logger.info("llm_call",
                model=result.model,
                tokens=result.usage.total_tokens,
                latency=time.time() - start,
                cost=calculate_cost(result.usage)
            )

            return result
        except Exception as e:
            # Log the same latency field on failures so dashboards can line up both paths
            logger.error("llm_call_failed",
                error=str(e),
                latency=time.time() - start
            )
            raise

    return wrapper
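
The decorator assumes a calculate_cost helper. A minimal sketch, assuming an OpenAI-style usage object with prompt_tokens and completion_tokens; the per-1K-token prices below are placeholders, not real rates:

# Placeholder per-1K-token prices in USD; substitute your provider's actual rates
PRICES = {"prompt": 0.0025, "completion": 0.01}

def calculate_cost(usage):
    # Estimate USD cost from prompt and completion token counts
    return (
        usage.prompt_tokens / 1000 * PRICES["prompt"]
        + usage.completion_tokens / 1000 * PRICES["completion"]
    )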

What to Alert On

  • Latency > 5 seconds (P95)
  • Error rate > 5%
  • Daily cost > budget threshold
  • Quality score drop > 10% (a sketch for checking these thresholds follows)
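
A minimal sketch of those checks, assuming a metrics dict with the fields named below; every name here is illustrative rather than tied to any particular monitoring tool:

def check_alerts(metrics, daily_budget=50.0):
    # Return alert messages for any threshold breach.
    # metrics is assumed to look like:
    # {"p95_latency": 3.2, "error_rate": 0.02, "daily_cost": 41.0,
    #  "quality_score": 0.87, "baseline_quality": 0.90}
    alerts = []
    if metrics["p95_latency"] > 5.0:
        alerts.append(f"P95 latency {metrics['p95_latency']:.1f}s exceeds 5s")
    if metrics["error_rate"] > 0.05:
        alerts.append(f"Error rate {metrics['error_rate']:.1%} exceeds 5%")
    if metrics["daily_cost"] > daily_budget:
        alerts.append(f"Daily cost ${metrics['daily_cost']:.2f} is over budget")
    drop = 1 - metrics["quality_score"] / metrics["baseline_quality"]
    if drop > 0.10:
        alerts.append(f"Quality score down {drop:.0%} from baseline")
    return alerts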

Tools That Help

  • Application Insights for Azure
  • Datadog for multi-cloud
  • Custom dashboards with Grafana
  • LangSmith for prompt tracking

The Key Insight

You can’t improve what you don’t measure. Start logging everything from day one.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.