Watching LLMs: observability best practices for AI systems
DevOps Stage
—
45 minutes
DevOps
Open Source
LLM
Real case studies
AI Systems
When you deploy an LLM-powered system, monitoring may not be the first thing on your mind. But what happens when your RAG pipeline isn't retrieving the right context, your prompts are causing hallucinations, or your token costs triple overnight?
This talk covers the practical observability stack needed for LLM applications in production. By diving into code, we'll explore how to instrument token usage and cost tracking across model calls, implement evaluation pipelines that catch quality degradation before your users do, and set up guardrails that prevent both prompt injection attacks and unexpected model behavior. You'll learn to trace requests through vector database queries, embedding generation, and completion calls to debug the full retrieval and generation pipeline.

Using open source tools and vendor-agnostic approaches, we'll examine real production patterns: distinguishing between model latency and retrieval latency, tracking semantic drift in your vector embeddings, correlating user feedback with model parameters, and building dashboards that help you answer questions like "why did this query return irrelevant results?" and "which prompts are consuming 80% of our budget?"

Whether you're running a chatbot, a document Q&A system, or any application with LLMs and vector databases, you'll leave with concrete strategies for understanding what your AI system is doing, how much it's costing you, and, most importantly, whether it's working correctly.

Takeaways:
• How to implement observability across your entire LLM application
• How to set up cost monitoring that prevents surprises
• How to build effective guardrails and evaluation loops for production safety
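To give a flavor of the instrumentation discussed above, here is a minimal, vendor-agnostic sketch of per-call token and cost tracking. The `PRICES` table, model name, and `CostTracker` class are illustrative assumptions, not part of any provider's API; real per-token prices vary by provider and model, so you would load them from configuration.

```python
from dataclasses import dataclass

# Hypothetical prices per 1K tokens; real values differ per provider/model.
PRICES = {"example-model": {"prompt": 0.00015, "completion": 0.0006}}

@dataclass
class CostTracker:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    cost_usd: float = 0.0

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> None:
        # Accumulate token counts and estimated spend for one model call.
        price = PRICES[model]
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.cost_usd += (prompt_tokens / 1000) * price["prompt"]
        self.cost_usd += (completion_tokens / 1000) * price["completion"]

tracker = CostTracker()
# In a real pipeline these counts would come from the provider's usage metadata.
tracker.record("example-model", prompt_tokens=1200, completion_tokens=300)
print(f"total_tokens={tracker.prompt_tokens + tracker.completion_tokens}",
      f"cost_usd={tracker.cost_usd:.6f}")
```

In production you would attach these numbers to the active trace span rather than printing them, so cost shows up alongside latency in your dashboards.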