Not All Caches Are Equal: Claude, OpenAI, and Gemini
We focus quite a bit on prompt caching at @LittlebirdAI to keep latency and cost low. But it's very tricky to get right, especially when you deal with multiple providers. There are quite a few really g...
Apr 22, 2026 · 2 min read

Curated resources: AI Product x How to eval?
A list of resources for building a solid eval pipeline for your AI product.
Mar 9, 2025 · 2 min read · 525

Order of fields in Structured output can hurt LLMs output
Evals on how the order of fields in prompts with structured output (JSON) affects LLM response quality.
Jan 5, 2025 · 4 min read · 3.3K

Experiments with gpt-4o vision and architecture diagrams
Eval-based experimentation to figure out how well it works with architecture diagrams.
Oct 22, 2024 · 4 min read · 865

Notes on evals in LLM-based applications
A big lesson I've learned while building LLM-based applications: "Unless you have a closed feedback loop, nothing you do will be good enough." Allow me to explain. When building an LLM app, I constantly found myself chasing 100s of different optimizat...
Jul 22, 2024 · 2 min read · 192

Looping defaultdict In Templates: A long-standing "Bug" in Django
I am new to Django and recently spent a good chunk of time trying to debug what I thought was quite trivial: looping over a defaultdict in templates. This is a pretty common pattern when using Django, using a dict to pass and render data via templates, bu...
May 12, 2024 · 3 min read · 290