The best public example of AI Evals I've ever seen
Think evals are just for engineers? See how a product manager turned her prototype into a production-ready app with evals.
Hi everyone,
I'm often asked for the best public example of an AI evals workflow for a real, production application. I finally have an answer.
If you've ever thought that evals are too "fancy," too complex, or something only deeply technical engineers can do, I want you to watch this talk.
Teresa Torres, a legendary product discovery coach and author, was a student in our first AI Evals cohort. She's not a daily coder, but she took the core principles from the course and applied them to build an AI interview coach from scratch.
What she created is, in my opinion, the single best public demonstration of how to do evals right.
Watch the full presentation: From Noob to 5 Automated Evals in 4 Weeks (as a PM)
In this incredibly hands-on lesson, Teresa shows exactly how she:
Started with error analysis FIRST to find real, user-impacting issues instead of getting lost in generic metrics.
Used Jupyter notebooks to systematically investigate failures and analyze results (it's a fantastic commercial for notebooks).
Built her own custom annotation tools and widgets directly inside her notebooks to speed up her workflow.
Wrote both LLM-as-a-judge and simple code-based assertions to test for the specific errors she found.
Iterated relentlessly through this feedback loop, measurably squashing bugs and improving her product with each cycle.
Kept things simple the whole time, proving you don't need a massive, over-engineered stack to get started.
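To make the first and fourth points concrete, here is a minimal sketch of what "error analysis first, then a simple code-based assertion" can look like. To be clear, this is not Teresa's code: the failure-mode labels, the word limit, the function names, and the sample data are all hypothetical, chosen only to illustrate the shape of the workflow.

```python
from collections import Counter

# Hypothetical labels from a manual review of traces (error analysis).
# None means the reviewer found no problem with that trace.
annotations = [
    {"trace_id": 1, "failure_mode": "leading_question"},
    {"trace_id": 2, "failure_mode": None},
    {"trace_id": 3, "failure_mode": "too_long"},
    {"trace_id": 4, "failure_mode": "leading_question"},
]


def count_failure_modes(rows):
    """Tally how often each failure mode was observed, ignoring clean traces."""
    return Counter(r["failure_mode"] for r in rows if r["failure_mode"])


# Assumed threshold for illustration; a real limit would come from your own data.
MAX_WORDS = 120


def passes_length_check(response, max_words=MAX_WORDS):
    """Code-based assertion: the response must not exceed max_words words."""
    return len(response.split()) <= max_words


def pass_rate(responses, check):
    """Fraction of responses that pass a given check (0.0 for an empty list)."""
    if not responses:
        return 0.0
    return sum(check(r) for r in responses) / len(responses)


if __name__ == "__main__":
    # Most frequent failure modes first, so you know what to fix first.
    print(count_failure_modes(annotations).most_common())
    sample = ["Tell me about the last time you did that.", "word " * 200]
    print(pass_rate(sample, passes_length_check))
```

The idea is that the tally tells you which failures are worth writing a check for, and the pass rate gives you a number to track as you iterate. Failure modes that can't be caught with a few lines of code are where an LLM-as-a-judge comes in.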
This isn't a theoretical talk. It's a real-world case study of someone building a production app, learning Python and new tools along the way, and using a practical evals process to drive massive improvements. It’s an empowering story that shows this is within reach for anyone willing to dive in.
Hope you enjoy it,
Hamel
P.S. Teresa's journey started in our AI Evals course. If her story inspires you to build your own systematic feedback loops, I wanted to let you know about our upcoming cohort.
You'll get lifetime access to all materials and learn the same frameworks that helped Teresa build her application. The early-bird discount for our next cohort ends this Friday. If you're interested, you can learn more and register here:
➡️ AI Evals Course (> $1,000 Discount)
Hope to see some of you there!