Artificial Intelligence, Machine Learning

Opening the Black Box: Mechanistic Interpretability of LLMs

FORMAT: Talk LEVEL: Beginner LANGUAGE: English

As agents are deployed in high-stakes contexts (finance, manufacturing, healthcare), understanding how they make decisions, and not just what they decide, becomes fundamental to safety and trust. For example, when an agent receives the instruction "Search for our company's third-quarter results" and chooses to search internal documents instead of the public web, what internal process drives that choice? Prompt engineering, behavioral testing, and chain-of-thought analysis describe correlations or narratives; none reveals the actual mechanism. Understanding how an agent reaches a conclusion is a critical part of developing AI responsibly, particularly where reliability and transparency are concerned. Model interpretability is one way developers can build trust and consistency in their systems and support the safe deployment of AI agents.

Speaker

David Cardozo

Senior AI Engineer @ Dataiku

Machine Learning Scientist and Cloud Infrastructure Architect. With a career spanning from information security to DevOps, I'm a Google Developer Expert in ML and Docker Captain. Passionate about multiplying matrices at high speed, I currently work as an AI Engineer at Dataiku.

Want to know more?

Join PyCon Colombia newsletter and get a complete overview of our events, speakers and community participation.