Cost Optimization Strategies for GenAI with Python and AWS

FORMAT: Talk
LEVEL: Intermediate
LANGUAGE: Spanish

Can Generative AI scale without a project's success putting the organization's financial stability at risk? This session shows how to rethink the deployment of large language models (LLMs) through architecture designed for operational efficiency. Rather than accepting high token consumption as an inevitable cost, we'll explore a sustainable cost model for building intelligent, scalable applications without sacrificing profitability. Following a technical path centered on Python and AWS services, we'll analyze key strategies such as model arbitrage, where application logic dynamically decides which model to call based on task complexity. We'll look at how low-footprint vector databases and semantic caching reuse prior results to achieve significant infrastructure savings, and how asynchronous flows and batch processing make better use of available resources. This talk is a practical guide for architects and developers who want to lead the transition from costly prototypes to production systems that are both technically and economically viable.
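As a rough illustration of the model arbitrage idea described in the abstract, the sketch below routes each prompt to a cheaper or more capable model based on a crude complexity score, and skips the call entirely when a normalized copy of the prompt has been seen before. It is not taken from the talk. The tier names, prices, keyword list, threshold, and the `call_model` callback are all hypothetical, and the exact-match cache is a simplified stand-in for semantic caching, which would compare embeddings instead of normalized strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_1k_tokens: float


# Hypothetical tiers and prices, for illustration only (not real AWS pricing).
SMALL = ModelTier("small-model", 0.0005)
LARGE = ModelTier("large-model", 0.0150)

# Assumed keywords that hint at multi-step reasoning; a production system
# might use a trained classifier instead of this heuristic.
COMPLEX_HINTS = ("analyze", "compare", "explain why", "step by step")


def estimate_complexity(prompt: str) -> float:
    """Return a crude score in [0, 1] from prompt length and hint keywords."""
    length_score = min(len(prompt.split()) / 200, 1.0)
    lowered = prompt.lower()
    hint_score = 1.0 if any(hint in lowered for hint in COMPLEX_HINTS) else 0.0
    return max(length_score, hint_score)


def choose_model(prompt: str, threshold: float = 0.5) -> ModelTier:
    """Pick the large tier only when the score reaches the threshold."""
    return LARGE if estimate_complexity(prompt) >= threshold else SMALL


def answer(
    prompt: str,
    cache: Dict[str, str],
    call_model: Callable[[ModelTier, str], str],
) -> str:
    """Serve from the cache when possible; otherwise route and store the result."""
    # Normalize case and whitespace so trivially different prompts share a key.
    key = " ".join(prompt.lower().split())
    if key in cache:
        return cache[key]
    result = call_model(choose_model(prompt), prompt)
    cache[key] = result
    return result
```

In a real deployment, `call_model` would wrap whatever inference client the application uses. Keeping it as a parameter lets the routing and caching logic be tested without network access.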

Speaker

Juan Diego David Melo Alarcón

Application Architect @ IBM

I'm a solutions architect focused on systemic efficiency and building software designed for real-world challenges. My professional focus sits at the intersection of generative AI and cloud-native architectures, where the real challenge isn't just getting a model to respond, but doing so resiliently, scalably, and in a financially sensible way. Throughout my career I've led application modernization in hybrid cloud environments, facing the complexity of integrating cutting-edge services with critical infrastructure.

