Cost Optimization Strategies for GenAI with Python and AWS
Is it possible to scale Generative AI without project success compromising the organization's financial stability? This session will address how to transform the deployment of large language models (LLMs) through architecture design oriented toward operational efficiency. Instead of accepting high token consumption as an inevitable cost, we'll explore a sustainable cost model that lets you build intelligent, scalable applications without sacrificing profitability. Through a technical path centered on Python and AWS services, we'll analyze key strategies such as model arbitrage, where application logic dynamically decides which intelligence engine to use based on task complexity. We'll dive into how smart use of low-impact vector databases and semantic caching reuse prior knowledge, achieving significant infrastructure savings. Attendees will discover how implementing async flows and batch processing optimizes available resources. This talk is a practical guide for architects and developers looking to lead the transition from costly prototypes to production systems that are technically and economically viable.
Want to know more?
Join PyCon Colombia newsletter and get a complete overview of our events, speakers and community participation.


