Discover the Talks at PyCon Colombia 2026 ✨
Browse every accepted session—titles, tracks, levels, and speakers—before you plan your days in Medellín.
Machine Learning Applied to Genetic Sequences
DNA contains massive amounts of biological information, but how can artificial intelligence help us understand it? In this talk, we will explore how Python and Machine Learning can be used to analyze genetic sequences in a practical and beginner-friendly way. Using public biological datasets, we will demonstrate how DNA sequences can be transformed into data suitable for machine learning models, covering concepts such as feature extraction, sequence representation, and basic classification techniques. We will also review popular Python tools used in bioinformatics, including Biopython, pandas, and scikit-learn, while discussing real-world challenges when working with biological data, such as high dimensionality, noise, and interpretability limitations. By the end of the talk, attendees will have a clear understanding of how to start building genetic analysis projects using accessible tools from the Python ecosystem, even without prior bioinformatics experience.
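As a rough illustration of the sequence-representation step this abstract mentions, the sketch below turns a DNA string into a fixed-length vector of k-mer frequencies that a classifier could consume. It is a minimal example under stated assumptions, not the speaker's method: the function name `kmer_features`, the choice of k-mer counting, and the sample sequence are all hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_features(sequence: str, k: int = 2) -> list[float]:
    """Return normalized k-mer frequencies in a fixed A/C/G/T order.

    Hypothetical helper for illustration. K-mers that contain any
    character other than A, C, G, or T (for example N) are ignored.
    """
    seq = sequence.upper()
    # Fixed vocabulary so every sequence maps to a vector of the same length.
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]


# Made-up example sequence, not taken from any dataset.
features = kmer_features("ACGTACGT", k=2)
```

With k=2 each sequence becomes a 16-dimensional vector, so sequences of different lengths can be stacked into one table for a library such as scikit-learn.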
Vulnerable AI Systems: Real Data, Responsible Design
29% of attacks bypass the security filters of the most widely used LLMs in production. It's not a bug. It's the nature of the system. LLMs are stochastic processes trained on human language—the most flexible, ambiguous, and manipulable medium that exists. This talk presents the results of llm-break-bench: 3,360 adversarial tests on GPT-4o, Claude, Gemini, Grok, and DeepSeek using MLCommons AI Safety v0.5 and OWASP LLM Top 10 as standards. The smartest model in the benchmark is 5 times more vulnerable than the cheapest. The data connects to real use cases where LLMs are in production: RAGs, chatbots, agents, code assistants. The closing is actionable: 5 design pillars for AI systems that don't depend on the model for their own security, with real code from NVIDIA NeMo Guardrails and Meta LlamaFirewall.
How to Find Pearls on the Bottom of the Sea – Autoencoders as Anomaly Detection Models
Like finding pearls on the ocean floor, detecting rare anomalies in large datasets requires sophisticated techniques. In this workshop, you'll learn the theory and practice of autoencoder architectures, how to train them for anomaly detection, how to set decision boundaries, and how to evaluate their performance. We'll work with real-world datasets and build complete anomaly detection pipelines in Python.
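One common way to set a decision boundary for an autoencoder is to threshold its reconstruction error at a high quantile of the errors seen on normal data. The sketch below assumes the errors have already been computed by some trained model. The function names, the 90th-percentile choice, and all numbers are hypothetical and are not taken from the workshop.

```python
import math


def quantile_threshold(normal_errors: list[float], percent: int = 95) -> float:
    """Nearest-rank percentile of reconstruction errors on normal data."""
    if not normal_errors:
        raise ValueError("need at least one reconstruction error")
    ranked = sorted(normal_errors)
    rank = max(1, math.ceil(percent * len(ranked) / 100))  # 1-based rank
    return ranked[rank - 1]


def precision_recall(errors: list[float], labels: list[bool],
                     threshold: float) -> tuple[float, float]:
    """Flag errors above the threshold and score them against true labels."""
    flagged = [e > threshold for e in errors]
    tp = sum(f and y for f, y in zip(flagged, labels))
    fp = sum(f and not y for f, y in zip(flagged, labels))
    fn = sum((not f) and y for f, y in zip(flagged, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up reconstruction errors from a validation set of normal samples.
normal_errors = [0.10, 0.12, 0.08, 0.11, 0.09, 0.13, 0.10, 0.12, 0.11, 0.30]
threshold = quantile_threshold(normal_errors, percent=90)

# Made-up test errors with ground-truth anomaly labels.
test_errors = [0.09, 0.12, 0.45, 0.11, 0.50, 0.20]
test_labels = [False, False, True, False, True, False]
precision, recall = precision_recall(test_errors, test_labels, threshold)
```

Raising the percentile trades recall for precision, which is why the threshold is usually tuned on held-out data rather than fixed in advance.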
The GenAI Revolution Reaches RecSys
When we talk about the generative AI revolution, the conversation usually stays close to chatbots, image generation, and code assistants. But the same architectures that powered that wave (transformers, autoregressive modeling, scaling laws) are quietly reshaping fields most people don't associate with GenAI at all. Recommender systems are one of the most interesting examples. Meta, Netflix, Google, Spotify and others are replacing decades-old recsys pipelines with transformer-based foundation models, and the results are hard to ignore. This talk is a practical tour of that shift from a Python engineer's seat.
Structured Learning: AI-Powered Platform That Transforms Academic Papers into Interactive Learning Experiences
Structured Learning is a platform that turns a research paper into a complete learning module: chapter-by-chapter explanations, incremental executable code, RAG chat, FSRS spaced-repetition flashcards, equation derivations, and a knowledge graph in Neo4j. This talk covers three things: the product; the engineering of an agentic workflow pipeline that takes a GitHub issue to a merged PR, using isolated worktrees, auto-patching after failed review, and GitHub as the agents' API; and how it runs on AWS with LocalStack for dev-prod parity. Agents don't replace engineers. They replace the glue between engineers and the boring 80% of the SDLC, and that's where compound returns live.
Feeding the Invisible: Food Security in Intermediate Cities with Python
In many countries, food insecurity is not only a social problem but also a data problem. In Colombia, key monitoring systems have lost continuity, leaving critical information gaps for public decision-making. This talk presents a Python prototype of a system that monitors and predicts food insecurity risk in intermediate cities using only open data. A reproducible pipeline integrates several data science components: ingestion and processing of food price data (SIPSA); time series models for price forecasting, covering both classical approaches and machine learning methods such as XGBoost; household segmentation through clustering of socioeconomic survey data; construction of a composite index relating income, prices, and vulnerability; and a decision support system (DSS) prototype. Attendees will take away a replicable approach for building complex indicators; strategies for working with imperfect open data; ideas for integrating models, socioeconomic data, and visualization in a single system; and a real example of applying Python in public policy and territorial development.
From ETL to Agentic Workflows: The Evolution of Data Engineering in the Generative AI Era
Traditional ETL pipelines are deterministic and rigid. Agentic workflows powered by generative AI can adapt, reason, and handle the unexpected. In this workshop, you'll learn how to evolve your data engineering practices from classic ETL to intelligent agentic workflows. We'll cover designing agents for data extraction, transformation decisions, and loading strategies—as well as how to combine traditional orchestration tools with AI agents for hybrid architectures.
From S3 to AI Agent: Your First Queryable Lakehouse
AI agents are only as good as the data they can query. Most agents built today connect to outdated CSVs, unstructured databases, or nothing at all. What if your agent could query a real lakehouse—with versioning, schema evolution, and time travel—using natural language? In this workshop we build exactly that from scratch using only open-source tools that run on your laptop. Starting from a local Docker Compose stack, we stand up a functional lakehouse with MinIO as S3-compatible storage, Apache Iceberg as the table format, Project Nessie as a Git-like versioned catalog, and Trino as the SQL query engine. On top of that, we build a Python MCP server that exposes Iceberg tables as tools for an AI agent, and connect Claude so it can query the lakehouse in natural language.
NLP Without Labels: How to Cluster N Legal Processes of the Colombian State and Turn Chaos into a Production Classifier
What do you do when you have 600,000 legal complaints, zero labeled data, and a government entity waiting for results? This talk walks through the full process of building an unsupervised NLP classification system for the Procuraduría General de la Nación. Starting from raw administrative text—noisy, full of abbreviations and institutional jargon—I'll show how TF-IDF, truncated SVD, and KMeans combined to organize more than half a million records into 64 semantically coherent groups, without a single manual label. But clustering is only the starting point. I'll cover how clusters were validated, how a Logistic Regression classifier was trained on them to make the system deployable, and how the final pipeline was packaged in a .pkl that non-technical colleagues use in production today. Along the way we'll face real problems: elbow curves that don't behave, 1:20 size imbalances between clusters, and the tension between mathematical elegance and institutional usability. Because in the public sector, a model nobody uses isn't a model—it's a PDF gathering dust.
How We Stopped Answering Data Questions and Built the Stack That Answers Them
If you've worked at a growing startup, you probably know the feeling: multiple teams pulling different numbers for the same metric, ops constantly asking engineering for basic answers, and creating or organizing metrics is a real pain. Every new question feels like starting from scratch. This talk is the story of how a small team fixed that. First, we built a proper dbt architecture from scratch with Sources, Staging, Intermediate, and Marts layers, so that things like bookings, revenue, and providers were defined in one place and everyone was looking at the same number. Once the data was reliable, we connected an LLM so non-technical teammates could ask questions in plain English and get real answers directly from Snowflake. No SQL, no ticket, no waiting on engineering. You'll walk away with a clear mental model for building a dbt layer people actually trust, a practical architecture for connecting an LLM to your warehouse, and the one thing that made it all click: your dbt docs are your LLM prompt.
NLP in Practice: From Corpus Linguistics to RAG with Python
Bridge the gap between traditional corpus linguistics and modern Retrieval-Augmented Generation (RAG) systems. In this workshop, researchers and developers will learn how classical NLP techniques—corpus analysis, tokenization, and annotation—can inform and improve RAG implementations. We'll use Python to build a pipeline that takes a text corpus from raw collection through linguistic analysis to a queryable RAG system, demonstrating how academic NLP foundations enhance practical AI applications.
Biviana Marcela Suárez Sierra
Researcher @ Universidad EAFIT
Dora Cecilia Alzate Gallo
Researcher @ Universidad EAFIT
PyBlend: Towards an AI Food Scientist for Nutritional Product Design
Discover how Python and AI are transforming nutritional product design. In this workshop, you'll be introduced to PyBlend, a framework that models the complex optimization problem of designing nutritional formulations. We'll explore how machine learning algorithms can navigate vast ingredient spaces, balance nutritional constraints, and generate novel product formulations. Attendees will gain hands-on experience with AI-driven product design and learn how Python makes interdisciplinary AI applications possible.
Your AI Eval Is Lying To You
When you set temperature=0 and run your AI eval, you expect the same input to give the same output. It doesn't. Recent measurements on Qwen3-235B at temperature=0 produced 80 unique completions on a single prompt. So when your eval reports "92% pass rate," what does that actually mean? This talk is about the gap between how the AI eval ecosystem talks about scores and what those scores can actually support. We walk through five specific tools that close the gap: pass@k versus pass^k; Wilson confidence intervals; Bayesian pass@k with Beta-Binomial conjugacy; sequential drift detection with EWMA, CUSUM, and OLS; and false discovery rate control via the Benjamini-Hochberg procedure. Each method gets a short demo in pure Python with no framework dependency. The audience leaves with reference implementations they can paste into an existing pytest setup tonight.
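To make two of the tools named above concrete, here is a minimal pure-Python sketch of a Wilson score interval and of the combinatorial estimators commonly used for pass@k (at least one of k samples passes) and pass^k (all k samples pass). These are not the speaker's reference implementations. The function names and the example counts are assumptions chosen for illustration.

```python
import math


def wilson_interval(passes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 gives about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = passes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k samples passes) from c passes in n runs."""
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate P(all k samples pass) from c passes in n runs."""
    return math.comb(c, k) / math.comb(n, k)


# Hypothetical eval: 92 passes out of 100 cases.
low, high = wilson_interval(92, 100)

# Hypothetical task: 9 passes out of 10 repeated runs, k = 5.
at_least_one = pass_at_k(10, 9, 5)
all_five = pass_hat_k(10, 9, 5)
```

For 92 passes in 100 trials the interval runs from roughly 0.85 to 0.96, so the headline "92%" is compatible with a noticeably weaker system. In the second example pass@5 is 1.0 while pass^5 is 0.5, which shows how far apart the two metrics can sit on the same data.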
Dashboards That Think: Build Agentic Analytics with Sigma
Learn how to build dashboards that don't just display data—they think. In this workshop, you'll combine Sigma's business intelligence capabilities with Python-based AI agents to create agentic analytics dashboards. We'll cover integrating LLMs with Sigma, building agent-driven data narratives, automating insight discovery, and creating dashboards that can answer follow-up questions and adapt dynamically to user context.