12-04, 13:00–13:30 (UTC), General Track
Traditional document processing for Retrieval-Augmented Generation (RAG) often involves cumbersome, error-prone extraction pipelines, hampering AI's ability to retrieve high-quality information from complex formats like PDFs and PowerPoint decks. ColPali disrupts this process by embedding entire pages—text, visuals, and layout—into rich, multi-vector representations using Vision Language Models (VLMs). This talk explores how ColPali, paired with multimodal models like the Llama 3.2 Vision series, enables RAG systems to “see” and reason over documents, dramatically improving retrieval performance. Attendees will learn to implement ColPali for enhanced, scalable, and robust enterprise knowledge retrieval.
Imagine you could instantly unlock insights from complex documents—PDFs, PowerPoint decks, even scanned files—without the bottleneck of clunky, resource-hungry extraction pipelines. Until now, we’ve relied on brittle processes involving OCR, layout analysis, and page-by-page parsing, but as any data scientist knows, the "garbage in, garbage out" reality plagues Retrieval-Augmented Generation (RAG) systems, making them unpredictable and error-prone.
Enter ColPali, a game-changing retrieval model that disrupts conventional methods by embedding entire document pages—text, visuals, and all—directly into contextualized vector spaces. ColPali doesn’t just read a document; it sees it, using Vision Language Models (VLMs) to create rich, multi-vector representations. Paired with multimodal models like the Llama 3.2 Vision series, this RAG pipeline goes beyond simple text extraction, allowing AI to reason over images, graphs, and complex layouts in ways that traditional RAG setups simply can’t.
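At retrieval time, ColPali compares query token embeddings against page patch embeddings with ColBERT-style late interaction (MaxSim): each query token finds its best-matching patch, and those maxima are summed. A minimal NumPy sketch of that scoring step, with random arrays standing in for real VLM embeddings:

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) score between one query and one page.

    query_emb: (num_query_tokens, dim) query token embeddings.
    page_emb:  (num_page_patches, dim) page patch embeddings.
    For each query token, take the max similarity over all page patches,
    then sum across query tokens.
    """
    sim = query_emb @ page_emb.T          # (tokens, patches) similarity matrix
    return float(sim.max(axis=1).sum())   # best patch per token, summed

# Toy example: random embeddings standing in for VLM outputs.
rng = np.random.default_rng(0)
query = rng.normal(size=(8, 128))                          # 8 query tokens
pages = [rng.normal(size=(1024, 128)) for _ in range(3)]   # 3 pages of 1024 patches

scores = [maxsim_score(query, p) for p in pages]
best_page = int(np.argmax(scores))       # rank pages by MaxSim score
```

In practice the embeddings come from a ColPali checkpoint rather than `rng.normal`, and a multi-vector-capable vector database handles the MaxSim ranking at scale; the arithmetic is the same.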
In this talk, you’ll discover how ColPali streamlines document processing and retrieval—and redefines RAG in the process. We’ll dive into real-world applications, where ColPali isn’t just theory: we’ll use it to run RAG over a complex Nvidia investor deck!
Learn how to use ColPali with vector databases, experience its superior performance on the Visual Document Retrieval (ViDoRe) Benchmark, and unlock AI systems that tackle messy, multimodal data in new ways.
No previous knowledge expected
Zain Hasan is a Senior AI/ML DevRel Engineer at Together AI, a company that allows people to train, fine-tune, and run generative AI models faster, at lower cost, and at production scale. He is an engineer and data scientist by training, who pursued his undergraduate and graduate work at the University of Toronto building artificially intelligent assistive technologies. He then founded a company developing a digital health platform that leveraged machine learning to remotely monitor chronically ill patients. More recently, he practiced as a consultant senior data scientist in Toronto. He is passionate about open-source software, education, community, and machine learning, and has delivered workshops and talks at multiple events and conferences.