PyData Global 2024

Understanding the end-to-end LLM training and inference pipeline
12-05, 18:00–18:30 (UTC), LLM Track

Have you ever wanted to understand LLM internals such as pre-training, supervised fine-tuning, instruction tuning, reinforcement learning from human feedback, parameter-efficient fine-tuning, expanding LLM context lengths, attention mechanism variants, deployment performance and cost optimization, and which GPUs to use when? This talk takes an end-to-end tour of the LLM training and deployment pipeline to give you both stronger intuition and a faster path to implementation using model training and deployment frameworks.


This talk will cover the following topics:

LLM Pretraining
Supervised Fine-tuning
Instruction Tuning
Reinforcement Learning from Human Feedback (RLHF): Direct Preference Optimization and Proximal Policy Optimization
Context lengths and expanding them
What attention is, and the different types of attention mechanisms
How to optimize an LLM for deployment
What the KV cache is, and its various optimizations
Which GPUs to leverage for different model types
Understanding quantization as a means to compress models while maintaining accuracy
Measuring the performance of LLM deployments
Understanding LLM Accuracy evaluation benchmarks


Prior Knowledge Expected

Dr. Mark Moyou is a Senior Data Scientist at NVIDIA, a podcast host, and a conference director. At NVIDIA, he works with enterprise clients on AI strategy and on deploying machine learning applications to production. He hosts the AI Portfolio, Caribbean Tech Pioneers, and Progress Guaranteed podcasts, and runs the Optimized AI Conference.