12-04, 16:00–16:30 (UTC), LLM Track
Having spent 6 months on Kaggle's LLM-based ARC AGI program-writing challenge using Llama3, I'll reflect on the lessons learned while building an automatic program generator and evaluating it, finding strong representations for the challenge, using chain-of-thought and program-of-thought styles, and trying some multi-stage critical-thinking approaches. You'll get tips for tuning your own prompts, plus shortcuts to help you evaluate your own LLM usage with greater assurance in the face of non-deterministic outcomes.
Making an automated program generator is fun; evaluating the range of legal and illegal programs it produces is hard! Evaluating a non-deterministic prompt is hard! Moving from a binary to a gradient score function is hard! I've learned a lot during this competition, and I'll share my process for evaluating hundreds of iterations of a prompt on a home GPU and in the cloud, and for learning how to represent my challenge well in a prompt (it started poorly!). I'll also provide a set of curated paper references to help you on your own LLM journey.
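To make the binary versus gradient distinction concrete, here is a minimal sketch, assuming ARC-style tasks where a program's output is a small grid of integers. The binary score is all-or-nothing exact match. The gradient score shown here (fraction of matching cells, zero if the shapes differ) is just one plausible choice, not necessarily the one used in the talk. Because an LLM prompt is non-deterministic, the sketch repeats a run many times and reports a mean with a normal-approximation 95% confidence interval rather than trusting a single run. `fake_llm_run` is a hypothetical stand-in for "run the prompt, execute the generated program, collect its output grid".

```python
import math
import random

Grid = list[list[int]]


def binary_score(predicted: Grid, expected: Grid) -> float:
    # All-or-nothing: 1.0 only for an exact grid match.
    return 1.0 if predicted == expected else 0.0


def gradient_score(predicted: Grid, expected: Grid) -> float:
    # Partial credit: fraction of cells that match; 0.0 if the shapes differ.
    if len(predicted) != len(expected) or any(
        len(p) != len(e) for p, e in zip(predicted, expected)
    ):
        return 0.0
    total = sum(len(row) for row in expected)
    correct = sum(
        p == e
        for prow, erow in zip(predicted, expected)
        for p, e in zip(prow, erow)
    )
    return correct / total


def mean_with_ci(scores: list[float], z: float = 1.96) -> tuple[float, float, float]:
    # Mean plus a normal-approximation confidence interval over repeated runs.
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two runs to estimate variance")
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    half = z * math.sqrt(var / n)
    return mean, mean - half, mean + half


def fake_llm_run(expected: Grid, rng: random.Random, error_rate: float = 0.1) -> Grid:
    # Hypothetical stand-in for one non-deterministic LLM run:
    # each cell is corrupted (changed to a different colour) with probability error_rate.
    return [
        [cell if rng.random() >= error_rate else (cell + 1) % 10 for cell in row]
        for row in expected
    ]


if __name__ == "__main__":
    expected = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    rng = random.Random(42)  # fixed seed so the experiment is repeatable
    runs = [fake_llm_run(expected, rng) for _ in range(200)]
    print("binary  :", mean_with_ci([binary_score(r, expected) for r in runs]))
    print("gradient:", mean_with_ci([gradient_score(r, expected) for r in runs]))
```

The gradient score will typically sit well above the binary score and have a narrower interval, which is why it can be a more sensitive signal when comparing prompt variants. With a binary score on only a handful of runs, the normal approximation is rough, so more runs (or a different interval method) may be needed.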
If you're just starting out with LLMs, you'll get new ideas for fair and reliable evaluation. If you've tackled automated program writing before (a bit like how GitHub Copilot helps us write code), maybe you'll have ideas to share back with me.
No previous knowledge expected
Ian is a Chief Data Scientist. He co-founded and built the annual PyDataLondon conference, which raises $100k+ each year for the open source movement, along with the associated 13,000+ member monthly meetup. Using data science, he's helped clients find $2M in recoverable fraud, created the core IP that opened funding rounds for automated recruitment start-ups, and diagnosed how major media companies can better supply recommendations to viewers. He gives conference talks internationally, often as keynote speaker, and is the author of the bestselling O'Reilly book High Performance Python (3rd edition for 2025). He has over 25 years of experience as a senior data science leader, trainer and team coach. For fun he's walked by his high-energy Springer Spaniel, surfs the Cornish coast and drinks fine coffee. Past talks and articles can be found at: