Nour El Mawass
Nour leads the Generative AI technical group at Modus Create. She holds a PhD in Machine Learning and has worked on Machine Learning, Data Science, and Data Engineering problems in various domains, both inside and outside academia.
Sessions
Retrieval-Augmented Generation (RAG), despite being a superstar of GenAI over the last year, comes with a plethora of challenges and is prone to errors. Open-source Python libraries like RAGAS and TruLens provide frameworks for evaluating RAG systems, using various metrics that leverage LLMs to assess performance. But when the LLM in a RAG system is itself a source of errors, it remains to be seen how reliable it is to use another LLM, albeit a more powerful one, as a judge of that system's performance. This study explores various RAG evaluation metrics, as well as the choice of evaluator LLM, to examine the reliability and consistency of LLM-based evaluations. The aim is to provide practical insights and guidance for interpreting these evaluations effectively, and to help users make informed decisions when applying them in diverse contexts.
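To make the kind of LLM-based evaluation discussed here concrete, the sketch below scores a single hypothetical RAG output with RAGAS. The sample question, answer, contexts, and ground truth are invented for illustration; the code follows the 0.1.x-style RAGAS interface (column names and metric objects may differ across versions), and by default RAGAS calls an OpenAI model to compute each metric, so OPENAI_API_KEY must be set.

    # Minimal sketch: scoring one RAG output with RAGAS (0.1.x-style API).
    # All sample data is hypothetical. Each metric is computed by an LLM
    # judge (OpenAI by default), so an API key is required.
    from datasets import Dataset
    from ragas import evaluate
    from ragas.metrics import (
        faithfulness,        # is the answer grounded in the retrieved contexts?
        answer_relevancy,    # does the answer actually address the question?
        context_precision,   # are the retrieved contexts relevant to the question?
        context_recall,      # do the contexts cover the ground-truth answer?
    )

    # One hypothetical RAG interaction: question, generated answer,
    # retrieved contexts, and a reference (ground-truth) answer.
    data = {
        "question": ["What is Retrieval-Augmented Generation?"],
        "answer": [
            "RAG augments an LLM's prompt with documents retrieved from an "
            "external knowledge base before generating an answer."
        ],
        "contexts": [[
            "Retrieval-Augmented Generation (RAG) retrieves relevant documents "
            "and adds them to the model's context to ground its responses."
        ]],
        "ground_truth": [
            "RAG combines document retrieval with LLM generation so that "
            "answers are grounded in external sources."
        ],
    }

    result = evaluate(
        Dataset.from_dict(data),
        metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
    )
    print(result)  # per-metric scores in [0, 1], averaged over the dataset

Note that because every score above is itself produced by an LLM judge, rerunning the same evaluation can yield different numbers; that variability is precisely the reliability and consistency question this session examines.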