12-03, 14:30–15:00 (UTC), LLM Track
Large Language Models are great at writing and chatting, but are they also able to talk like a human? Today, modern LLM-based voice bots can listen to users, talk back to them with a realistic voice, handle interruptions and improvise, all while sticking to the goal they're given by their builders. And this is not only true for the latest, eye-wateringly expensive OpenAI models! In this session we will learn how modern voice bots are made, which open source tools are available to build them, and see in practice how to build one. At the end of the session, the demo's full source code will be shared with the audience.
This talk provides an in-depth introduction to the state of the art of LLM-based voice bots. It assumes basic familiarity with the concepts of LLMs and GenAI, but it doesn't require any other prior knowledge from the audience.
The talk is divided into three parts:
- First, I will describe how voice bots are built from the ground up. We will see their basic building blocks and explain some of the jargon (a minimal sketch of this loop is shown after the outline). (5-7 mins)
- Second, we will look at the main challenges that still plague voice bots today in terms of interaction. LLMs are really good at mimicking human speech, so why is it still so easy to tell whether you're speaking to a human or to a robot? (5-10 mins)
- Last, we will review some of the best open source Python tools available today for building a cutting-edge voice bot and see a quick live demo of such a bot's capabilities. (5-10 mins)
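To give a flavor of the building blocks covered in the first part, here is a minimal sketch of the classic turn-based voice bot loop: speech-to-text, then an LLM, then text-to-speech. It assumes the `SpeechRecognition` (with PyAudio) and `pyttsx3` packages are installed, and `generate_reply()` is a hypothetical placeholder for whichever LLM you plug in; it is an illustration of the pipeline, not the talk's actual demo code.

```python
# Minimal turn-based voice bot: speech-to-text -> LLM -> text-to-speech.
import speech_recognition as sr
import pyttsx3

recognizer = sr.Recognizer()
tts = pyttsx3.init()


def generate_reply(history: list[dict]) -> str:
    """Hypothetical placeholder: call your LLM of choice here."""
    return "This is where the language model's answer would go."


history = [{"role": "system", "content": "You are a helpful voice assistant."}]

while True:
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
        audio = recognizer.listen(source)  # block until the user stops talking
    try:
        user_text = recognizer.recognize_google(audio)  # STT step
    except sr.UnknownValueError:
        continue  # couldn't understand the audio; wait for the next utterance
    history.append({"role": "user", "content": user_text})

    reply = generate_reply(history)  # LLM step
    history.append({"role": "assistant", "content": reply})

    tts.say(reply)  # TTS step
    tts.runAndWait()
```

Note that this blocking loop is exactly what production voice bots do *not* do: they stream audio through these stages concurrently, which is what makes low latency and interruption handling possible, and it is the kind of plumbing that the frameworks covered in the last part of the talk provide.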
No previous knowledge expected
Sara Zanzottera is Lead AI Engineer at Kwal, working on voice agents and conversation analysis with LLMs. Before joining Kwal she was a core maintainer of Haystack, one of the most mature open-source RAG frameworks, and led the design and implementation of its 2.0 version. She started her career at CERN as a Python software engineer on the particle accelerator’s control systems.