PyData Global 2024

From Inference to Features: Build a Core ML Platform from Scratch
12-05, 16:00–17:30 (UTC), AI/ML Track

This hands-on tutorial guides participants through building the essential components of a Machine Learning Platform (MLP) from scratch. We'll focus on implementing five core elements: a feature store, a model registry, an orchestrator, an inference engine, and a basic monitoring system. The session emphasizes practical coding using Test-Driven Development (TDD), Domain-Driven Design (DDD), and hexagonal architecture principles, providing attendees with a functional foundation for a robust ML infrastructure.


In this intensive 90-minute tutorial, participants will build a streamlined Machine Learning Platform (MLP) focusing on core functionalities. This session is designed for data scientists, machine learning engineers, and software developers who want to gain hands-on experience in constructing the fundamental components of a machine learning infrastructure.

Outline

Introduction and Setup (10 minutes)

  • Overview of the ML platform architecture
  • Brief review of Domain-Driven Design

Inference Engine (30 minutes)

  • Building a straightforward model serving component
  • Implementing prediction functionality

Model Registry & Feature Store (15 minutes)

  • Creating a simple model versioning system
  • Storing and retrieving model metadata

Event Driven Design & Message Bus (30 minutes)

  • Overview of Event-Driven Design
  • Developing a basic workflow management system

Model Trainer (Time Permitting)

  • Simple model trainer

Challenge Questions (5 minutes)

  • Additional considerations for taking the project forward
  • Brief discussion on scaling and additional components (monolith, microservice, cloud native)

Resources for further learning

Throughout the tutorial, we will borrow principles from Domain-Driven Design to formalize the bounded contexts within an ML platform. We will also emphasize the importance of writing tests first, demonstrating how TDD can lead to more robust and reliable ML infrastructure components.
Participants will follow along, building and testing each component in real time. By the end of the session, attendees will have a functional, well-tested, albeit basic, machine learning platform that they can further expand and customize.
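
To give a flavor of the test-first, hexagonal style described above, here is a minimal sketch of a model registry. It is an illustration only, not the tutorial's actual code: the names ModelVersion, ModelRegistry, and InMemoryModelRegistry, as well as the example artifact URIs, are hypothetical. The "port" is an interface the domain depends on, and the "adapter" is an in-memory implementation that keeps tests fast.

```python
from dataclasses import dataclass
from typing import Protocol


# All names below are hypothetical, chosen for illustration only.
@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    artifact_uri: str


class ModelRegistry(Protocol):
    """Port: the operations the domain needs, independent of storage."""

    def register(self, name: str, artifact_uri: str) -> ModelVersion: ...

    def latest(self, name: str) -> ModelVersion: ...


class InMemoryModelRegistry:
    """Adapter: a dictionary-backed registry, handy for fast unit tests."""

    def __init__(self) -> None:
        self._versions: dict[str, list[ModelVersion]] = {}

    def register(self, name: str, artifact_uri: str) -> ModelVersion:
        # Versions start at 1 and increase by one per registration.
        history = self._versions.setdefault(name, [])
        entry = ModelVersion(name, len(history) + 1, artifact_uri)
        history.append(entry)
        return entry

    def latest(self, name: str) -> ModelVersion:
        if name not in self._versions:
            raise KeyError(f"no versions registered for {name!r}")
        return self._versions[name][-1]


# In a TDD workflow these tests would be written before the adapter.
# They are plain functions with asserts, so pytest can collect them.
def test_register_assigns_increasing_versions() -> None:
    registry = InMemoryModelRegistry()
    first = registry.register("churn", "file:///tmp/churn-a.pkl")
    second = registry.register("churn", "file:///tmp/churn-b.pkl")
    assert (first.version, second.version) == (1, 2)


def test_latest_returns_most_recent_version() -> None:
    registry = InMemoryModelRegistry()
    registry.register("churn", "file:///tmp/churn-a.pkl")
    newest = registry.register("churn", "file:///tmp/churn-b.pkl")
    assert registry.latest("churn") == newest
```

Because the rest of the platform would depend only on the ModelRegistry port, a database-backed or cloud-backed adapter could later replace the in-memory one without changing the tests' intent.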

Requirements:

  • Intermediate understanding of Python programming
  • Basic familiarity with machine learning concepts
  • Basic knowledge of unit testing (beneficial but not required; familiarity with pytest will also help)
  • Laptop with Python 3.12+ installed

Materials:

All necessary code, tests, and documentation will be provided through a GitHub repository. The link to the repository will be shared with participants after the session. Attendees are encouraged to clone the repository and install the required dependencies before the tutorial begins.
By the end of this tutorial, participants will have gained practical insights into building the core components of a machine learning platform using TDD. They'll understand how these essential elements interact to form a basic ML infrastructure, providing a solid foundation for further exploration and implementation in their own projects.


Prior Knowledge Expected

Previous knowledge expected

Nathan Colbert is an ML professional with 5 years of experience building, deploying, and owning end-to-end ML systems. Nathan works at Peacock as a Senior Manager of ML Architecture, where he focuses on accelerating ML delivery across the organization.