12-05, 19:00–19:30 (UTC), AI/ML Track
CSP is a newly open-sourced library for stream processing in Python. In this talk, we discuss how CSP can be used to handle all stages of an online machine learning pipeline, from feature generation to live training and inference.
Streaming data refers to datasets that update in real time, often at unpredictable intervals triggered by external events. Online machine learning systems, which update their models as new data becomes available, consume streaming data for both training and inference.
We explore how CSP can be used at every stage of the online data pipeline to create easy and effective ML applications. We show how users can leverage CSP for:
1. Data cleaning on streaming sources
2. Dynamic feature generation
3. Live training of ML models
4. Live inference
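To make the four stages above concrete, here is a minimal sketch in plain Python. It deliberately does not use CSP's API and is not the application shown in the talk. The event shape, the function names (`clean`, `featurize`, `run_pipeline`) and the choice of an online logistic regression trained by stochastic gradient descent are all assumptions made for illustration. Each incoming message is scored before the model learns from its label, so every prediction is made on data the model has not yet seen.

```python
import math
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical event shape: (raw message text or None, label), 1 = spam, 0 = not spam.
Event = Tuple[Optional[str], int]


def clean(text: Optional[str]) -> Optional[str]:
    """Stage 1: normalise a raw message; return None if it should be dropped."""
    if text is None:
        return None
    text = text.strip().lower()
    return text or None


def featurize(text: str) -> Dict[str, float]:
    """Stage 2: build sparse bag-of-words counts plus a bias term."""
    features: Dict[str, float] = {"__bias__": 1.0}
    for token in text.split():
        features[token] = features.get(token, 0.0) + 1.0
    return features


class OnlineLogisticRegression:
    """Illustrative stand-in model: logistic regression updated one example at a time."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.learning_rate = learning_rate
        self.weights: Dict[str, float] = {}

    def predict_proba(self, features: Dict[str, float]) -> float:
        z = sum(self.weights.get(name, 0.0) * value for name, value in features.items())
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, features: Dict[str, float], label: int) -> None:
        # One SGD step on the log loss for a single example.
        error = self.predict_proba(features) - label
        for name, value in features.items():
            self.weights[name] = self.weights.get(name, 0.0) - self.learning_rate * error * value


def run_pipeline(events: Iterable[Event], model: OnlineLogisticRegression) -> Iterator[float]:
    """Clean, featurize, score, then train on each event as it arrives."""
    for raw_text, label in events:
        text = clean(raw_text)
        if text is None:
            continue  # Stage 1 dropped the event
        features = featurize(text)
        score = model.predict_proba(features)  # Stage 4: inference on the new event
        model.learn_one(features, label)       # Stage 3: live training on its label
        yield score


if __name__ == "__main__":
    # Made-up sample stream, for illustration only.
    stream = [
        ("WIN free prize now", 1),
        ("lunch at noon?", 0),
        (None, 0),
        ("Free prize inside", 1),
        ("see you at lunch", 0),
    ] * 5
    spam_model = OnlineLogisticRegression()
    for spam_probability in run_pipeline(stream, spam_model):
        print(round(spam_probability, 3))
```

In this sketch the stream is a Python list replayed in order. In a real streaming setting, events would arrive from an external source at irregular times, which is the part a stream processing library such as CSP is meant to manage.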
The talk culminates in a live demo of a CSP application for real-time spam detection. Data scientists and engineers who work with streaming data will benefit the most from this talk. However, the talk is designed to be accessible to anyone with intermediate-level knowledge of Python and data processing tools.
Previous knowledge expected
Pascal is Head of Research Technology for Cubist Systematic Strategies.