PyData Global 2024

Shekhar Prasad Rajak

Shekhar is deeply passionate about open source software and actively contributes to various projects, including SymPy, Ruby gems such as daru and daru-view (which he authored), Bundler, NumPy/SciPy, and Apache projects such as Druid and Kafka.
He successfully completed Google Summer of Code in 2016 and 2017 and has served as an admin for SciRuby, mentoring multiple organizations.
Shekhar has spoken at prominent conferences such as RubyConf 2018, PyCon 2017, ApacheCon 2020, and Community Over Code 2024, as well as numerous regional meetups. Currently, he works at Apple as a Software Development Engineer.


Sessions

12-05
10:30
90min
Building a Real-Time Data Pipeline with Flink, Druid, and Python
Shekhar Prasad Rajak

Drowning in data? Struggling to make real-time decisions as information flows in faster than ever? This talk reveals how Python developers can harness the combined power of Apache Flink and Druid to conquer the challenges of real-time data processing and analysis.

Today's businesses demand immediate insights from ever-growing data streams. Apache Flink rises to this challenge with low-latency processing and sophisticated handling of out-of-order events, ensuring accuracy with exactly-once semantics. We'll explore Flink's Python API, focusing on its time and windowing capabilities that guarantee reliable data processing even in complex scenarios.
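To make the windowing idea concrete, here is a minimal plain-Python sketch of event-time tumbling windows with a bounded out-of-orderness watermark. It is not Flink's Python API and not taken from the talk. The function name `tumbling_window_sums` and its parameters are hypothetical, and the firing rule (a window closes once the watermark reaches its end) is a simplification of what a stream processor does.

```python
from collections import defaultdict


def tumbling_window_sums(events, window_size, max_out_of_orderness):
    """Sum values per event-time tumbling window.

    events: iterable of (event_time, value) pairs in ARRIVAL order,
            which may differ from event-time order.
    Returns (results, late): results is a list of
    (window_start, window_end, total); late holds events that arrived
    after their window had already been closed by the watermark.
    """
    open_windows = defaultdict(int)  # window_start -> running sum
    results, late = [], []
    max_seen = None
    watermark = float("-inf")

    for event_time, value in events:
        window_start = event_time - event_time % window_size
        window_end = window_start + window_size

        # The window for this event was already emitted: treat as late.
        if window_end <= watermark:
            late.append((event_time, value))
            continue

        open_windows[window_start] += value

        # Watermark trails the largest timestamp seen by a fixed bound,
        # leaving room for events that arrive out of order.
        max_seen = event_time if max_seen is None else max(max_seen, event_time)
        watermark = max_seen - max_out_of_orderness

        # Emit every open window whose end the watermark has passed.
        ready = sorted(s for s in open_windows if s + window_size <= watermark)
        for start in ready:
            results.append((start, start + window_size, open_windows.pop(start)))

    # End of stream: flush whatever is still open.
    for start in sorted(open_windows):
        results.append((start, start + window_size, open_windows.pop(start)))

    return results, late


if __name__ == "__main__":
    # (event_time, value) in arrival order; (3, 7) arrives out of order
    # but within the bound, (2, 100) arrives too late.
    stream = [(1, 10), (4, 5), (12, 1), (3, 7), (16, 2), (2, 100), (25, 4)]
    windows, dropped = tumbling_window_sums(stream, window_size=10, max_out_of_orderness=5)
    print(windows)
    print(dropped)
```

In this sketch the event at time 3 still lands in its window because the watermark has not yet passed that window's end, while the event at time 2 arrives after the window was emitted and is set aside as late. A real stream processor adds state checkpointing and delivery guarantees on top of this, which the sketch does not attempt to model.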

But Flink is more than just a pipeline. We'll showcase how it surpasses traditional solutions like Kafka, especially for complex event processing and dynamic windowing. Then, we'll introduce Apache Druid, a high-performance analytical database built for rapid queries on massive datasets. See how Flink efficiently feeds pre-processed data into Druid, transforming it into your real-time analytical engine, seamlessly integrated with your Python workflows. Dive in and discover the future of data-driven decision-making.

Data/ Data Science Track