PyData Global 2024

Francesc Alted

I am a curious person who studied Physics and Math when I was young. Through the years, I developed a passion for handling large datasets and using compression to enable their analysis on regular hardware that is accessible to everyone.

I am the CEO of ironArray SLU and I also lead the Blosc Development Team. I am currently interested in determining, ahead of time, which combinations of codecs and filters can provide a personalized compression experience. This way, users can choose whether they prefer a higher compression ratio, faster compression speed, or a balance between both.

As an Open Source believer, I started the PyTables project more than 20 years ago. Currently, and after 25 years in this business, I am the proud owner of two prizes that mean a lot to me:

You can learn more about what I am working on by reading my latest blog posts.


Sessions

12-03
11:30
90min
Mastering Large NDArray Handling with Blosc2 and Caterva2
Francesc Alted

As data grows larger and more complex, efficient storage and processing become critical to achieving scalable and high-performance computing. Blosc2 (https://www.blosc.org), a powerful meta-compressor library, addresses these challenges by enabling rapid compression and decompression of large, multidimensional arrays (NDArrays). This tutorial will introduce the core concepts of working with Blosc2, focusing on how it can be leveraged to optimize both storage and computational performance in Python.

Attendees will learn how to:

  1. Efficiently create and manage large NDArrays, including options for persistence.
  2. Select the best codecs and filters for specific data types and workflows to achieve optimal compression ratios and performance.
  3. Perform computations directly on compressed data to save memory and speed up processing.
  4. Seamlessly share NDArrays using Caterva2, a versatile library designed to enable remote sharing and serving of multidimensional datasets.

This tutorial is ideal for Python developers working with large-scale data in scientific computing, machine learning, and other data-intensive fields.

Data/ Data Science Track