PyData Global 2024

Using and contributing to the data.table package for efficient big data analysis
12-04, 13:30–15:00 (UTC), Data / Data Science Track

data.table is an R package, implemented largely in C, and one of the most efficient open-source in-memory data manipulation packages available today. Matt Dowle first released it to CRAN in 2006. It continues to grow in popularity, and over 1500 other CRAN packages now depend on data.table. This talk will start with reading data from CSV files, cover basic and advanced data manipulation topics, and end with a discussion of how you can contribute to data.table.


https://github.com/tdhock/2023-10-LatinR-data.table?tab=readme-ov-file#english


Prior Knowledge Expected

No previous knowledge expected

A Berkeley-educated California native, Toby Dylan Hocking received his PhD in mathematics (machine learning) from École Normale Supérieure de Cachan (Paris, France) in 2012. He worked as a postdoc in Masashi Sugiyama’s machine learning lab at Tokyo Tech in 2013, and in Guillaume Bourque’s genomics lab at McGill University, Montreal, Canada (2014-2018).

From 2018 to 2024 he was a tenure-track Assistant Professor at Northern Arizona University. Since 2024 he has been a tenured Associate Professor at Université de Sherbrooke, where he directs the LASSO research lab (Learning Algorithms, Statistical Software, Optimization). Also since 2024, Toby has been an Associate Academic member at Mila - Quebec Artificial Intelligence Institute.

He has authored dozens of R packages and has published 40+ peer-reviewed research papers on machine learning and statistical software. He has mentored 30+ students in research projects, as well as another 30+ open-source software contributors with the R Project in Google Summer of Code.