![](/media/avatars/Abbie_Brookes_-_2024_cLCqyWS.jpeg)
Data Scientist at Datacove. Co-organiser of the EARL Conference. Co-host of the Manchester, Brighton, London and Bristol R User Groups.
- Shiny Policies: Dashboards to Aid British Government Decisions (using R).
![](/media/avatars/adarsh_prfile_art_54Faroc.png)
Adarsh is a data professional with 8+ years of experience. He is currently a Senior Data Scientist at A+E Networks.
- Scaling Outside the Warehouse Using DuckDB and Python
Hi, I'm Aditi. I've mostly been involved in the API dispatching discussions in the Scientific Python ecosystem. I am currently part of NetworkX's core developer team; NetworkX is a Python library with lots of graph algorithms. I've worked on the NetworkX project and the nx-parallel backend as an independent contractor, as a GSoC contributor, and now as a core developer. I've also presented my work and NetworkX's dispatching at SciPy Con 2024 (poster), EuroSciPy 2024 (talk), and PyCon India 2024 (lightning talk). I am currently pursuing a bachelor's in Data Science and Application.
Thank you :)
- Understanding API Dispatching in NetworkX
www.adrianastan.com
- Off-the-shelf HuggingFace models for audio deepfake detection
Ahad Shoaib is a Lead Data Scientist in the Infrastructure Data Science team at Salesforce, where he is responsible for taking time series forecasting and machine learning models all the way from “ideation” to “production”. Ahad holds a bachelor's degree in Computer Science & Math from the University of Waterloo.
- Foundational Time Series Models in Practice: The Future of Forecasting, or Just Hype?
I develop AI applications in Python, powered by Rust. I am currently doing my master's in AI and Engineering Systems at the Eindhoven University of Technology. I maintain an open-source project, EmbedAnything, which has 250 stars and over 40,000 downloads.
- Vector Streaming: The Memory Efficient Indexing for Vector Databases
![](/media/avatars/profile_2022_small_xSVwkyZ.png)
Dr. Alan Nichol is co-founder & CTO of Rasa, a widely-used open source platform for conversational AI, and has authored research papers, technical blog posts, and online courses on the subject of conversational AI. He is credited with popularizing the idea of building chat and voice bots that do not rely on 'intents', and is one of the creators of the Rasa framework. Alan holds a PhD in machine learning from the University of Cambridge, and in 2024 was named one of Europe's 100 most influential people in AI.
- Building an AI Travel Agent That Never Hallucinates
![](/media/avatars/aliciamw_headshot_zi5lt5h.jpg)
Alicia is a developer advocate for Google Cloud. Previously she spent six years as a program manager, where building, managing, and measuring programs and processes sparked her interest in data analytics. A long-time believer in the power of spreadsheets, she also uses machine learning, SQL, and visualizations to help solve problems and weave together stories.
- Python + BigQuery + DataFrames: Hands on with scalable "serverless" analysis, ML, and AI
Allen Downey is a professor emeritus at Olin College and Principal Data Scientist at PyMC Labs. He is the author of several books (including Think Python, Think Stats, and Probably Overthinking It) and a blog about programming and data science. He is a consultant and instructor specializing in Bayesian statistics. He received a Ph.D. in computer science from the University of California, Berkeley, and bachelor's and master's degrees from MIT.
- Time Series Analysis with StatsModels
Allison Wang is a Software Engineer at Databricks and an Apache Spark Committer, specializing in Spark SQL and PySpark. She’s passionate about bridging Python with the big data ecosystem. Allison holds a bachelor’s degree in Computer Science from Carnegie Mellon University.
- Bridging Big Data and AI: Empowering PySpark with Lance Format for Multi-Modal AI Data Pipelines
![](/media/avatars/1648585401910_AwutPKI.jpg)
Alonso Silva is a Senior Researcher on Generative AI at Nokia Bell Labs. He previously did his Ph.D. at INRIA, a postdoc at UC Berkeley, and worked as an ML researcher at Safran.
- Building Knowledge Graph-Based Agents with Structured Text Generation and Open-Weights Models
I’m a product-focused data scientist who’s been part of the startup scene in London for over a decade.
I’ve been responsible for shaping products for the UK and US markets from an early stage, building models and infrastructure in the finance space, and authoring fairness policies to ensure models are both compliant and ethical.
I try to spend as much time as I can on understanding the problem. It’s usually a lot messier and more complex than I realise, but it’s key to building solutions that have a real impact!
- Taking Data Science in industry from zero to production
![](/media/avatars/AntonAntonov-400x400_jZ2PDTE.jpeg)
I am an applied mathematician (PhD) with 30+ years of experience in algorithm development, scientific computing, mathematical modeling, natural language processing, combinatorial optimization, research and development programming, machine learning, and data mining.
In the last 16 years, I focused on developing machine learning algorithms and workflows for different industries (entertainment, recruitment, healthcare, manufacturing, logistics).
I am a former kernel developer of Mathematica.
- Quantile Regression Workflows
![](/media/avatars/Art_Anderson_Headshot_Nsd1tcs.jpeg)
Art Anderson, Director of Developer Advocacy
LinkedIn http://www.linkedin.com/in/artdanderson
Art is a passionate tech enthusiast, builder, and lifelong learner with a knack for simplifying complex concepts through real-world applications. With a diverse background spanning tax and accounting software, convolutional neural networks in machine vision, and NoSQL databases, Art excels in teaching and demonstrating how systems connect. Whether tinkering with tech or creating innovative solutions, Art’s unique perspective bridges the gap between understanding and application.
- Unlocking the Power of Hybrid Search: A Deep Dive into Python-Powered Precision and Scalability
![](/media/avatars/atin_headshot_0VroMTY.jpeg)
Atin is the Co-Founder & CTO of Galileo - the leading enterprise GenAI evaluation intelligence platform. Prior to founding Galileo, Atin played key engineering roles in influential AI technology built at Uber and Apple.
- Effective GenAI Evaluations: Mitigate Hallucinations and Ship Fast
![](/media/avatars/SF109285-Edit-Web_jG1ozPR.jpg)
Avik is a seasoned data scientist who has worked in multiple domains of machine learning. He loves coding in Python and writing elegant and scalable code.
- Reproducible Python projects using Nix
![](/media/avatars/Baran-Koseoglu-2_1up5SOb.png)
Senior Data Scientist @ Wise
- Fast, intuitive feature selection via regression on Shapley values
- Climbing the causal ladder for fun and profit
![](/media/avatars/headshot_ITJPZuc.jpg)
Bill Engels is a Principal Data Scientist with PyMC Labs, with 10 years of experience in industry and an MS in Statistics from Portland State University. He enjoys all phases of data analysis and is particularly interested in Bayesian modeling and Gaussian processes.
- Making Gaussian Processes Useful
![](/media/avatars/profile_photo_pl30uE7.jpeg)
Bing Wang is a Software Engineer at Flatiron, a healthcare company specializing in building large-scale databases for oncology research. She is passionate about developing data pipelines to enhance cancer care and is constantly exploring ways to further automate these processes.
Wang holds an M.S. in Computer Science from the University of Chicago, an M.Ed. in Developmental Psychology from Harvard, and a B.A. in Linguistics and Education from ECNU. Her interdisciplinary educational background blends social sciences and technology, fueling her interest in applying machine learning techniques to the analysis of human language.
- An Evaluation of Open-Source OCR Models for Japanese Medical Documents
![](/media/avatars/uw_profile_picture_g1exbN7.jpg)
Data scientist with 10+ years of experience spanning academia and industry. Published research in prestigious scientific journals and developed end-to-end AI products for startups and global companies. Passionate educator contributing to data training programs as a professor and consultant.
- PyTorch Workflow Mastery: A Guide to Track and Optimize Model Performance
- Making Gaussian Processes Useful
CEO of PySheets
map("ex-{}".format, ["Google", "Uber", "IBM", "Morgan Stanley", "Bank of America", "JP Morgan"])
- PyScript - Writing a Python application in the browser
Dr. Chris Rackauckas is the VP of Modeling and Simulation at JuliaHub, the Director of Scientific Research at Pumas-AI, Co-PI of the Julia Lab at MIT, and the lead developer of the SciML Open Source Software Organization. His work in mechanistic machine learning is credited with the 15,000x acceleration of NASA Launch Services simulations and, more recently, a 60x-570x acceleration over Modelica tools in HVAC simulation, earning Chris the US Air Force Artificial Intelligence Accelerator Scientific Excellence Award. See more at https://chrisrackauckas.com/. He is the lead developer of the Pumas project and has received a top presentation award at every ACoP in the last 3 years for improving methods for uncertainty quantification, automated GPU acceleration of nonlinear mixed effects modeling (NLME), and machine learning assisted construction of NLME models with DeepNLME. For these achievements, Chris received the Emerging Scientist award from ISoP.
- Open Source Component-Based Modeling with ModelingToolkit
![](/media/avatars/ChristopherHeadshot_W26X1NC.jpg)
Christopher is a computer science Ph.D. student at the University of Waterloo, specializing in artificial intelligence. Christopher is a member of the Computational Health Informatics Lab (CHIL), a Consultant, AI Research & Health Insights at Gluroo Imaginations Inc., and co-founder of the Blood Glucose Control AI Design Team.
Christopher's research focuses on developing AI systems for aiding and supporting decision making in the management of diabetes.
LinkedIn: https://www.linkedin.com/in/christopherrisi/
BGC AI Design Team: https://github.com/RobotPsychologist/bg_control/wiki/About-Us
- skchange & sktime – time series anomaly detection, changepoint detection, segmentation
![](/media/avatars/IMG_2051_GQi1kqA.jpeg)
I am a quantitative modeling senior associate at JPMorgan and I hold a PhD in Economics.
Feel free to connect with me on LinkedIn: https://www.linkedin.com/in/chuxin-liu/
- Build Your Own Transformer (90 minutes)
Daniel is a Lecturer at the University of British Columbia and Data Science Educator at Posit, PBC. He believes that data science artifacts are useful when the information can be shared across stakeholders, and enjoys learning and teaching the tools that enable deploying and sharing these data products.
- Tips to Level-Up Your Shiny for Python Applications
![](/media/avatars/yo_transparente_c5fr2qJ.png)
Like many people, I have several roles.
As a computer scientist who loves programming, I am interested in many languages (such as C++ and Java); I particularly love Python and other interesting, growing languages such as Rust and Julia. I also define myself as a Linux user: it has been the only OS on my computers for more than 15 years. I am a believer in Free Software (in fact, for years I was the secretary of the Free Software Office at the University of Cadiz). I regularly give talks at PyData events in my country, and I have also participated in several JuliaCon conferences.
On the professional side, I am an Assistant Professor of Computer Science at the University of Granada, Spain. My research is in Artificial Intelligence, using Metaheuristics for optimization (I have won two international competitions), and in Neuroevolution, which combines Metaheuristics with Deep Learning. I have an h-index of 31 and am included in Stanford's list of the top 2% most influential researchers. I have directed two theses and participated in several research projects involving Machine Learning. I am currently co-leading a General Purpose Artificial Intelligence project, a €120K Knowledge Generation Project funded by the Ministry of Science, Innovation and Universities of Spain.
- Discover the Julia Machine Learning Ecosystem: A Comprehensive Overview
![](/media/avatars/profile_picture_4dDks9D.png)
I am currently working at Roche as a Senior Data Scientist. I have a deep passion for elevating Python code quality and enhancing its role within the pharmaceutical industry. I am also actively engaged in streamlining automation workflows for delivering both R and Python packages.
- Enabling Multi-Language Programming in Data Engineering Workflows with the Snakemake Framework
Results-oriented Developer Relations professional with 20 years in open source, databases, devops, payments and AI, and now 1 year in generative AI. Proven ability to build and manage vibrant, engaging developer communities, drive developer adoption of cloud products, and foster strategic partnerships within the developer ecosystem. Skilled in community building, presenting technical topics, communication and content creation. Passionate about developer experience and success.
- Preparing Data for LLM Applications Using Data Prep Kit
![](/media/avatars/aac89b87ec7d19e3e71417ad9f16b0d0_kphWkJ5.jpeg)
Dr. Hongxia Yang has published over 100 papers in top-tier conferences and journals, and holds more than 50 patents. She has over 15 years of experience as an AI scientist, and specializes in large-scale machine learning, data mining, and deep learning. Currently a professor at HK Polytechnic University, she previously held AI scientist roles at IBM T.J. Watson research center, Yahoo! Inc, Alibaba Group, and ByteDance US. She earned her PhD from Duke University and her B.S. from Nankai University.
- Keynote: Collaboration and Evolution of Foundation and Specialized Models
![](/media/avatars/jeroenjanssens-headshot-2021_LeMwIah.png)
Jeroen Janssens, PhD, is a polyglot data science consultant and certified instructor. His expertise lies in visualizing data, implementing machine learning models, and building solutions using Python, R, JavaScript, and Bash. Jeroen is passionate about open source and sharing knowledge. He is the author of Data Science at the Command Line (O’Reilly, 2021) and is currently writing Python Polars: The Definitive Guide (O’Reilly, 2025). Every now and then he blogs at https://jeroenjanssens.com.
- KEYNOTE - Embrace the Unix Command Line and Supercharge Your PyData Workflow
![](/media/avatars/duarte_mlfCZZs.jpg)
I'm a technologist, born and raised in sunny Portugal, now based in Copenhagen. My work lies in the intersection of Machine Learning, Data, Software Engineering, and People. I'm in love with Technology, and how it can improve people's lives.
In the past, I've worked in Consumer Electronics, Public Institutions, Big Three Management Consulting, and Startups. The common thread? Solving problems end-to-end.
- Panel: The Dashboard That Grew - A Scaling Saga
![](/media/avatars/Aletsch_square_htvpc0s.jpg)
Dr. Egor Kraev has been applying machine learning to real-world problems since last century, including economic and human development data analysis for nonprofits in the US, the UK, and Ghana, and 10 years as a quant, solutions architect, and occasional trader at UBS then Deutsche Bank.
Following last decade's explosion in AI techniques, Egor became Head of AI at Mosaic Smart Data Ltd, and for the last four years has been bringing the power of AI to bear at Wise in a variety of domains: fraud detection, trading algorithms, causal inference for A/B testing and marketing, and now multiple GenAI projects across the company.
In addition to having taken the Data Science team at Wise from an idea to a well-structured team of over 30 people, Egor is the founder of a startup, motleycrew.ai, aiming to take multi-agent AI systems to the next level of usability and power.
- Fast, intuitive feature selection via regression on Shapley values
![](/media/avatars/headshot_j5NkTD5.jpg)
Elijah has always enjoyed working at the intersection of math and engineering. More recently, he has focused his career on building tools to make data scientists and researchers more productive. At Two Sigma, he built infrastructure to help quantitative researchers efficiently turn ideas into production trading models. At Stitch Fix he ran the Model Lifecycle team — a team that focuses on streamlining the experience for data scientists to create and ship machine learning models. He is now the CTO at DAGWorks, which aims to solve the problem of building reliable AI systems through open source software. In his spare time, he enjoys geeking out about fractals, poring over antique maps, and playing jazz piano.
- Build Production Ready AI Agents with Burr
![](/media/avatars/mlw2_zNCr5pd.jpg)
Evan Wimpey is an analytics professional and stand-up comedian. Yes, dreams really do come true! With a background in statistics and economics, Evan has spent years analyzing data and delivering statistical models. Since the topic can be quite dry and technical, Evan has realized that humor is often the most effective way to make analytics more accessible...or at least the most fun way. He performs data science comedy at conferences and events around the world. With his book, Predictable Jokes, Evan hopes to continue sharing laughs and making analytics available to all.
- Python is a Joke!
![](/media/avatars/eyalgruss_JZdagUp.jpg)
Dr. Eyal Gruss - Code/media/text artist, algorithms researcher, teaches computational creativity at the Holon Institute of Technology. https://eyalgruss.com
- Let our optima combine!
![](/media/avatars/bayes_eyal_mRHYLfs.jpeg)
👋 Hi I'm Eyal. My superpower is simplifying the complex and turning data to ta-da!
I'm an ex-cosmologist turned data scientist with over 15 years' experience solving challenging problems. I am motivated by intellectual challenges, highly detail-oriented, and love visualising data results to communicate insights for better decisions within organisations.
My main drive as a data scientist is applying scientific approaches that result in practical and clear solutions. To accomplish this, I use whatever works, be it statistical/causal inference, machine/deep learning or optimisation algorithms. Being results-driven, I am passionate about helping stakeholders make data-driven decisions by quantifying the impact of interventions and communicating it to non-specialist audiences in an accessible manner.
My claim to fame is that between 2004 and 2014 I lived on four different continents, including in three tennis Grand Slam cities (NYC, Melbourne, London).
- 🧠🧹 Causality - Mental Hygiene for Data Science
![](/media/avatars/FrancescAlted-photo_1xYhCHt.jpeg)
I am a curious person who studied Physics and Math when I was young. Through the years, I developed a passion for handling large datasets and using compression to enable their analysis using regular hardware that is accessible to everyone.
I am the CEO of ironArray SLU and also lead the Blosc Development Team. I am currently interested in determining, ahead of time, which combinations of codecs and filters can provide a personalized compression experience. This way, users can choose whether they prefer a higher compression ratio, faster compression speed, or a balance between both.
As an Open Source believer, I started the PyTables project more than 20 years ago. After 25 years in this business, I am the proud owner of two prizes that mean a lot to me:
- 2023: NumFOCUS Project Sustainability Award
- 2017: Google’s Open Source Peer Bonus
You can learn more about what I am working on by reading my latest blog posts.
- Mastering Large NDArray Handling with Blosc2 and Caterva2
![](/media/avatars/conference_pp_Y87OkOG.png)
I am a Data Scientist who loves deep learning, MLOps, and working with data. I enjoy participating in conferences and competing in Machine Learning challenges to continually improve my skills. I also love traveling and visiting new places around the world.
I began my career as a Telecommunications Engineer with a background in Statistical Signal Processing, pretty tough stuff! After working for 3 years in university research and industry R&D, I joined AgileLab, an Italian consulting company with the mission of "elevating the data engineering game and empowering companies to shape their future around data."
I believe that successful data science projects should be built on solid software engineering and data engineering practices to ensure effectiveness and reliability.
- Deep Learning in Energy Management: Non-Intrusive Load Monitoring for IoT Devices
Core developer and founder of sktime.
Director of the German Center for Open Source AI Software.
- skchange & sktime – time series anomaly detection, changepoint detection, segmentation
![](/media/avatars/photo_pro_small_Wl1H3Z6.jpg)
Guillaume Dalle (https://gdalle.github.io/) is a postdoctoral researcher at EPFL who specializes in machine learning and combinatorial optimization. He is a prolific contributor to the Julia package ecosystem, especially for automatic differentiation and graph theory.
- Automatic differentiation, a tale of two languages
![](/media/avatars/_fkeu_23_hannes_muhleisen_09_mQxA68D.jpg)
Prof. Dr. Hannes Mühleisen is a creator of the DuckDB database management system and Co-founder and CEO of DuckDB Labs. He is a senior researcher at the Centrum Wiskunde & Informatica (CWI) in Amsterdam. He is also Professor of Data Engineering at Radboud University Nijmegen.
- Changing Data With Confidence using DuckDB
![](/media/avatars/Ms.-SAHS-Sudasinghe_Pu2gL60.jpg)
Hansila Sudasinghe is a dedicated data science professional and educator with a robust background in computing and information systems. She completed her Master of Data Analytics at the Faculty of Graduate Studies, University of Kelaniya, following a Post-Graduate Diploma in Information Technology from the University of Colombo School of Computing. Hansila holds a B.Sc. (special) in Computing & Information Systems from Sabaragamuwa University of Sri Lanka.
With extensive teaching experience across various academic institutions, Hansila is currently working as a Lecturer in Computer Science at the Edith Cowan University – Sri Lanka Campus. She has completed a Certificate Course in Teaching in Higher Education and earned badges for her contributions to professional development. Hansila is actively involved in curriculum development and research, contributing to projects that integrate technology and data science with practical applications.
Her expertise spans multiple programming languages and tools, including Python, SQL, Java, and Power BI, with a focus on data analysis, web development, and business intelligence. Recent projects include analyzing social media sentiments on global economic issues and developing data-driven solutions for the apparel industry. She has also co-authored research on topics ranging from obesity prediction models to mobile banking adoption.
An advocate for continuous learning and collaboration, Hansila has actively participated in workshops, technical committees, and community events. She aims to bridge the gap between academia and industry by equipping students and professionals with the skills needed to thrive in a data-driven world.
- PYDATA Bloom Framework: A Multidisciplinary Approach to Data Science in University Education
Hendrik Makait is a data and software engineer building systems at the intersection of large-scale data management and machine learning. Currently, he works as an Open Source Engineer at Coiled, maintaining and improving Dask and its distributed execution engine. His focus areas include P2P shuffling, which allows shuffling large data at constant memory, and observability with Dask to help users understand and optimize their workloads.
- Dask ❤️ Xarray: Geoscience at Massive Scale
![](/media/avatars/f3a5e6ed-adec-426b-b425-a62be14ad195_wHJ0MBP.png)
Hussein Jawad is a Senior Data Scientist specializing in NLP, holding degrees from École Polytechnique and Télécom Paris. Based in Paris, he possesses a foundation in programming, statistical modeling, and MLOps.
Currently, he works on the development team of MAPIE while delivering innovative solutions at Capgemini Invent. With publications on LLM security and achievements in global competitions, he combines technical expertise with cross-functional collaboration.
- Boosting AI Reliability: Uncertainty Quantification with MAPIE
Ian is a Chief Data Scientist who co-founded and built the annual PyDataLondon conference, which raises $100k+ annually for the open source movement, along with the associated 13,000+ member monthly meetup. Using data science, he's helped clients find $2M in recoverable fraud, created the core IP which opened funding rounds for automated recruitment start-ups and diagnosed how major media companies can better supply recommendations to viewers. He gives conference talks internationally, often as keynote speaker, and is the author of the bestselling O'Reilly book High Performance Python (3rd edition for 2025). He has over 25 years of experience as a senior data science leader, trainer and team coach. For fun he's walked by his high-energy Springer Spaniel, surfs the Cornish coast and drinks fine coffee. Past talks and articles can be found at:
- https://ianozsvald.com/
- https://www.linkedin.com/in/ianozsvald/
- https://notanumber.email/
- https://github.com/ianozsvald/
- https://twitter.com/ianozsvald
- Valuable LLM lessons learnt on Kaggle's ARC AGI Challenge
![](/media/avatars/crop2_cvyAo7o.jpg)
Irina is an ML Engineer, specialised in Computer Vision and NLP, and seasoned in different industries: from optical biopsy systems in France to Augmented Reality apps in German startups to leading AI Engineering teams at Siemens Mobility. She is now part of the journey of mozilla.ai to add transparency and safety to Generative AI through Open Source Software.
Rather than waking up Skynet, she's more worried about Natural Intelligence and its decisions over our data.
- Trustworthy LLMs: Vibe checks are not all you need
![](/media/avatars/profile-2024_oPAXUey.png)
Jacob Tomlinson is a senior software engineer at NVIDIA. His work involves maintaining open source projects including RAPIDS and Dask. He also tinkers with kr8s in his spare time. He lives in Exeter, UK.
- The art of wrangling your GPU Python environments
- GPU development with Python 101
![](/media/avatars/jeff_bezanson_Ky8HsL5.jpeg)
Jeff is a co-creator of the Julia language and co-founder of JuliaHub.
- Statically-Compiled Julia for Library Development
Jeremy is an experienced data, analytics and insight professional of 20 years, specialising in customer analytics, reporting automation, machine learning and data visualisation - with a passion for developing sophisticated data strategies and inspiring best in class analytics solutions, underpinned by the R environment. He has been coding in R since day one of his career and combines it with his love of community building and public speaking to share applications and motivate others to develop into coding specialists, by running community user groups across the UK, alongside EARL, the UK's annual premier R (and Python) conference.
- Shiny Policies: Dashboards to Aid British Government Decisions (using R).
Having spent most of his career as an academic mathematician, Joe made
the leap to full-time software development in 2022. An open-source
hobbyist for decades, he discovered Rust before it hit 1.0 and fell in love
(or, at least, infatuation). Although he is capable of getting things done,
Joe also likes to talk and learn about math, software, and the connections
between them. If you have an hour to spare, try asking him about the math
behind soap bubble clusters.
Joe spends most of his non-working hours shuttling his daughters around in
his bakfiets. People in Texas (where he lives) find this odd, but they seem
to understand when he points out that it's basically the bicycle equivalent
of an F-150.
- Evaluating RAGs: On the correctness and coherence of Open Source eval metrics
![](/media/avatars/IMG_0955_f91txKV.jpeg)
Win-Vector Principal Consultant and Trainer John Mount has a Ph.D. in computer science from Carnegie Mellon and over 15 years of applied experience in biotech research, online advertising, price optimization and finance. He is one of the authors of the popular book "Practical Data Science with R", Manning, 2020 (now in its second edition).
- Solving Forecasting Problems in R and Python
John Sandall is the CEO and Principal Data Scientist at Coefficient.
His experience in data science and software engineering spans multiple industries and applications, and his passion for the power of data extends far beyond his work for Coefficient’s clients. In April 2017 he created SixFifty in order to predict the UK General Election using open data and advanced modelling techniques. Previous experience includes Lead Data Scientist at YPlan, business analytics at Apple, genomics research at Imperial College London, building an ed-tech startup at Knodium, developing strategy & technological infrastructure for international non-profit startup STIR Education, and losing sleep to many hackathons along the way.
John is also a co-organiser of PyData London, co-founded Humble Data in 2019 to promote diversity in data science through a programme of free bootcamps, and in 2020 was a Committee Chair for the PyData Global Conference. He is currently a Fellow of Newspeak House with interests in open data, AI ethics and promoting diversity in tech.
- Fairness Tales: How To Measure And Mitigate Unfair Bias in Machine Learning Models
Jon is a Machine Learning Engineer specialized in IoT systems. He has a master's degree in Data Science and a bachelor's degree in Electronics Engineering, and has published several papers on applied Machine Learning, including topics like TinyML, Wireless Sensor Systems and Audio Classification.
These days Jon is co-founder and Head of Data Science at Soundsensing, a leading provider of condition monitoring solutions for commercial buildings and HVAC systems.
He is also the creator and maintainer of emlearn, an open-source inference engine for microcontrollers and embedded systems.
- Microcontrollers + Machine Learning with MicroPython in 1-2-3
![](/media/avatars/IMG_5318_2_CGqmb8S.png)
Joseph Oladokun is a Data Scientist and Machine Learning Engineer with extensive experience across healthcare, finance, and software. Currently pursuing a Master's in Information Systems and Business Analytics at Iowa State University, Joseph has worked with companies like Asana, Helium Health, RataFX, and Autochek Africa, where he developed innovative business solutions using machine learning and predictive analytics. His open-source project, Faustream, aims to make stream processing and machine learning integration easier. Joseph is passionate about open source and the application of data to business problems.
- Bridging the Gap: Real-Time Predictive Analytics with Faustream
![](/media/avatars/Screenshot_2024-11-03_at_16.23.08_8lq4Y3k.png)
Data Scientist, developer, and educator with a passion for enabling developers to build great applications and turn data into meaningful insights and innovative products. Having spent over 10 years in Data Science and Developer Relations in the AI and Web3 spaces, Justina focuses on empowering developers around the world to build better applications and products.
- Building an AI Travel Agent That Never Hallucinates
Kalyan is a Data and AI scientist with a background as a former data science and analytics manager, effectively balancing both academia and industry. He is a community leader and an active contributor to the Python, data science, and scientific communities.
- The Hidden Costs of Data Quality - Tackling Common Data Challenges in ML
Dr. Katrina Riehl is a Principal Technical Product Manager at NVIDIA supporting CUDA and Python. For over two decades, Katrina has worked extensively in the fields of scientific computing, machine learning, data science, and visualization. Most notably, she has helped lead initiatives at the University of Texas at Austin Applied Research Laboratory, Anaconda, Apple, Expedia Group, Cloudflare, and Snowflake. She is an active volunteer in the Python open-source scientific software community and continues to serve on the Advisory Council for NumFOCUS.
- GPU development with Python 101
![](/media/avatars/WhatsApp_Image_2024-08-07_at_19.59.03_7b3cdcf8_8hMRwar.jpg)
Kristal Joi Wise is an innovative leader with a passion for leveraging data science and business strategy to drive transformation in organizations. As the Chief Transformation, Sales, and HR Officer at her current company in Ghana, Kristal plays a pivotal role in optimizing operations and fostering organizational growth through data-driven decision-making and agile methodologies. With certifications in Scrum, Business Analysis, and Neuro-linguistic Programming (NLP), Kristal excels in guiding her team toward achieving business objectives in dynamic environments.
Her recent work focuses on harnessing machine learning and data science to tackle real-world challenges, such as food security in Africa. Kristal is passionate about how technology can be applied to improve agricultural practices and strengthen local economies, particularly in areas most affected by climate change. She is currently expanding her knowledge and experience in predictive analytics, with the goal of using these tools to support agricultural innovation and economic development in Africa.
As a single mother living in Ghana, Kristal is also a role model for women and young girls aspiring to pursue careers in STEM fields. Her leadership journey, combined with her commitment to continuous learning and professional development, allows her to inspire others and create opportunities for women to break into the tech and business sectors.
As a first-time speaker at PyData Global, Kristal is excited to share her insights on how open-source data tools and machine learning can be applied to real-world problems like agricultural yield prediction and business optimization. She hopes to engage with the PyData community and contribute to the growing body of knowledge on using data science for social good.
- Harnessing Machine Learning to Improve Agricultural Resilience in Africa: A Practical Approach to Predicting Crop Yields
![](/media/avatars/Lu_UGKw6Vz.jpg)
Lu is a database engineer at LanceDB, where she builds distributed vector databases and integrates Lance with the big data ecosystem (Spark, Trino). She helped develop the distributed system Alluxio as a PMC maintainer. She is also a Data on Kubernetes Ambassador and Kubernetes community evangelist, bridging AI data infrastructure with cloud-native technologies.
- Bridging Big Data and AI: Empowering PySpark with Lance Format for Multi-Modal AI Data Pipelines
![](/media/avatars/leonie_hodel_headshot_8zgCeTq.jpg)
Leonie is a land system scientist and postdoctoral researcher at the Global Land Use and Environment Lab at the University of Wisconsin-Madison. With a background in bioinformatics, she recently earned her PhD from ETH Zurich in Switzerland. Her research is dedicated to understanding deforestation trends in tropical regions and evaluating the effects of both private and public conservation interventions. Leonie's work employs AI-driven large-scale geospatial analysis, complemented by qualitative methods, to explore and analyze complex land systems.
- Using AI to Spot Deforestation-related Cows on Satellite Images
![](/media/avatars/liam_profile_pic_0Ho0F5R.jpg)
Liam is Lead Data Scientist at Joulen, where he builds time series forecasting pipelines for renewable energy management. He shares cutting-edge data science content with over 10,000 followers on social media. Liam has been a Polars contributor focused on accessibility and documentation for new users. He also created the world's first online course in Polars, has taught over 3,000 learners to date on Udemy, and is the Polars instructor on the O'Reilly platform.
- Build simple and scalable data pipelines with Polars & DeltaLake
![](/media/avatars/PyCon_1610_square_B8ARayS.jpg)
Machine Learning Engineer by day and Open Source maintainer by night, Luca is passionate about time series. Feel free to reach out to him on LinkedIn for feedback and/or material! https://www.linkedin.com/in/lucabaggi/
- Foundational Models for Time Series Forecasting: are we there yet?
![](/media/avatars/me-white_riTGvTq.jpeg)
Maarten Breddels is an entrepreneur and ex-scientist mainly working with Python, C++, and Javascript in the Jupyter ecosystem. He is the creator of PyCafe, Solara, ipyvolume, and Vaex, and co-founder of Widgetti and PyCafe. His expertise includes fast numerical computation, API design, 3D visualization, and building data apps. He has a Bachelor's in ICT, a Master's, and a Ph.D. in Astronomy, and he likes to solve real problems.
- Python Apps in the Browser made simple by PyCafe
![](/media/avatars/Maggie_Wolff_2024_5_-_Copy_YdwV0Dc.jpg)
Maggie Wolff is a Marketer turned Data Scientist, with two decades of experience spanning multiple industries including travel, e-commerce, corporate real estate, healthcare, and non-profits. She holds a Master of Science in Data Science from DePaul University and a Bachelor of Arts in Communication from Loyola University Chicago. Currently, she is a Data Scientist focused on product analytics with American Express Global Business Travel.
Maggie also serves as an ambassador for Women in Data Science Worldwide and helped organize the first MeasureCamp conference held in Chicago.
- Measuring the User Experience and the Impact of Effort on Business Outcomes
- Hands-on Multimodal AI Development with Pixeltable
Marco is the author of Narwhals, a core contributor to Polars and pandas, and a Senior Software Engineer at Quansight Labs. He also consults for and trains clients professionally on Polars. He wrote the first Polars Plugins Tutorial and has taught Polars Plugins to clients.
He has a background in Mathematics and holds an MSc from the University of Oxford, and was one of the prize winners in the M6 Forecasting Competition (2nd place overall Q1).
- Mentored contributions to Narwhals, a lightweight compatibility layer between dataframes!
![](/media/avatars/mark-headshot-3_WiYrNmI.png)
Dr. Mark Moyou is a Senior Data Scientist at NVIDIA, podcast host, and conference director. At NVIDIA he works with enterprise clients on AI strategy and deploying machine learning applications to production. He hosts the AI Portfolio, Caribbean Tech Pioneers, and Progress Guaranteed podcasts and runs the Optimized AI Conference.
- Understanding the end-to-end LLM training and inference pipeline
- akimbo: vectorized processing of nested/ragged dataframe columns
![](/media/avatars/martin_headshot_RbhAqGh_202KkOV.png)
Martin is an experienced computer science educator and open source software developer.
Martin creates educational content for Neo4j and supports developers in using graph technology to understand their data.
As a child he wanted to be either a Computer Scientist, Astronaut or Snowboard Instructor.
- GenAI Beyond Chat with RAG, Knowledge Graphs and Python
![](/media/avatars/MicrosoftTeams-image_1_--_rett_i_kamera_kvadratisk_dqyqPkQ.png)
Senior Research Scientist at the Norwegian Computing Center, Department of Statistics and Machine Learning. Interests: Anomaly detection, changepoint detection, sensor data.
https://nr.no/en/employees/martin-tveten/
- skchange & sktime – time series anomaly detection, changepoint detection, segmentation
![](/media/avatars/dogsds-round_yaE7IkP.png)
Marysia Winkels is a member of technical staff at Cohere, where she focusses on creating high-quality diverse data for the post-train data team. She is also chair of PyData Amsterdam and volunteers at CorrelAid, a #data4good organisation that matches data enthusiasts with non-profits that benefit from their help.
- The Data That Shapes Foundational LLMs
Matthew Powers is a Developer Advocate.
He focuses on blogging, social media, coding, and community development for DataFusion, Polars, Spark, Delta Lake, and other related Data Engineering technologies.
He tries to teach in an easily digestible manner, focusing on core concepts.
He likes separating usage guides from theory, so learners that just want to get their job done are not bogged down with the theory.
- New Features in Apache Spark 4.0
![](/media/avatars/headshot_kUFbF4F.jpeg)
Hi! Melody is an intern at NVIDIA on the RAPIDS Cloud Deployment Team. She is currently a senior studying Statistics & Machine Learning, CS, and Human-Computer Interaction at Carnegie Mellon University, where she also enjoys exploring technology for social and community impact. She is super excited to be attending PyData and getting involved in the open source community!
- The art of wrangling your GPU Python environments
![](/media/avatars/_40A9702_pfuk1UA.jpg)
Michael started out as a chemist and electron microscopist, but went to the dark side of build infrastructure after experiencing the pain of trying to distribute the tools he wrote. He's been involved with Conda and Conda-forge for quite a while, and now works on the RAPIDS build infrastructure team at NVIDIA. He has a strange obsession with encoding compatibility into package management systems, and dreams of a world where no user ever has to wonder why they are missing symbols. Metadata is his love language.
- Going Plaid: Striving for Speed of Light in CI pipelines
Nathan Colbert is an ML professional with 5 years of experience building, deploying, and owning end-to-end ML Systems. Nathan works at Peacock as a Senior Manager of ML Architecture, where he is focused on accelerating ML delivery across the organization.
- From Inference to Features: Build a Core ML Platform from Scratch
![](/media/avatars/nicola1_dL6VBtT.jpg)
Nicola Rennie is a Lecturer in Health Data Science based within the Centre for Health Informatics, Computing, and Statistics at Lancaster Medical School. Her research interests lie in understanding how to effectively communicate complex quantitative ideas in an accessible way e.g. to clinicians, patients, the public, and students. Nicola also has experience in data science consultancy, and collaborates closely with external research partners. She can often be found at data science meetups, presenting at conferences, and is the R-Ladies Lancaster chapter organiser.
Nicola has roles in several organisations, including as a committee member of the R-Ladies Global Team and the RSS Teaching Statistics Section Group, as co-lead of the Statistics Software Special Interest Group at the RoSE (Researchers of Statistics Education) Network, and as a member of the Editorial Board of Significance magazine. She is also one of the Royal Statistical Society’s 2024-2025 William Guy Lecturers, and will be recording and delivering her talk aimed at 11-16 year olds on the topic of using data and statistics to shape decision making in medicine and healthcare.
- Practical Techniques for Polished Visuals with Plotnine
![](/media/avatars/giso_profile_picture_Ma9DAmV.png)
Passionate about Data Science and Machine Learning, involved in projects from ETL through modeling to deployment.
Some of the projects in which I took part are:
- Implementation of custom model for image recognition
- Modeling of physical processes to optimize performance
- Building ML services infrastructure leveraging Microsoft Azure Cloud services such as Azure Machine Learning Workspace, Azure Databricks and Azure DevOps
As a side project, I built www.whilemodeltrains.com, a little app that serves data-related blog posts, presented 3 at a time.
I write about ML (and other stuff) on my blog at www.nicologiso.com.
- Image Recognition for safety on the factory floor
![](/media/avatars/PyData_L1YVMAZ.jpeg)
Nompumelelo is a certified CesiumJS developer based in London, working for Immersionn as an XR developer. She is also pursuing a BSc in Computer Science at the University of London.
- 3D geospatial data visualization using Python and Cesiumjs
![](/media/avatars/WhatsApp_Image_2024-10-25_at_2.35.44_PM_tAwFKve.jpeg)
Noor Aftab is the Global Program Lead at Amazon Web Services (AWS), where she leads the Strategic Customer Program at Amazon S3, overseeing the largest cloud, data, and analytics workloads globally. With her foundation as a data scientist, Noor brings deep technical knowledge and strategic vision to help organizations optimize their data infrastructure and leverage AI-driven solutions on a massive scale. Her leadership in managing complex, mission-critical cloud operations has positioned her as a leading figure in the fields of cloud technologies and data science.
Noor has spoken at 13 global locations, sharing insights on how AI and data can transform industries and promote inclusivity in technology. Her presentations focus on advancing AI adoption, fostering diversity in tech, and driving innovation through cloud solutions.
She is also the founder of the International Women Economic Council (IWEC), a digital platform that provides networking, mentoring, and global profiling opportunities for women worldwide. IWEC empowers women to break glass ceilings and close the wage gap by connecting them with mentors and industry leaders, helping them achieve greater success in their careers.
Noor’s achievements have earned her international recognition, including the Australia Alumni Excellence Award and the Asia Pacific HRM Congress Award for her contributions to business leadership and women's empowerment. Her work has attracted media attention from outlets such as BBC, Martha’s Vineyard Times, and Hindustan Times, underlining her global impact in advancing diversity and inclusion in the tech industry.
- The Missing 78%: How Women in AI & Data Can Complete the Future of Innovation
Nour leads the Generative AI technical group at Modus Create. She has a PhD in Machine Learning and has worked on Machine Learning, Data Science and Data Engineering problems in various domains, both inside and outside Academia.
- Evaluating RAGs: On the correctness and coherence of Open Source eval metrics
![](/media/avatars/pxn_2015_VSUUAJq.png)
Paco Nathan leads DevRel for the Entity Resolved Knowledge Graph practice area at Senzing.com and is a computer scientist with 40+ years of tech industry experience and core expertise in data science, natural language, graph technologies, and cloud computing. He's the author of numerous books, videos, and tutorials about these topics.
Paco advises Kurve.ai, EmergentMethods.ai, KungFu.ai, DataSpartan, and Argilla.io (acquired by Hugging Face), and is lead committer for the pytextrank and kglab open source projects. Formerly: Director of Learning Group at O'Reilly Media; and Director of Community Evangelism at Databricks.
- Catching Bad Guys using open data and open models for graphs
![](/media/avatars/IMG_2221_hydXAdg.jpeg)
Applied AI Scientist at JP Morgan Chase
MS CS @ UMass Amherst ‘22
- Holistic Evaluation of Large Language Models: From References to Human Judgment
Pascal is Head of Research Technology for Cubist Systematic Strategies.
- Leveraging CSP for Live Inference
![](/media/avatars/Patrick_Deziel_SwKDUdr.jpg)
Patrick Deziel is a Python and Go developer and machine learning specialist. He has extensive experience building practical machine learning models and integrating them into existing applications. Patrick currently works at Rotational Labs where he develops custom LLMs and AI/ML-powered APIs for business use cases. In his free time, he enjoys rock climbing and contributing to open source.
- Putting the data science back into LLM evaluation
- Keynote: Do Python and Data Science Matter in Our AI Future?
![](/media/avatars/qq_bJRxqZq.png)
Before Pixeltable, Pierre worked at Confluent after his company (Noteable) was acquired. He led Amazon’s notebook initiatives (internally and for AWS SageMaker). Previously, he worked at Amazon Core AI/ML, helped launch Amazon’s online car leasing store in the EU, and worked on diverse ML projects such as Amazon’s Data Quality Framework (Deequ).
- Hands-on Multimodal AI Development with Pixeltable
![](/media/avatars/prao_1_3B1vTiQ.png)
Prashanth is an AI engineer at Kùzu based in Toronto. In recent years, he's focused heavily on data modeling and engineering using relational, graph and vector databases that power a variety of machine learning and AI applications. In his spare time, he enjoys engaging with the Python/Rust community and blogging @ thedataquarry.com.
- Graph RAG: Bringing together graph and vector search to empower retrieval
![](/media/avatars/quan_Cy2NBeQ.jpg)
Quan is a Python programmer and machine learning enthusiast. He is interested in solving decision-making problems that involve uncertainty. Quan has authored several books on Python programming and scientific computing. He is currently working as a postdoctoral research associate at Princeton University, where he does research on machine learning methods for scientific discovery.
- Cost-effective data annotation with Bayesian experimental design
![](/media/avatars/pictures_st8Zhvb.jpg)
Riya is a Data and Applied Scientist at Microsoft who specializes in NLP and machine learning. She holds a Master’s degree in CS from the University of Massachusetts, Amherst, which she completed in May 2022. Before joining Microsoft’s US team, she worked as a Data Engineer in India. She is passionate about building data and AI-driven products and solutions that can benefit people and society. She enjoys hiking, dancing and working out in her spare time.
- Holistic Evaluation of Large Language Models: From References to Human Judgment
![](/media/avatars/headshot_ygCzbhY.jpg)
Robin Linacre is a data scientist at the UK Ministry of Justice and the lead author of Splink, a Python library for record linkage and deduplication at scale.
- Rapid deduplication and fuzzy matching of large datasets using Splink
![](/media/avatars/rodrigo_x70UEjM.jpeg)
Rodrigo has always been fascinated by problem solving and that is why he picked up programming – so that he could solve more problems. He also loves sharing knowledge, and that is why he spends so much time writing articles in his blog mathspp.com/blog, writing on Twitter @mathsppblog, and giving workshops and courses.
Now, Rodrigo also channels this passion into his role at Polars.
His main areas of scientific interest are mathematics (numerical analysis in particular) and programming in general (with a preference for the Python and APL languages), but Rodrigo also enjoys reading fantasy books, watching silly comedy movies and eating chocolate.
- Understanding Polars data types
![](/media/avatars/1696629453296_jph1jtb.jpeg)
Ryan is SVP of Technology at Boclips, an ed-tech company enabling the use of video in education.
With a PhD in astronomy, Ryan has worked across data-centric roles in various startups—from data science and data engineering to leadership roles. He is now responsible for Data, Engineering and Product at Boclips. Ryan's expertise spans machine learning, natural language processing, data pipelines, and large language models, with a core focus on getting data science delivered.
Ryan frequently shares insights on leadership and data on LinkedIn https://www.linkedin.com/in/ryanvarley/.
- Let's get you started with asynchronous programming
![](/media/avatars/photo_hg6RfzF.jpeg)
Sara Zanzottera is Lead AI Engineer at Kwal, working on voice agents and conversation analysis with LLMs. Before joining Kwal she was a core maintainer of Haystack, one of the most mature open-source RAG frameworks, and led the design and implementation of its 2.0 version. She started her career at CERN as a Python software engineer on the particle accelerator’s control systems.
- Building LLM Voice Bots with Open Source Tools
![](/media/avatars/Saranjeet_Kaur_Bhogal_G3SSgwq.jpg)
Saranjeet is a Research Software Engineer at Imperial College London. She is also a 2023 Software Sustainability Institute Fellow, has a Master's degree in statistics from the University of Pune, and is a Technical Writer for the R Development Guide. She has been involved with software engineering communities throughout her career and has been selected for open source programs including Google Summer of Code 2020, Code for Science and Society's Digital Infrastructure Incubator 2021, Google Season of Docs 2022, and as a Subject Matter Expert for the Open Science Tools and Resources Module of NASA TOPS. In 2021, she participated in the Open Life Science program (cohort-4), during which she co-founded the Research Software Engineering (RSE) Asia Association. For her work in the RSE community, she was awarded the RSE Impact Award 2022 at the Inaugural Community Awards by the Society of RSE.
- Empowering New Contributors: The Evolving Role of the R Development Guide
![](/media/avatars/slack-img_hiZBkQq.jpeg)
Hello.
I am an engineer at Outerbounds. I work on infra so our customers can work on AI/ML.
- Navigating Cloud Expenses in Data & AI: Strategies for Scientists and Engineers
![](/media/avatars/d2bf1b5a-2885-4deb-a41d-a99f39066ba5_vY7UVfb.png)
Independent Consultant, Data Scientist & Open Science Advocate.
I lead with a clear focus on the big picture, turning data into powerful tools for decision-making and discovery.
- The LEGO Approach to designing PyData Workflows
![](/media/avatars/sam_ydTLG82.jpg)
Sergey Maydanov is a Senior Software Engineering Manager at NVIDIA leading the nvmath-python product engineering. Throughout his career, Sergey led math library projects targeting CPU and GPU accelerated transcendental and special functions, statistical and machine learning algorithms, and random number generators.
- The nvmath-python: Bringing NVIDIA math libraries to Python scientific community
![](/media/avatars/WhatsApp_Image_2024-10-24_at_16.37.43_w3nHv59.jpeg)
Sheetal is a Senior Applied Scientist at Etsy and has six years of experience in data science and machine learning, with a career that spans Asia and Europe and active engagement with the global data science community. She worked at Amazon as an Applied Scientist in London, focusing on personalization, and as a Machine Learning Engineer at JP Morgan Chase in Hong Kong. She holds a master’s degree in Data Science and AI from a dual degree program in the Netherlands and Finland, during which she published papers at top-tier conferences. Currently, she leads a paper reading group in the Northeast, facilitating discussions with fellow data professionals.
Sheetal is deeply passionate about fostering and growing women-focused communities in tech. As a WiDS (Women in Data Science) ambassador, she actively supports initiatives that empower women in the field. Her dedication to community impact was recognized with the Social Impact Award in Germany.
- Build Your Own Transformer (90 minutes)
![](/media/avatars/cropped_me_CTKSwTi.jpeg)
Shefali is completing her MS in Applied Statistics at Columbia University and brings over 3 years of data science and analytics experience across education, consulting, and AdTech. Her notable work includes leading data-driven initiatives that impacted 90,000 schools at BCG, optimizing ad performance at Media.net (one of the top 5 largest AdTech companies worldwide by market cap), and developing cloud-based analytics solutions for U.S. non-profit institutions.
- Build Your Own Transformer (90 minutes)
![](/media/avatars/IMG_0713_copy_6yjVgoW.jpg)
Shekhar is deeply passionate about open source software and actively contributes to various projects, including SymPy, Ruby gems like daru and daru-view (which he authored), Bundler, NumPy/SciPy, and Apache projects like Druid and Kafka.
He successfully completed Google Summer of Code in 2016 and 2017 and has served as an admin for SciRuby, mentoring multiple organizations.
Shekhar has spoken at prominent conferences such as RubyConf 2018, PyCon 2017, ApacheCon 2020, and Community Over Code 2024, as well as numerous regional meetups. Currently, he works at Apple as a Software Development Engineer.
- Building a Real-Time Data Pipeline with Flink, Druid, and Python
Shivay Lamba is a software developer specializing in DevOps, Machine Learning and Full Stack Development.
He is an Open Source Enthusiast and has been part of various programs like Google Code In and Google Summer of Code as a Mentor, and has also been an MLH Fellow. He is actively involved in community work as well. He is a TensorflowJS SIG member and a Mentor in OpenMined, the CNCF Service Mesh Community, and SODA Foundation, and has given talks at various conferences like GitHub Satellite, Voice Global, Fossasia Tech Summit, and TensorflowJS Show & Tell.
- Streamlining AI development and Deployment with KitOps
![](/media/avatars/IMG_20230820_1914113_k6gNIt7.jpg)
I'm a data scientist at Intuit in California, USA and I work on the anomaly detection capability that tracks authentication and business health metrics at Intuit. I was previously building NLP models at GoDaddy, but I enjoy working with data in general. I'm a Python enthusiast and enjoy sharing my learnings with the community - I've previously presented at the Grace Hopper Conference, PyCon US, EuroPython, and GeoPython. When not opposite a screen, I can be found frolicking in nature and exploring new trails.
- Realtime Time Series Anomaly Detection in Production
I am a Data Scientist at Sixt SE, where I specialize in marketing technology projects designed to optimize campaign return on ad spend (ROAS), identify high-value customers, and predict churn. With four years of professional experience in the field, I leverage advanced analytical techniques to drive data-informed decision-making and enhance marketing strategies. I hold a degree in Computer Science from the University of Passau, with a particular focus on applying transfer learning methodologies to the LegalTech sector. My interdisciplinary background equips me with a unique perspective on integrating data science principles into diverse business contexts.
- Automating SEA Retargeting for Smarter Audience Engagement and Higher Conversions
![](/media/avatars/sonnguyensquare_9feVZPf.jpg)
Son The Nguyen is a Ph.D. student in Management Information Systems (MIS) at the University of Illinois Chicago, where he is guided by Professor Theja Tulabandhula. Son specializes in human-AI collaboration, with a focus on enhancing both human and model performance. His research bridges AI theory and practical applications, emphasizing AI safety, alignment, and optimizing interactions between humans and AI to harmonize their collaboration.
- Improve LLMs Alignment with Complete and Robust Preference Data
![](/media/avatars/IMG_20230531_1746383_9Ogx4fs.jpg)
Sonam is the creator of the open-source library Embed-Anything, which helps create local and multimodal embeddings and stream them to vector databases; it is built in Rust, which makes it greener and more efficient. She works as the Generative AI Evangelist at Articul8, a spin-off of Intel that provides vertical GenAI services to enterprises.
- Vector Streaming: The Memory Efficient Indexing for Vector Databases
![](/media/avatars/sujee2-small_bXnpwJ0.jpg)
Sujee Maniyam is an expert in Generative AI, Machine Learning, Deep Learning, Big Data, Distributed Systems, and Cloud technologies. He is passionate about developer education and fostering community engagement. Sujee has led numerous training sessions, hackathons, and workshops. He is also an author, open source contributor, and frequent speaker at conferences and meetups.
- Preparing Data for LLM Applications Using Data Prep Kit
![](/media/avatars/photo_kMNsQfv.jpeg)
Thibault Cordier is a Data and Research Scientist at Capgemini Invent, where he is a member of the Lab Invent team in France and serves as the technical leader of the MAPIE project.
Prior to joining the research team at Capgemini Invent, he earned his PhD in Computer Science in 2023 at Avignon University.
Up to now, his research has focused on distribution-free inference and conformal prediction, with applications in computer vision, natural language processing, and time series analysis.
- Boosting AI Reliability: Uncertainty Quantification with MAPIE
![](/media/avatars/profile_pzTpRkB.png)
Tim Swena is the team lead for BigQuery DataFrames and a contributor to pandas, geopandas, ibis, and many other projects in the PyData ecosystem.
- Python + BigQuery + DataFrames: Hands on with scalable "serverless" analysis, ML, and AI
![](/media/avatars/headshotTimSpann_8RUu7li.png)
Tim Spann is a Principal. He works with Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Milvus, Generative AI, HuggingFace, Python, Java, Apache NiFi, Apache Spark, Big Data, IoT, Cloud, AI/DL, Machine Learning, and Deep Learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal Developer Advocate at Zilliz, Principal Developer Advocate at Cloudera, Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Senior Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in Computer Science.
- It's in the Air Tonight. Sensor Data in RAG
![](/media/avatars/HOCKING-square-lowres_T2saWu7.jpg)
A Berkeley-educated California native, Toby Dylan Hocking received his PhD in mathematics (machine learning) from École Normale Supérieure de Cachan (Paris, France) in 2012. He worked as a postdoc in Masashi Sugiyama’s machine learning lab at Tokyo Tech in 2013, and in Guillaume Bourque’s genomics lab at McGill University, Montreal, Canada (2014-2018).
From 2018 to 2024 he was a tenure-track Assistant Professor at Northern Arizona University, and since 2024 he has been a tenured Associate Professor at Université de Sherbrooke, where he directs the LASSO research lab (Learning Algorithms, Statistical Software, Optimization). Since 2024, Toby has also been an Associate Academic member at Mila - Quebec Artificial Intelligence Institute.
He has authored dozens of R packages, and has published 40+ peer-reviewed research papers on machine learning and statistical software. He has mentored 30+ students in research projects, as well as another 30+ open-source software contributors with R Project in Google Summer of Code.
- Using and contributing to the data.table package for efficient big data analysis
![](/media/avatars/IMG_1644_qYq0dlV.jpeg)
Tony Ojeda is an accomplished analytics executive and thought leader with over 20 years of experience in data science, machine learning, and generative AI. As Chief Data Scientist at Fulcrum Analytics, he spearheads the development of cutting-edge AI-powered solutions that transform how businesses operate and make decisions in industries such as healthcare, financial services, marketing, and customer service.
Tony is the author of Applied Text Analysis with Python and the Practical Data Science Cookbook. A frequent speaker at PyData and other industry events, he is committed to advancing AI innovation and empowering data professionals to harness the full potential of AI and automation, enabling organizations to derive deeper insights, streamline operations, and make data-driven decisions more effectively.
- Generative AI + Python: Unlocking Efficiency, Personalization, and Insight
![](/media/avatars/tun-lajos-small_iqbuw3L.jpg)
Tun Shwe is the VP of Data at Quix, where he leads data strategy and developer relations. He focuses on helping companies imagine and implement their strategic data vision with stream processing at the forefront. He previously served as Head of Data and as a Data Engineer at high-growth startups, and has spent his career leading T-shaped teams in developing analytics platforms and data-intensive AI applications.
In his spare time, Tun goes surfing, plays guitar and tends to his analogue cameras.
- Moving from Offline to Online Machine Learning with River
![](/media/avatars/Portrait_moi_0oEZUSD.jpg)
Senior Data Scientist @ Capgemini Invent
Leading the team behind MAPIE, an open-source library within the scikit-learn-contrib ecosystem focused on conformal prediction.
After earning an MSc in Computer Science from École Centrale, I spent a few years in product management before returning to more technical roles.
Let’s connect!
- Boosting AI Reliability: Uncertainty Quantification with MAPIE
Vyoma Gajjar is an AI Technical Solution Architect with over a decade of experience in AI governance, generative AI, and machine learning. She has worked extensively on developing scalable AI solutions and governance frameworks for global industries, focusing on highly regulated sectors like finance and healthcare. Vyoma is passionate about ethical AI practices and responsible innovation, frequently speaking at major conferences and serving as a mentor to aspiring AI professionals. She holds a patent in AI and actively contributes to shaping the future of trustworthy AI technologies.
- LLMs in Regulated Industries: Challenges and Governance Solutions
- Python + BigQuery + DataFrames: Hands on with scalable "serverless" analysis, ML, and AI
![](/media/avatars/image_dYjlDjq.png)
Wes McKinney is an open source software developer and entrepreneur focusing on data processing tools and systems. He created the Python pandas and Ibis projects, and co-created Apache Arrow. He is a Member of the Apache Software Foundation and a member of the Apache Parquet PMC. He is currently a Principal Architect at Posit PBC and a co-founder of Voltron Data.
- Retooling for a Smaller Data Era
![](/media/avatars/zain_hasan_headshot-1_hrjRgnX.jpg)
Zain Hasan is a Senior AI/ML DevRel Engineer at Together AI, a company that allows people to train, fine-tune, and run generative AI models faster, at lower cost, and at production scale. He is an engineer and data scientist by training who pursued his undergraduate and graduate work at the University of Toronto, building artificially intelligent assistive technologies. He then founded a company developing a digital health platform that leveraged machine learning to remotely monitor chronically ill patients. More recently he worked as a senior data science consultant in Toronto. He is passionate about open-source software, education, community, and machine learning, and has delivered workshops and talks at multiple events and conferences.
- Breaking Free from Extraction Pipelines: ColPali’s Vision-Powered RAG for Enterprise Documents
Tony has a broad background in the automotive and tech industries, with a focus on deep learning, particularly within the natural language processing domain. He has worked extensively on training language models, orchestrating GenAI ecosystems, and developing tools for LLM evaluation, decision-making, and hallucination mitigation. His experience ranges from hardware testing with LiDARs on autonomous vehicles to object recognition using computer vision and generative AI applications.
- Enhancing Maternal Healthcare: Training Language Models to Identify Urgent Messages in Real-Time
![](/media/avatars/hba-headshot_nqUnShf.jpg)
Hugo Bowne-Anderson is an independent data and AI consultant with extensive experience in the tech industry. He is the host of the industry podcast Vanishing Gradients, where he explores cutting-edge developments in data science and artificial intelligence.
As a data scientist, educator, evangelist, content marketer, and strategist, Hugo has worked with leading companies in the field. His past roles include Head of Developer Relations at Outerbounds, a company committed to building infrastructure for machine learning applications, and positions at Coiled and DataCamp, where he focused on scaling data science and online education respectively.
Hugo's teaching experience spans institutions such as Yale University and Cold Spring Harbor Laboratory, as well as conferences such as SciPy, PyCon, and ODSC. He has also worked with organizations like Data Carpentry to promote data literacy.
He has had a significant impact on data science education, having developed over 30 courses on the DataCamp platform that have reached more than 3 million learners worldwide. Hugo also created and hosted the popular weekly data industry podcast DataFramed for two years.
Committed to democratizing data skills and access to data science tools, Hugo advocates for open source software both for individuals and enterprises.
- Building an AI Travel Agent That Never Hallucinates