PyData Global 2024

Egor Kraev

Dr. Egor Kraev has been applying machine learning to real-world problems since last century, including economic and human development data analysis for nonprofits in the US, the UK, and Ghana, and 10 years as a quant, solutions architect, and occasional trader at UBS and then Deutsche Bank.
Following last decade's explosion in AI techniques, Egor became Head of AI at Mosaic Smart Data Ltd, and for the last four years he has been bringing the power of AI to bear at Wise in a variety of domains, from fraud detection to trading algorithms and causal inference for A/B testing and marketing, and now in multiple GenAI projects across the company.
In addition to having taken the Data Science team at Wise from an idea to a well-structured team of over 30 people, Egor is the founder of a startup, motleycrew.ai, aiming to take multi-agent AI systems to the next level of usability and power.

Sessions

12-04
14:30
30min
Fast, intuitive feature selection via regression on Shapley values
Egor Kraev, Baran Koseoglu

Feature selection is an essential process in machine learning, especially when dealing with high-dimensional datasets. It helps reduce the complexity of machine learning models, improve performance, mitigate overfitting, and decrease computation time. This talk will present a novel open-source feature selection framework, shap-select.
Shap-select is noteworthy for its simplicity: it requires only one fit of the model whose features are being selected, and yet performs comparably to much heavier methods. It runs a linear or logistic regression of the target on the Shapley values of the features, computed on the validation set, and uses the signs and significance levels of the regression coefficients to implement an efficient heuristic for feature selection in tabular regression and classification tasks.
We compare this to several other methods, showing that shap-select combines interpretability, computational efficiency, and performance, offering a robust solution for feature selection, especially for real-world cases where model fitting is computationally expensive.
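
The regression-on-Shapley-values idea described above can be illustrated with a minimal sketch. It is not the shap-select implementation or its API. It makes several assumptions for the sake of a self-contained example: the data is synthetic, the model is a plain linear regression (so its Shapley values have the closed form weight × (value − mean) under feature independence), and the keep rule ("positive coefficient with t-statistic above 2") is one plausible reading of "signs and significance levels". All names, such as `regress_on_shap` and `T_THRESHOLD`, are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): 2 informative features, 3 pure-noise features.
n_train, n_val, n_features = 1000, 500, 5
true_weights = np.array([2.0, -3.0, 0.0, 0.0, 0.0])

def make_data(n):
    X = rng.normal(size=(n, n_features))
    y = X @ true_weights + rng.normal(size=n)
    return X, y

X_train, y_train = make_data(n_train)
X_val, y_val = make_data(n_val)

# 1. Fit the model once (here: ordinary least squares with an intercept).
A_train = np.column_stack([np.ones(n_train), X_train])
params, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
weights = params[1:]

# 2. Shapley values on the validation set. For a linear model with
#    independent features they are weight_j * (x_ij - mean_j); a real
#    pipeline would obtain them from an explainer for the fitted model.
shap_val = (X_val - X_train.mean(axis=0)) * weights

# 3. Regress the target on the Shapley values and compute t-statistics.
def regress_on_shap(shap_values, y):
    n, k = shap_values.shape
    A = np.column_stack([np.ones(n), shap_values])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - k - 1)  # residual variance estimate
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return beta[1:], (beta / std_err)[1:]  # drop the intercept

coef, t_stats = regress_on_shap(shap_val, y_val)

# 4. Assumed keep rule: coefficient is positive and clearly significant.
T_THRESHOLD = 2.0
keep = (coef > 0) & (t_stats > T_THRESHOLD)

for j in range(n_features):
    print(f"feature {j}: coef={coef[j]:.3f}, t={t_stats[j]:.2f}, keep={bool(keep[j])}")
```

In this setup, a feature the model uses correctly gets a coefficient close to 1, because its Shapley values already track the target on held-out data. A feature whose attribution is noise gets a coefficient that is statistically indistinguishable from zero or has the wrong sign. The model itself is fitted only once; the extra regression is cheap by comparison.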

AI/ML Track