

End-to-end Feature Selection Approach for Learning Skinny Trees

Shibal Ibrahim · Kayhan Behdin · Rahul Mazumder

MR1 & MR2 - Number 17
Award: Student Paper Highlight
Fri 3 May 8 a.m. PDT — 8:30 a.m. PDT
Oral presentation: Oral: General Machine Learning
Fri 3 May 7 a.m. PDT — 8 a.m. PDT


Joint feature selection and tree ensemble learning is a challenging task. Popular tree ensemble toolkits, e.g., Gradient Boosted Trees and Random Forests, support feature selection post-training based on feature importances, which are known to be misleading and can significantly hurt performance. We propose Skinny Trees: a toolkit for feature selection in tree ensembles, in which feature selection and tree ensemble learning occur simultaneously. It is based on an end-to-end optimization approach that performs feature selection in differentiable trees with Group L0-L2 regularization. We optimize with a first-order proximal method and present convergence guarantees of our algorithmic approach for a non-convex and non-smooth objective. Interestingly, dense-to-sparse regularization scheduling can lead to more expressive and sparser tree ensembles than the vanilla proximal method. On 15 synthetic and real-world datasets, Skinny Trees can achieve 1.5×–620× feature compression rates, leading to up to 10× faster inference than dense trees, without any loss in performance. Skinny Trees achieves better feature selection than many existing toolkits: e.g., in terms of AUC for a 25% feature budget, Skinny Trees outperforms LightGBM by 10.2% (up to 37.7%) and Random Forests by 3% (up to 12.5%).
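To make the "Group L0-L2 regularization" and "first-order proximal method" concrete, here is a minimal sketch, not the authors' implementation. It assumes that all weights in the differentiable trees that read input feature j form one group w_j, and that the penalty has the form lam0 * (number of nonzero groups) + lam2 * sum_j ||w_j||^2; both are assumptions for illustration. Under that penalty, a proximal step with step size eta keeps group j only if ||z_j|| > sqrt(2 * eta * lam0 * (1 + 2 * eta * lam2)) and then shrinks it by 1 / (1 + 2 * eta * lam2); otherwise the feature is dropped entirely. The helper names and the linear `dense_to_sparse_schedule` ramp are hypothetical stand-ins for the scheduling idea mentioned in the abstract.

```python
import math
from typing import List

Weights = List[List[float]]  # row j = all tree weights that read feature j (assumed grouping)


def prox_group_l0_l2(z: Weights, step: float, lam0: float, lam2: float) -> Weights:
    """Prox of step * sum_j (lam0 * [w_j != 0] + lam2 * ||w_j||^2), solved group by group."""
    shrink = 1.0 + 2.0 * step * lam2                   # ridge-style shrinkage factor
    threshold = math.sqrt(2.0 * step * lam0 * shrink)  # hard threshold on each group's L2 norm
    out = []
    for group in z:
        norm = math.sqrt(sum(v * v for v in group))
        if norm > threshold:
            out.append([v / shrink for v in group])    # keep feature, shrink its weights
        else:
            out.append([0.0] * len(group))             # drop the whole feature group
    return out


def proximal_gradient_step(w: Weights, grad: Weights, step: float,
                           lam0: float, lam2: float) -> Weights:
    """One proximal gradient step: gradient step on the smooth loss, then the prox."""
    z = [[wi - step * gi for wi, gi in zip(w_row, g_row)]
         for w_row, g_row in zip(w, grad)]
    return prox_group_l0_l2(z, step, lam0, lam2)


def selected_features(w: Weights) -> List[int]:
    """Indices of features whose group still has any nonzero weight."""
    return [j for j, group in enumerate(w) if any(v != 0.0 for v in group)]


def dense_to_sparse_schedule(epoch: int, lam0_final: float, warmup_epochs: int) -> float:
    """Hypothetical schedule: ramp lam0 linearly from 0 (dense) to lam0_final (sparse)."""
    return lam0_final * min(1.0, epoch / warmup_epochs)


if __name__ == "__main__":
    z = [[3.0, 4.0], [0.1, 0.2], [0.0, 1.0]]
    w = prox_group_l0_l2(z, step=0.5, lam0=1.0, lam2=0.5)
    print(w, selected_features(w))
```

With step 0.5, lam0 = 1.0 and lam2 = 0.5, the threshold is sqrt(1.5), about 1.22, so only the first group (norm 5) survives and is scaled by 1/1.5; the other two features are removed. Starting with a small lam0 and increasing it, as in the hypothetical ramp above, lets the ensemble train densely before features are pruned.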
