

Oral: General Machine Learning

Auditorium 1
Fri 3 May 7 a.m. PDT — 8 a.m. PDT

End-to-end Feature Selection Approach for Learning Skinny Trees

Shibal Ibrahim · Kayhan Behdin · Rahul Mazumder

We propose a new optimization-based approach for feature selection in tree ensembles, an important problem in statistics and machine learning. Popular tree ensemble toolkits, e.g., Gradient Boosted Trees and Random Forests, support feature selection post-training based on feature importance scores; while very popular, these scores are known to have drawbacks. We propose Skinny Trees: an end-to-end toolkit for feature selection in tree ensembles, where we train a tree ensemble while controlling the number of selected features. Our optimization-based approach learns an ensemble of differentiable trees and simultaneously performs feature selection using a grouped $\ell_0$-regularizer. We use first-order methods for optimization and present convergence guarantees for our approach. We use a dense-to-sparse regularization scheduling scheme that can lead to more expressive and sparser tree ensembles. On 15 synthetic and real-world datasets, Skinny Trees can achieve $1.5{\times}$--$620{\times}$ feature compression rates, leading to up to $10{\times}$ faster inference over dense trees, without any loss in performance. Skinny Trees lead to better feature selection than many existing toolkits; e.g., in terms of AUC performance for a 25\% feature budget, Skinny Trees outperform LightGBM by 10.2\% (up to 37.7\%) and Random Forests by 3\% (up to 12.5\%).
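The two core ingredients of the abstract — a grouped $\ell_0$ penalty that zeroes out all weights attached to a feature at once, and a dense-to-sparse regularization schedule — can be illustrated with a minimal sketch. This is not the authors' toolkit; `group_l0_prox` and `dense_to_sparse_schedule` are hypothetical names, and the weight matrix stands in for whatever ensemble parameters read each input feature.

```python
import numpy as np

def group_l0_prox(W, lam):
    """Proximal operator for lam * (number of nonzero feature groups),
    where each row of W collects every ensemble weight that reads one
    input feature. A row survives iff keeping it beats zeroing it:
    ||row||^2 / 2 > lam. (Hypothetical helper, not the authors' code.)"""
    W = W.copy()
    group_norms_sq = np.sum(W ** 2, axis=1)
    W[group_norms_sq <= 2.0 * lam] = 0.0
    return W

def dense_to_sparse_schedule(lam_max, steps):
    """Ramp regularization strength from 0 to lam_max, so early training
    stays dense and sparsity is imposed gradually."""
    return np.linspace(0.0, lam_max, steps)

# Toy usage: 6 features x 4 trees; in a real run, a gradient step on the
# training loss would precede each proximal step.
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))
for lam in dense_to_sparse_schedule(1.0, 5):
    W = group_l0_prox(W, lam)
selected = np.flatnonzero(np.abs(W).sum(axis=1) > 0)  # surviving features
```

The grouped structure is what makes this *feature* selection rather than weight sparsity: a feature is dropped from the whole ensemble or kept, never partially.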

Probabilistic Modeling for Sequences of Sets in Continuous-Time

Yuxin Chang · Alex Boyd · Padhraic Smyth

Neural marked temporal point processes have been a valuable addition to the existing toolbox of statistical parametric models for continuous-time event data. These models are useful for sequences where each event is associated with a single item (a single type of event, or "mark"), but they are not suited for the practical situation where each event is associated with a set of items. In this work, we develop a general framework for modeling set-valued data in continuous time, compatible with any intensity-based recurrent neural point process model. In addition, we develop inference methods that can use such models to answer probabilistic queries such as "the probability of item A being observed before item B," conditioned on sequence history. Computing exact answers for such queries is generally intractable for neural models due to both the continuous-time nature of the problem setting and the combinatorially large space of potential outcomes for each event. To address this, we develop a class of importance sampling methods for querying with set-based sequences and demonstrate orders-of-magnitude improvements in efficiency over direct sampling via systematic experiments with four real-world datasets. We also illustrate how to use this framework to perform model selection using likelihoods that do not involve one-step-ahead prediction.
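The efficiency gain over direct sampling comes from the general importance-sampling principle: draw from a proposal under which the event of interest is common, then reweight by the density ratio. The sketch below shows the principle on a deliberately simple stand-in problem (a Gaussian tail probability), not on the paper's set-valued point-process queries, for which the proposal and ratio are model-specific.

```python
import math
import numpy as np

def importance_sampling_tail(n=100_000, seed=0):
    """Estimate P(X > 3) for X ~ N(0, 1) by sampling from the shifted
    proposal N(4, 1), under which the event is common, and reweighting
    each draw by the density ratio p(x)/q(x) = exp(8 - 4x)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(4.0, 1.0, size=n)     # proposal draws
    weights = np.exp(8.0 - 4.0 * x)      # density ratio p/q
    return float(np.mean((x > 3.0) * weights))

exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # closed-form P(X > 3)
```

Direct sampling from N(0, 1) would need roughly a thousand draws per observed success here; the shifted proposal hits the event on about half its draws, which is the same lever the paper pulls for rare orderings of items in event sequences.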

Learning to Defer to a Population: A Meta-Learning Approach

Dharmesh Tailor · Aditya Patra · Rajeev Verma · Putra Manggala · Eric Nalisnick

The learning to defer (L2D) framework allows autonomous systems to be safe and robust by allocating difficult decisions to a human expert. All existing work on L2D assumes that each expert is well-identified, and if any expert were to change, the system should be re-trained. In this work, we alleviate this constraint, formulating an L2D system that can cope with never-before-seen experts at test-time. We accomplish this by using meta-learning, considering both optimization- and model-based variants. Given a small context set to characterize the currently available expert, our framework can quickly adapt its deferral policy. For the model-based approach, we employ an attention mechanism that is able to look for points in the context set that are similar to a given test point, leading to an even more precise assessment of the expert's abilities. In the experiments, we validate our methods on image recognition, traffic sign detection, and skin lesion diagnosis benchmarks.
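The attention idea in the model-based variant — weight context examples by similarity to the test point and average the expert's observed correctness — can be sketched in a few lines. This is an illustrative simplification with raw input features and scaled dot-product similarity, not the paper's trained model; `estimated_expert_accuracy` is a hypothetical name.

```python
import numpy as np

def estimated_expert_accuracy(x, context_X, context_correct):
    """Soft-attention estimate of an expert's accuracy at query x:
    softmax over similarity of x to each context example, then a
    weighted average of the expert's correctness (1 = correct) there.
    (Illustrative sketch, not the paper's architecture.)"""
    scores = context_X @ x / np.sqrt(x.shape[0])  # scaled dot-product similarity
    w = np.exp(scores - scores.max())
    w /= w.sum()                                  # softmax attention weights
    return float(w @ context_correct)
```

A deferral policy could then route the test point to the expert when this estimate exceeds the classifier's own confidence; the attention step is what lets the policy adapt to a never-before-seen expert from only a small context set.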

An Impossibility Theorem for Node Embedding

T. Mitchell Roddenberry · Yu Zhu · Santiago Segarra

With the increasing popularity of graph-based methods for dimensionality reduction and representation learning, node embedding functions have become important objects of study in the literature. In this paper, we take an axiomatic approach to understanding node embedding methods. Motivated by desirable properties of node embeddings for encoding the role of a node in the structure of a network, we first state three properties for embedding dissimilarity networks. We then prove that no node embedding method can satisfy all three properties at once, reflecting fundamental difficulties inherent to the task. Having identified these difficulties, we show that mild relaxations of these axioms allow for certain node embedding methods to be admissible.