Model interpretability is a topic of renewed interest given today's widespread practical use of machine learning, and the need to trust or understand automated predictions. We consider the problem of optimally learning interpretable out-of-sample mappings for nonlinear embedding methods such as $t$-SNE. We argue for the use of sparse oblique decision trees because they strike a good tradeoff between accuracy and interpretability which can be controlled via a hyperparameter, thus allowing one to achieve a model with a desired explanatory complexity. The resulting optimization problem is difficult because decision trees are not differentiable. By using an equivalent formulation of the problem, we give an algorithm that can learn such a tree for any given nonlinear embedding objective. We illustrate experimentally how the resulting trees provide insights into the data beyond what a simple 2D visualization of the embedding does.