Many contemporary machine learning models require extensive tuning of hyperparameters to perform well. A variety of methods, such as Bayesian optimization, have been developed to automate and expedite this process. However, tuning remains extremely costly as it typically requires repeatedly fully training models. To address this issue, Bayesian optimization methods have been extended to use cheap, partially trained models to extrapolate to expensive complete models. While this approach enlarges the set of explored hyperparameters, including many low-fidelity observations adds to the intrinsic randomness of the procedure and makes extrapolation challenging. We propose to accelerate hyperparameter tuning for neural networks in a robust way by taking into account the relative amount of information contributed by each training example. To do so, we integrate importance sampling with Bayesian optimization, which significantly increases the quality of the black-box function evaluations and their runtime. To overcome the additional overhead cost of using importance sampling, we cast hyperparameter search as a multi-task Bayesian optimization problem over both hyperparameters and importance sampling design, which achieves the best of both worlds. Through learning a trade-off between training complexity and quality, our method improves upon validation error, in the average and worst-case. We show that this results in more reliable performance of our method in less wall-clock time across a variety of and datasets complex neural architectures.