Multiple Instance Learning (MIL) provides a promising solution to many real-world problems, where labels are only available at the bag level but missing for instances due to a high labeling cost. As a powerful Bayesian non-parametric model, Gaussian Processes (GP) have been extended from classical supervised learning to MIL settings, aiming to identify the most likely positive (or least negative) instance from a positive (or negative) bag using only the bag-level labels. However, solely focusing on a single instance in a bag makes the model less robust to outliers or multi-modal scenarios, where a single bag contains a diverse set of positive instances. We propose a general GP mixture framework that simultaneously considers multiple instances through a latent mixture model. By adding a top-k constraint, the framework is equivalent to choosing the top-k most positive instances, making it more robust to outliers and multimodal scenarios. We further introduce a Distributionally Robust Optimization (DRO) constraint that removes the limitation of specifying a fix k value. To ensure the prediction power over high-dimensional data (e.g., videos and images) that are common in MIL, we augment the GP kernel with fixed basis functions by using a deep neural network to learn adaptive basis functions so that the covariance structure of high-dimensional data can be accurately captured. Experiments are conducted on highly challenging real-world video anomaly detection tasks to demonstrate the effectiveness of the proposed model.