Abstract:
We study probabilistic prediction games when the underlying model is
misspecified, investigating the consequences of predicting with an
incorrect parametric model. We show that for a broad class of loss
functions and parametric families of distributions, the regret of playing
a ``proper'' predictor (one from the putative model class) relative to
the best predictor in the same model class grows at least as
$\sqrt{\gamma n}$, where $\gamma$ measures the misspecification of the
model class relative to the true distribution in total variation
distance. In contrast, using an aggregation-based (improper) learner, one
can obtain regret $d \log n$ for any underlying generating distribution,
where $d$ is the dimension of the parameter; we exhibit instances in which
this is unimprovable even over the family of all learners that may play
distributions in the convex hull of the parametric family. These results
suggest that simple strategies that aggregate multiple learners should be
more robust, and several experiments support this hypothesis.