

Poster

Online Learning for Non-monotone DR-Submodular Maximization: From Full Information to Bandit Feedback

Qixin Zhang · Zengde Deng · Zaiyi Chen · Kuangqi Zhou · Haoyuan Hu · Yu Yang

Auditorium 1 Foyer 120

Abstract: In this paper, we revisit the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set, which finds wide real-world applications in machine learning, economics, and operations research. First, we present the \textbf{Meta-MFW} algorithm, which achieves a $1/e$-regret of $O(\sqrt{T})$ at the cost of $T^{3/2}$ stochastic gradient evaluations per round. As far as we know, \textbf{Meta-MFW} is the first algorithm to obtain a $1/e$-regret of $O(\sqrt{T})$ for the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set. Furthermore, in sharp contrast with the \textbf{ODC} algorithm~\citep{thang2021online}, \textbf{Meta-MFW} relies on a simple online linear oracle without discretization, lifting, or rounding operations. In view of practical restrictions, we then propose the \textbf{Mono-MFW} algorithm, which reduces the per-function stochastic gradient evaluations from $T^{3/2}$ to one and achieves a $1/e$-regret bound of $O(T^{4/5})$. Next, we extend \textbf{Mono-MFW} to the bandit setting and propose the \textbf{Bandit-MFW} algorithm, which attains a $1/e$-regret bound of $O(T^{8/9})$. To the best of our knowledge, \textbf{Mono-MFW} and \textbf{Bandit-MFW} are the first sublinear-regret algorithms for the one-shot and bandit settings, respectively, of online non-monotone continuous DR-submodular maximization over a down-closed convex set. Finally, we conduct numerical experiments on both synthetic and real-world datasets to verify the effectiveness of our methods.
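To illustrate the kind of linear-oracle-based update the abstract refers to, here is a minimal offline sketch of a measured Frank-Wolfe step for non-monotone DR-submodular maximization over the down-closed box $[0,1]^n$. This is not the authors' Meta-MFW/Mono-MFW algorithm (which operates online with stochastic gradients); the toy objective, step count `K`, and all variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy instance: F(x) = h.x - 0.5 x.A.x with A >= 0 entrywise.
# Its Hessian is -A, entrywise non-positive, so F is a (generally
# non-monotone) continuous DR-submodular function on the box [0, 1]^n.
rng = np.random.default_rng(0)
n = 5
h = rng.uniform(0.5, 1.5, size=n)
A = rng.uniform(0.0, 0.4, size=(n, n))
A = (A + A.T) / 2  # symmetric, non-negative entries

def F(x):
    return h @ x - 0.5 * x @ A @ x

def grad_F(x):
    return h - A @ x

def measured_frank_wolfe(K=100):
    """Measured Frank-Wolfe sketch over the down-closed set [0,1]^n:
    each step calls a linear maximization oracle and moves by
    (1/K) * v * (1 - x), which keeps the iterate inside the box."""
    x = np.zeros(n)
    for _ in range(K):
        g = grad_F(x)
        # Linear oracle over [0,1]^n: pick coordinates with positive gradient.
        v = (g > 0).astype(float)
        x = x + (1.0 / K) * v * (1.0 - x)
    return x

x_hat = measured_frank_wolfe()
```

Note how the only per-step subproblem is a linear maximization over the feasible set, mirroring the abstract's point that no discretization, lifting, or rounding is needed.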
