

Poster

Online Learning for Non-monotone DR-Submodular Maximization: From Full Information to Bandit Feedback

Qixin Zhang · Zengde Deng · Zaiyi Chen · Kuangqi Zhou · Haoyuan Hu · Yu Yang

Auditorium 1 Foyer 120

Abstract: In this paper, we revisit the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set, which finds wide real-world applications in machine learning, economics, and operations research. First, we present the Meta-MFW algorithm, which achieves a 1/e-regret of O(√T) at the cost of T^{3/2} stochastic gradient evaluations per round. To the best of our knowledge, Meta-MFW is the first algorithm to obtain a 1/e-regret of O(√T) for the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set. Furthermore, in sharp contrast with the ODC algorithm (Thang and Srivastav, 2021), Meta-MFW relies only on a simple online linear oracle, without discretization, lifting, or rounding operations. In view of practical restrictions, we then propose the Mono-MFW algorithm, which reduces the number of per-function stochastic gradient evaluations from T^{3/2} to 1 and achieves a 1/e-regret bound of O(T^{4/5}). Next, we extend Mono-MFW to the bandit setting and propose the Bandit-MFW algorithm, which attains a 1/e-regret bound of O(T^{8/9}). To the best of our knowledge, Mono-MFW and Bandit-MFW are the first sublinear-regret algorithms for the one-shot and bandit settings, respectively, of online non-monotone continuous DR-submodular maximization over a down-closed convex set. Finally, we conduct numerical experiments on both synthetic and real-world datasets to verify the effectiveness of our methods.
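The algorithms above build on measured Frank-Wolfe updates driven by a linear oracle over the down-closed set. As a minimal illustration of that core idea (not the paper's Meta-MFW itself, which additionally handles online arrival and variance-reduced stochastic gradients), the sketch below runs the classic offline measured Frank-Wolfe update x ← x + (1/K)·v⊙(1−x) on a toy non-monotone DR-submodular objective; the budgeted box constraint and the quadratic objective are hypothetical stand-ins chosen for simplicity.

```python
import numpy as np

def lmo_downclosed(g, budget):
    """Linear maximization oracle over the down-closed set
    {x in [0,1]^n : sum(x) <= budget}: greedily fill the coordinates
    with the largest positive gradient entries up to the budget."""
    v = np.zeros_like(g)
    remaining = budget
    for i in np.argsort(-g):            # largest gradient first
        if g[i] <= 0 or remaining <= 0:
            break
        v[i] = min(1.0, remaining)
        remaining -= v[i]
    return v

def measured_frank_wolfe(grad, n, budget, K=200):
    """Measured Frank-Wolfe sketch (1/e-approximation guarantee for
    non-monotone DR-submodular maximization over a down-closed set).
    The multiplicative damping (1 - x) keeps every coordinate in [0,1],
    and each step adds at most budget/K to sum(x), so x stays feasible."""
    x = np.zeros(n)
    for _ in range(K):
        v = lmo_downclosed(grad(x), budget)
        x = x + (1.0 / K) * v * (1.0 - x)
    return x

# Toy non-monotone DR-submodular objective: f(x) = a.x - ||x||^2 / 2
# (all second partials are <= 0, and the gradient can turn negative).
a = np.array([0.8, 0.5, 0.2, -0.1])
f = lambda x: float(a @ x - 0.5 * x @ x)
grad = lambda x: a - x

x_hat = measured_frank_wolfe(grad, n=4, budget=2.0)
```

The online variants replace the exact gradient with stochastic (or, in the bandit case, one-point) estimates and feed the resulting linear objectives to no-regret online linear optimizers, which is what makes the simple linear oracle sufficient.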
