Learning to Bid in Discriminatory Auctions with Budget Constraints
Negin Golrezaei · Sourav Sahoo
Abstract
We study the bidding problem in multi-unit discriminatory (pay-as-bid) auctions from the perspective of a single bidder whose utility is given by value minus $\alpha$ times payment, where $\alpha \in [0,1]$ is the *cost-of-capital* parameter. The bidder aims to maximize cumulative utility over $T$ rounds subject to budget constraints. Even without budgets, the problem is nontrivial since the bidding space is exponential in $M$, the bidder’s maximum demand, and the valuation vector (treated as the context) varies over time. Leveraging the decomposition of the utility function across units, we develop polynomial-time learning algorithms based on directed acyclic graphs (DAGs) under both full-information and bandit feedback that achieve sublinear regret. Due to *complete cross learning over contexts*, where the utility under the observed context reveals counterfactual utilities, the regret bound in the bandit setting is independent of the number of contexts. With budgets, we design a primal-dual based algorithm: the DAG-based procedure performs the primal updates, while online gradient descent (OGD) is used for dual updates. This yields $\rho$-approximate sublinear regret, where $\rho \in (0,1]$ denotes the average per-unit, per-round budget.
Successful Page Load