
Timing as an Action: Learning When to Observe and Act

Helen Zhou · Audrey Huang · Kamyar Azizzadenesheli · David Childers · Zachary Lipton

MR1 & MR2 - Number 46
Fri 3 May 8 a.m. PDT — 8:30 a.m. PDT


In standard reinforcement learning setups, the agent receives observations and performs actions at evenly spaced intervals. However, in many real-world settings, observations are expensive, forcing agents to commit to courses of action for designated periods of time. Consider that doctors, after each visit, typically set not only a treatment plan but also a follow-up date at which that plan might be revised. In this work, we formalize the setup of timing-as-an-action. Through theoretical analysis in the tabular setting, we show that while the choice of delay intervals could be naively folded in as part of a composite action, these actions have a special structure and handling them intelligently yields statistical advantages. Taking a model-based perspective, these gains owe to the fact that delay actions do not add any parameters to the underlying model. For model estimation, we provide provable sample-efficiency improvements, and our experiments demonstrate empirical improvements in both healthcare simulators and classical reinforcement learning environments.
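The key observation above — that delay actions add no parameters to the underlying model — can be illustrated with a small sketch (not the authors' code; the tabular MDP here is an assumed toy example). In a tabular MDP with one-step transition matrix `P[a]` per action `a`, committing to action `a` for `d` steps induces the `d`-step dynamics `matrix_power(P[a], d)`. Naively treating each `(action, delay)` pair as a composite action would estimate a separate transition matrix per pair, while exploiting the structure requires estimating only the one-step model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, max_delay = 4, 2, 3

# Toy one-step transition model: P[a] is a row-stochastic matrix.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=-1, keepdims=True)

def delayed_transition(P, action, delay):
    """Dynamics of repeating `action` for `delay` steps: the matrix power
    of the one-step model. No new parameters beyond P itself."""
    return np.linalg.matrix_power(P[action], delay)

# Every (action, delay) composite shares the same underlying parameters,
# and each delayed transition is still a valid stochastic matrix.
for a in range(n_actions):
    for d in range(1, max_delay + 1):
        Pd = delayed_transition(P, a, d)
        assert np.allclose(Pd.sum(axis=-1), 1.0)
```

Under this view, a model-based learner that estimates only the `n_actions` one-step matrices covers all `n_actions * max_delay` composite actions for free, which is the source of the sample-efficiency gains described in the abstract.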
