When Demands Evolve Larger and Noisier: Learning and Earning in a Growing Environment

Part of Proceedings of the International Conference on Machine Learning pre-proceedings (ICML 2020)




Feng Zhu, Zeyu Zheng


We consider a single-product dynamic pricing problem in a specific non-stationary setting, where the demand grows over time in expectation and possibly becomes noisier. The decision maker dynamically sets prices and learns the unknown price elasticity, with the goal of maximizing expected cumulative revenue. We prove matching upper and lower bounds on regret and provide near-optimal pricing policies. We show how the change in demand uncertainty over time affects the optimal policy design, and demonstrate how the order of the optimal regret depends on the magnitude of the evolution of demand uncertainty. Moreover, we distinguish between the \textit{any-time} situation and the \textit{fixed-time} situation according to whether the seller knows the total number of time periods $T$ in advance, and show that the two situations surprisingly render different optimal regret orders. We then extend the demand model to a more general case that allows for an additional intercept term, and present a novel, near-optimal algorithm for the extended model. Finally, we consider an analogous non-stationary setting in the canonical multi-armed bandit problem, and point out that, in contrast to the dynamic pricing problem, the \textit{any-time} and \textit{fixed-time} situations render the same optimal regret order in a simple form.
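To make the setting concrete, the following is a minimal, hedged simulation sketch of a growing-and-noisier demand environment with an explore-then-exploit pricing policy. The linear demand form, the growth rates `t ** 0.5` and `0.1 * t ** 0.25`, the exploration length, and the least-squares elasticity estimate are all illustrative assumptions, not the paper's actual model or algorithm.

```python
import random

def simulate(T=2000, elasticity=1.5, seed=0):
    """Toy simulation of learning-and-earning under growing, noisier demand.

    Assumed (illustrative) demand model: expected demand at time t is
    m_t * (1 - elasticity * price), where the market size m_t grows over
    time and the demand noise also grows. The seller does not know
    `elasticity` and must estimate it from observed sales.
    """
    rng = random.Random(seed)
    est = 1.0               # running estimate of the unknown elasticity
    xs, ys = [], []         # observed (price, demand normalized by m_t) pairs
    revenue = 0.0
    for t in range(1, T + 1):
        m_t = t ** 0.5                    # demand grows in expectation
        sigma_t = 0.1 * t ** 0.25         # demand also gets noisier
        if t <= 50:
            # exploration phase: spread prices to identify the elasticity
            price = 0.1 + 0.5 * rng.random()
        else:
            # exploitation: revenue-maximizing price under the linear model
            price = 1.0 / (2.0 * est)
        demand = max(0.0, m_t * (1 - elasticity * price)
                     + sigma_t * rng.gauss(0.0, 1.0))
        revenue += price * demand
        xs.append(price)
        ys.append(demand / m_t)
        # least-squares fit of y = 1 - b * p; use -slope as elasticity estimate
        n = len(xs)
        sx, sy = sum(xs), sum(ys)
        sxx = sum(p * p for p in xs)
        sxy = sum(p * y for p, y in zip(xs, ys))
        denom = n * sxx - sx * sx
        if denom > 1e-9:
            slope = (n * sxy - sx * sy) / denom
            est = max(0.1, -slope)
    return est, revenue
```

Under this sketch, the normalized demand makes the observations comparable across periods even though the raw demand scale grows; the estimated elasticity should settle near the true value, after which the posted price approaches the model's revenue-maximizing price.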