Learn to Solve Constrained Markov Decision Process Efficiently
November 6 @ 14:00 - November 7 @ 15:00
Abstract: Many constrained sequential decision-making problems, such as safe autonomous vehicle (AV) navigation, wireless network control, caching, and cloud computing, can be cast as Constrained Markov Decision Processes (CMDPs). Reinforcement Learning (RL) algorithms have been used to learn optimal policies for unknown unconstrained MDPs. Extending these RL algorithms to unknown CMDPs brings the additional challenge of maximizing the reward while satisfying the constraints. In this talk, I will present algorithms that can learn safe policies effectively. In the second part of the talk, I will demonstrate how a theoretical understanding of CMDPs can help us develop algorithms for practical applications. As an application, I will show how to learn optimal beam directions under time-varying, interference-constrained channels for a mobile service robot. Optimal beam selection in mmWave is challenging because of the channel's time-varying nature. We propose a primal-dual Gaussian process bandit with adaptive reinitialization to handle non-stationarity and interference constraints, and we demonstrate how our approach adapts effectively to time-varying channel conditions.

Co-sponsored by: IEEE North Jersey Section

Speaker(s): Dr. Arnob Ghosh

Agenda:
Nov 6th, Talk: 7:00 PM – 8:00 PM
Discussion Q/A: 8:00 PM – 8:15 PM

Room: ECE 202, Bldg: Electrical and Computer Engineering, 154 Summit Street, Newark, New Jersey, United States, 07102
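For readers unfamiliar with the model, the standard CMDP formulation underlying the first part of the talk can be sketched as follows; the symbols (discount factor \gamma, reward r, constraint cost c, budget \tau) are generic textbook notation, not taken from the speaker's work:

\max_{\pi} \; V_r^{\pi} \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
V_c^{\pi} \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \;\le\; \tau .

Primal-dual methods address the constraint through the Lagrangian

\mathcal{L}(\pi, \lambda) \;=\; V_r^{\pi} \;-\; \lambda\left( V_c^{\pi} - \tau \right), \qquad \lambda \ge 0,

alternating a policy update on \pi with a dual update that increases \lambda whenever the constraint is violated.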
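To make the beam-selection idea concrete, below is a minimal Python sketch of a primal-dual Gaussian process bandit with restarts. It illustrates the general technique only and is not the speaker's algorithm: the toy_channel environment, the periodic restart_every schedule (a simple stand-in for the adaptive reinitialization rule mentioned in the abstract), and all parameter values are illustrative assumptions.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def toy_channel(x, t):
    # Hypothetical time-varying mmWave channel: the best beam angle drifts
    # slowly over time, and each beam incurs a different interference level.
    best = 1.5 + 0.5 * np.sin(2 * np.pi * t / 500.0)
    reward = np.exp(-8.0 * (x[0] - best) ** 2) + 0.05 * rng.standard_normal()
    interference = 0.5 + 0.4 * np.sin(3.0 * x[0]) + 0.05 * rng.standard_normal()
    return reward, interference

def primal_dual_gp_bandit(beams, T=1000, tau=0.7, eta=0.05, beta=2.0, restart_every=250):
    # GP-UCB on the reward, GP-LCB on the interference cost, dual ascent on
    # the constraint, and periodic GP reinitialization for non-stationarity.
    gp_r = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-2)
    gp_c = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-2)
    X, R, C = [], [], []
    lam = 0.0  # dual variable for the interference constraint
    for t in range(T):
        if t % restart_every == 0:
            X, R, C = [], [], []  # drop stale data (stand-in for adaptive restarts)
        if not X:
            x = beams[rng.integers(len(beams))]  # explore uniformly after a restart
        else:
            gp_r.fit(np.array(X), np.array(R))
            gp_c.fit(np.array(X), np.array(C))
            mu_r, sd_r = gp_r.predict(beams, return_std=True)
            mu_c, sd_c = gp_c.predict(beams, return_std=True)
            # Optimistic Lagrangian: upper bound on reward, lower bound on cost.
            x = beams[np.argmax((mu_r + beta * sd_r) - lam * (mu_c - beta * sd_c))]
        r, c = toy_channel(x, t)
        X.append(x); R.append(r); C.append(c)
        lam = max(0.0, lam + eta * (c - tau))  # raise lam when interference exceeds tau
    return lam

beams = np.linspace(0.0, np.pi, 64).reshape(-1, 1)  # candidate beam angles
primal_dual_gp_bandit(beams)

The restart schedule is the crude part of this sketch: an adaptive rule, as described in the abstract, would reinitialize the GPs only when the observed channel statistics drift, rather than on a fixed clock.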