Topics in Low-rank Markov Decision Process: Applications in Policy Gradient, Model Estimation and Markov Games

May 12, 2023, 1:30 pm3:00 pm



Event Description

In this thesis, we study the topics on Markov Decision Processes (MDP) with a low-rank structure. We begin with the definition of a low-rank Markov Decision Process, and discuss the related applications in the followed chapters.

In Chapter 2, we consider the off-policy estimation problem of the policy gradi- ent. We propose an estimator based on Fitted Q Iteration which can work with an arbitrary policy parameterization, assuming access to a Bellman-complete value function class. We provide a tight finite-sample upper bound on the estimation error, given the MDP satisfies the low-rank assumption. Empirically, we evaluate the performance of the estimator on both policy gradient estimation and policy op- timization. Under various metrics, our results show that the estimator significantly outperforms existing off-policy PG estimation methods based on importance sam- pling and variance reduction techniques.

In Chapter 3 and Chapter 4, we study the estimation problem of low-rank MDP models. A tensor-based formulation is proposed to capture the low-rank informa- tion of the model. We develop a tensor-rank-constrained estimator that recovers the model from the collected data, and provide statistical guarantees on the estimation error. The tensor decomposition of the transition model provides useful informa- tion for the reduction of the state and action spaces. We further prove that the learned state/action abstractions provide accurate approximations to latent block structures if they exist, enabling function approximation in downstream tasks such as policy evaluation.

In Chapter 5, we study the representation learning problem of Markov Games, which is a natural extension of the MDPs to the multi-player setting. We present a model-based and a model-free approach to construct an effective representation from the collected data, which is further used to learn an equilibrium policy. A theoretical guarantee is provided, which shows the algorithm is able to find a near- optimal policy with polynomial interactions with the environment. To our best knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates function approximation.


Adviser: Mengdi Wang