Principles and Applications of Discrete Deep Generative Models

May 11, 2023, 10:00 am to 11:30 am



Event Description

This thesis studies the principles and applications of discrete deep generative models (DGMs). DGMs are deep neural networks capable of modeling high-dimensional probability distributions and generating random samples. Among the many applications of DGMs, some involve inherently discrete components, which drives the need to model discrete random variables; text modeling and control with discrete variables are examples. The discreteness raises fundamental questions about the design of a discrete DGM: How should a discrete DGM be trained? What are its applications? How can discrete modeling and prediction be performed at scale?

We study the training of discrete DGMs from the perspective of reparameterization, a gradient estimation method for random variables modeled by a DGM. Discrete reparameterization is challenging because of the high variance of the gradient estimates. Inspired by the essential properties of the Straight-Through Gumbel-Softmax estimator, we propose a new reparameterization method, the Gapped Straight-Through (GST) estimator, that reduces variance without incurring resampling overhead.
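As background, the Straight-Through Gumbel-Softmax estimator that GST builds on can be sketched as follows. This is a minimal NumPy forward pass, not the thesis method itself: it draws Gumbel noise, forms a temperature-controlled soft relaxation, and takes a hard one-hot sample; in an autodiff framework the "straight-through" trick uses the hard sample in the forward pass while routing gradients through the soft relaxation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gumbel_softmax_st(logits, temperature=1.0, rng=None):
    """Straight-Through Gumbel-Softmax sample (forward pass only).

    logits: array of shape (batch, K). Returns (hard, soft), where `hard`
    is a one-hot sample and `soft` is the relaxed distribution. In an
    autodiff framework one would return hard + (soft - stop_grad(soft)),
    so the backward pass sees the gradient of the soft relaxation.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise via the inverse-CDF transform.
    g = -np.log(-np.log(rng.uniform(1e-10, 1.0, size=logits.shape)))
    soft = softmax((logits + g) / temperature)
    hard = np.zeros_like(soft)
    hard[np.arange(len(soft)), soft.argmax(-1)] = 1.0
    return hard, soft
```

Lower temperatures make the soft relaxation closer to one-hot but increase gradient variance, which is the trade-off that motivates variance-reduction methods such as GST.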

We also present an application of discrete reparameterization in reinforcement learning (RL) for power system control, where the control variables are integers. We contribute to this application in two respects: an RL environment for power systems and an RL algorithm with an integer reparameterization scheme. Constructing the environment required identifying the practical modeling choices for the system. An open-source package implementing the environment has been released and is used in the power systems research community. The RL algorithm combines a DDPG-style policy gradient with a reparameterization for integer actions.
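To illustrate the difficulty of integer actions in a DDPG-style setting, here is a hypothetical NumPy sketch (not the thesis algorithm): the policy outputs a continuous value, which is clipped to the feasible range and rounded to an integer. Rounding has zero gradient almost everywhere, so a straight-through-style reparameterization would pass gradients through the rounding step in the backward pass.

```python
import numpy as np

def integer_action(mu, low, high):
    """Map a continuous policy output to an integer action.

    mu: continuous action(s) from the policy network (array-like).
    low, high: integer bounds of the feasible action range.
    Clip to the feasible range, then round to the nearest integer.
    The rounding is non-differentiable; a straight-through estimator
    would treat it as the identity in the backward pass.
    """
    a = np.clip(np.asarray(mu, dtype=float), low, high)
    return np.rint(a).astype(int)
```

For example, `integer_action([2.6, -3.2], -2, 5)` clips the out-of-range value to the bound before rounding.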

Lastly, we explore large-scale generative text modeling from a kernelized perspective on Transformers. Relative positional embeddings (RPEs) have proven essential for Transformers to perform well on long sequences, yet a theoretical framework for RPE is still lacking. We therefore formulate a kernelized version of RPE through Conditionally Positive Definite (CPD) kernels. The diversity of CPD kernels allows us to derive various RPEs that enable length extrapolation (train short, test long). Experiments demonstrate that the logarithmic variant achieves excellent extrapolation performance on three large language modeling datasets.
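One way such a logarithmic RPE can be realized is as an additive bias on the attention logits that depends only on token distance. The sketch below assumes the form b[m, n] = -r1 * log(1 + r2 * |m - n|) with positive scalars r1, r2 (learnable in practice); the exact parameterization used in the thesis may differ. Because the bias depends only on |m - n|, it is defined for any sequence length, which is what permits train-short, test-long extrapolation.

```python
import numpy as np

def log_rpe_bias(seq_len, r1=1.0, r2=1.0):
    """Logarithmic relative positional bias for attention logits.

    Returns a (seq_len, seq_len) matrix with
        b[m, n] = -r1 * log(1 + r2 * |m - n|),  r1, r2 > 0.
    The bias is zero on the diagonal and decreases with distance,
    penalizing attention to far-away tokens. It is a function of
    distance alone, so it extends to lengths unseen in training.
    """
    idx = np.arange(seq_len)
    dist = np.abs(idx[:, None] - idx[None, :])  # |m - n|
    return -r1 * np.log1p(r2 * dist)
```

In use, this matrix would be added to the query-key attention scores before the softmax, in the same place schemes like ALiBi insert their linear bias.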


Adviser: Peter Ramadge