Reward-guided generation enhances generative models by enabling them to produce samples with desired characteristics defined by a reward function. This approach enables customization and personalization of generative AI and opens new applications in fields such as reinforcement learning, biological design, and optimization. Studying the methodologies and theoretical foundations behind reward-guided generation is essential to fully realize its potential.
This talk will cover my major studies on advancing reward-guided generation, focusing on two predominant families of generative models: diffusion models and large language models (LLMs). For diffusion models, I will present my research on conditional diffusion models and guidance methods, covering the principal methodologies and their theoretical guarantees on sample reward improvement. An application of guided diffusion models to combinatorial optimization problems will also be discussed. For LLMs, I will present my research on Reinforcement Learning from Human Feedback (RLHF), the mainstream reward-guidance technique for these models, and share my findings on a common pitfall of margin-based RLHF.
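To make the two guidance mechanisms concrete, below is a minimal sketch, in PyTorch, of reward-gradient guidance in a diffusion sampler. It is an illustration rather than the exact methods presented in the talk; the interfaces `score_model(x, t)` and `reward_model(x)` are hypothetical placeholders for a pretrained score network and a differentiable reward model.

```python
import torch

def guided_denoising_step(x_t, t, score_model, reward_model,
                          guidance_scale=1.0, step_size=0.01):
    """One Langevin-style reverse-diffusion update with reward guidance.

    Hypothetical interfaces: score_model(x, t) estimates the score
    (gradient of the log-density) at noise level t; reward_model(x)
    returns a scalar reward per sample.
    """
    x_t = x_t.detach().requires_grad_(True)
    # Reward gradient: pushes samples toward high-reward regions,
    # playing the role of the classifier gradient in classifier guidance.
    reward = reward_model(x_t).sum()
    reward_grad = torch.autograd.grad(reward, x_t)[0]
    with torch.no_grad():
        score = score_model(x_t, t)
        guided_score = score + guidance_scale * reward_grad
        noise = torch.randn_like(x_t)
        # Langevin update; the actual noise schedule is omitted for brevity.
        x_next = x_t + step_size * guided_score + (2 * step_size) ** 0.5 * noise
    return x_next
```

The RLHF topic can be illustrated in the same spirit. Margin-based objectives, such as DPO-style preference losses, depend only on the gap between the preferred and dispreferred responses. The sketch below, again with hypothetical inputs, shows why this can be a pitfall: the loss can keep decreasing even when the likelihood of the preferred response falls, as long as the dispreferred response falls faster.

```python
import torch
import torch.nn.functional as F

def margin_preference_loss(logp_chosen, logp_rejected, margin=0.0, beta=1.0):
    """Margin-based pairwise preference loss (DPO-style sketch).

    logp_chosen / logp_rejected are the policy's (implicit reward)
    log-probabilities of the preferred and dispreferred responses.
    Note the loss depends only on their difference, not on either
    log-probability individually.
    """
    logits = beta * (logp_chosen - logp_rejected) - margin
    return -F.logsigmoid(logits).mean()
```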
The talk will conclude by discussing remaining research challenges in reward-guided generation. Among these, I will highlight reward guidance in discrete diffusion and flow models as a promising research direction.
Adviser: Mengdi Wang