Artificial intelligence (AI) has advanced rapidly, leading to remarkable progress across numerous real-world applications. However, the growing prevalence of AI-enabled decisions also raises concerns about potential safety risks, as AI systems are known to fail in multiple domains, including autonomous driving, medical diagnostics, and content moderation. This thesis addresses multiple AI safety challenges with a specific focus on generative models, a class of machine learning systems that learn to approximate the underlying distribution of a training dataset and subsequently synthesize novel samples.
First, we focus on improving generalization in adversarially robust learning by incorporating generative models into existing machine learning pipelines and distilling their knowledge through synthetic images. We assess a range of generative models and propose a new metric (ARC), based on the indistinguishability of adversarially perturbed synthetic and real data, to accurately measure the generalization benefit each model provides. Next, we investigate task-aware knowledge distillation from generative models, first demonstrating that individual synthetic images contribute unequally to generalization. We then propose an adaptive sampling technique that guides the sampling process of diffusion models toward the synthetic images with the highest generalization benefit.
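To make the guided-sampling idea concrete, below is a minimal, self-contained PyTorch sketch of one way such guidance could look: a diffusion-style ancestral sampling loop whose update is nudged by the gradient of a scalar "utility" model standing in for the predicted generalization benefit. The networks (TinyDenoiser, TinyUtility), the guidance_scale parameter, and the noise schedule are illustrative assumptions, not the thesis implementation.

import torch

torch.manual_seed(0)

class TinyDenoiser(torch.nn.Module):
    """Stand-in epsilon-prediction network for a diffusion model (assumption)."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, 128), torch.nn.SiLU(), torch.nn.Linear(128, dim)
        )
    def forward(self, x, t):
        t_feat = t.expand(x.shape[0], 1)
        return self.net(torch.cat([x, t_feat], dim=1))

class TinyUtility(torch.nn.Module):
    """Stand-in scalar 'usefulness' score for a candidate sample (assumption)."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 64), torch.nn.SiLU(), torch.nn.Linear(64, 1)
        )
    def forward(self, x):
        return self.net(x).squeeze(-1)

def guided_sample(denoiser, utility_model, steps=50, dim=64, guidance_scale=0.1):
    """DDPM-style ancestral sampling with an extra gradient term that nudges
    samples toward higher predicted utility."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(8, dim)
    for i in reversed(range(steps)):
        t = torch.full((1, 1), i / steps)
        with torch.no_grad():
            eps = denoiser(x, t)
            mean = (x - betas[i] / torch.sqrt(1 - alpha_bars[i]) * eps) / torch.sqrt(alphas[i])
        # Guidance: gradient of the utility score w.r.t. the current sample.
        x_req = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(utility_model(x_req).sum(), x_req)[0]
        mean = mean + guidance_scale * grad
        noise = torch.randn_like(x) if i > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[i]) * noise
    return x

samples = guided_sample(TinyDenoiser(), TinyUtility())
print(samples.shape)  # torch.Size([8, 64])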
Next, we address the challenges posed by long-tailed data distributions, which underlie numerous problems in AI safety, by using generative models to synthesize high-fidelity samples from low-density regions of the data distribution. We propose a novel sampling process for diffusion models that guides generation toward low-density regions while maintaining fidelity, and we rigorously demonstrate that it produces novel, high-fidelity samples from these regions.
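As a rough illustration of this kind of low-density guidance, the sketch below combines the standard denoising direction (fidelity) with a term that moves a sample down the gradient of an approximate log-density (novelty). The log_density_proxy, the Gaussian used in the toy usage, and the two scale factors are assumptions made only for this example, not the method proposed in the thesis.

import torch

def low_density_update(x, eps_pred, log_density_proxy, alpha_bar_t,
                       away_scale=0.2, fidelity_scale=1.0):
    """Combine the usual denoising direction with a gradient step that
    decreases the proxy log-density of the sample."""
    x_req = x.detach().requires_grad_(True)
    log_p = log_density_proxy(x_req).sum()
    density_grad = torch.autograd.grad(log_p, x_req)[0]
    # Predicted clean sample from the epsilon prediction (standard DDPM identity).
    x0_hat = (x - torch.sqrt(1 - alpha_bar_t) * eps_pred) / torch.sqrt(alpha_bar_t)
    # Move toward the model's clean estimate, but *down* the density gradient.
    return fidelity_scale * x0_hat - away_scale * density_grad

# Toy usage with a Gaussian log-density proxy (assumption).
mu = torch.zeros(16)
log_density = lambda x: -0.5 * ((x - mu) ** 2).sum(dim=-1)
x_t = torch.randn(4, 16)
eps = torch.zeros(4, 16)
print(low_density_update(x_t, eps, log_density, alpha_bar_t=torch.tensor(0.5)).shape)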
Finally, we demonstrate some limitations of existing generative models. We first consider the outlier detection task and show the shortcomings of modern generative models in solving it. In light of these findings, we propose SSD, an unsupervised framework for outlier detection that requires only unlabeled in-distribution data. We further uncover that modern diffusion models, used by millions of people, leak their training data: we extract a non-trivial number of training images from pre-trained diffusion models.
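The following sketch illustrates the scoring stage of an SSD-style detector under simplifying assumptions: features of unlabeled in-distribution data are summarized by a single Gaussian, and a test point is scored by its Mahalanobis distance to that distribution. The random feature extractor is a stand-in for the self-supervised representation that SSD actually learns, and all names below are chosen for illustration only.

import torch

torch.manual_seed(0)
feature_extractor = torch.nn.Sequential(torch.nn.Linear(32, 128), torch.nn.ReLU(),
                                        torch.nn.Linear(128, 64))

def fit_gaussian(features):
    """Mean and (regularized) inverse covariance of in-distribution features."""
    mu = features.mean(dim=0)
    centered = features - mu
    cov = centered.T @ centered / (features.shape[0] - 1)
    cov += 1e-3 * torch.eye(cov.shape[0])
    return mu, torch.linalg.inv(cov)

def outlier_score(x, mu, cov_inv):
    """Mahalanobis distance in feature space; higher score = more likely an outlier."""
    with torch.no_grad():
        f = feature_extractor(x) - mu
    return torch.einsum('bi,ij,bj->b', f, cov_inv, f)

# Unlabeled in-distribution data vs. a shifted "outlier" batch.
with torch.no_grad():
    in_feats = feature_extractor(torch.randn(512, 32))
mu, cov_inv = fit_gaussian(in_feats)
print(outlier_score(torch.randn(8, 32), mu, cov_inv).mean())        # in-distribution
print(outlier_score(torch.randn(8, 32) + 3.0, mu, cov_inv).mean())  # shifted batch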
In summary, this thesis addresses multiple AI safety challenges and provides a comprehensive framework for improving the safety and reliability of AI systems under the emerging generative AI paradigm.
Advisers: Prateek Mittal and Mung Chiang
Zoom link: https://princeton.zoom.us/j/6092166036