Details
Adversarial machine learning (ML) studies how to analyze and defend ML models in the presence of an adversary. Such an adversary can introduce perturbations into inputs, via instance- and model-specific algorithms known as attacks, in order to induce misclassification. This thesis reconsiders key assumptions in existing research, particularly the scope of attacks considered, and proposes new problem settings with relaxed assumptions that allow for more practical defenses.
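As a concrete illustration of such an attack (a standard example, not one specific to this thesis), the sketch below implements the well-known fast gradient sign method (FGSM) for a PyTorch image classifier; the model, the loss, and the L-infinity budget epsilon are assumptions of the example.

import torch

def fgsm_attack(model, x, y, epsilon):
    # Perturb input x within an L-infinity ball of radius epsilon,
    # stepping in the direction that increases the classification loss.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()   # one signed-gradient step
    return x_adv.clamp(0.0, 1.0).detach()         # stay in the valid pixel range

Stronger attacks (for example, multi-step variants) follow the same pattern of searching within a perturbation set for an input the model misclassifies.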
We begin by analyzing the fundamental limits of adversarial robustness against a known test-time adversary for multiclass classification. By deriving bounds on the 0-1 loss of the optimal defended classifier, we clarify how the attack and the data geometry impact the difficulty of classification. Interestingly, we find that for a wide range of perturbation sizes, including values beyond those typically considered in defenses, the optimal loss is zero. This suggests that it may be reasonable to consider larger perturbation sizes when designing defenses, though how to choose this value remains unclear.
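For concreteness, one standard way to formalize the quantity being bounded is the optimal adversarial 0-1 loss; the notation below is illustrative, and the thesis's exact formulation may differ. For a data distribution \mathcal{D}, a perturbation budget \epsilon, and classifiers f,

L^*(\epsilon) \;=\; \inf_{f}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\!\left[\,\sup_{\tilde{x}:\,\|\tilde{x}-x\|\le\epsilon} \mathbf{1}\{f(\tilde{x})\neq y\}\,\right],

so the bounds can be read as statements about how this best-achievable loss depends on \epsilon and on the geometry of \mathcal{D}.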
A major limitation of defenses in prior work is that they often assume that the space of attacks is known to the defender a priori. However, in practice, it is difficult to formulate this exact space. Given this challenge, we redefine the defender’s goal to include generalization to unforeseen adversaries—perturbation types and sizes not considered during defense design. To improve robustness, we incorporate a theoretically motivated regularization term, variation regularization, into adversarial training. To rigorously evaluate defenses in this setting, we introduce MultiRobustBench, a standardized benchmark and leaderboard for assessing robustness against multiple attacks. Our results reveal that many existing defenses perform poorly in worst-case scenarios, underscoring the difficulty of achieving unforeseen robustness.
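The sketch below shows how a feature-space regularizer of this general kind can be added to a standard adversarial training step. It is a simplified stand-in rather than the thesis's exact procedure: the features hook returning intermediate representations, the attack callable for the training threat model, and the regularization strength lam are all assumptions of the example, and the precise form of variation regularization used in the thesis may differ.

import torch.nn.functional as F

def adv_training_step(model, features, attack, optimizer, x, y, lam=1.0):
    # One adversarial training step with an added feature-space regularizer.
    x_adv = attack(model, x, y)              # adversarial example within the known threat model
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)  # standard adversarial training loss
    # Penalize how much the intermediate representation moves under perturbation.
    reg = (features(x_adv) - features(x)).flatten(1).norm(dim=1).mean()
    (loss + lam * reg).backward()
    optimizer.step()
    return loss.item(), reg.item()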
Rather than hoping to generalize to unforeseen attacks, a defender can instead aim to rapidly adapt their model as new attacks are discovered. We call this problem continual adaptive robustness and propose continual robust training (CRT), a fine-tuning-based defense that integrates new attacks over time. Our experiments demonstrate that regularization plays a crucial role in CRT's effectiveness.
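A minimal sketch of what such fine-tuning-based adaptation might look like is given below; the attack interface and the sampling scheme are assumptions of the example rather than the thesis's exact procedure, and the regularization noted above is omitted for brevity.

import random
import torch.nn.functional as F

def finetune_on_new_attack(model, optimizer, data_loader, known_attacks, epochs=1):
    # Fine-tune an already-trained robust model once a new attack is discovered,
    # sampling from the growing set of known attacks to retain earlier robustness.
    for _ in range(epochs):
        for x, y in data_loader:
            attack = random.choice(known_attacks)
            x_adv = attack(model, x, y)
            optimizer.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()
            optimizer.step()
    return model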
In summary, this thesis formulates new problem settings and develops defenses and benchmarks for robustness in the face of the evolving landscape of adversarial threats.
Adviser: Prateek Mittal
Zoom Mtg: https://princeton.zoom.us/j/99574677334?pwd=fxwEWO4hMWvFlPB1FIImpurfOvq…