Beyond Robustness Against Single Attack Types

Pre-FPO Presentation
Date
Apr 15, 2024, 1:00 pm - 2:30 pm
Location
See Abstract for Zoom Link

Speaker

Event Description

Machine learning has many security-critical applications, such as autonomous driving, face authentication, spam and malware detection, and medical imaging. For these applications, it is important to ensure that ML systems behave as expected even in the presence of noise. However, ML models are easily fooled by small (often imperceptible) changes to the input; these perturbed inputs are known as adversarial examples. While many works have designed defenses that improve adversarial robustness, they focus on only a small subset of possible perturbations (primarily Lp-bounded perturbations). In practice, we would like models to be robust against any imperceptible perturbation. This motivates the study of unforeseen robustness, which aims to design defenses that generalize beyond the attacks they explicitly defend against.

In this talk, I will provide a mathematical formulation of the problem of unforeseen robustness and then introduce an algorithm, adversarial training with variation regularization, for improving unforeseen robustness. I will then discuss the problem of benchmarking robustness against a wide variety of attacks, introduce new metrics for this task, and analyze the performance of existing defenses. Finally, I will introduce a new problem setting called continual adaptive robustness, which aims to achieve robustness against a sequence of attacks. This setting models the case in which the defense can be patched each time a new attack is introduced. I will briefly describe my ongoing efforts to address this problem through regularized training and fine-tuning, and conclude by describing further potential directions related to continual adaptive robustness.

Adviser: Prateek Mittal

Zoom Meeting: https://princeton.zoom.us/j/91870052866?pwd=TVpWQWZMckxTZ2xwUklCS2M1dGp4QT09