Certifiable AI Security against Localized Corruption Attacks

Date
Dec 4, 2024, 2:00 pm – 3:30 pm
Location
EQUAD D321 & Zoom Mtg (see abstract)

Event Description

Building secure and robust AI models has proven to be difficult. Nearly all defenses, including those published at top-tier venues and recognized with prestigious awards, can be circumvented by adaptive attackers, who adjust their attack strategies once they learn about the underlying defense algorithms. This dissertation studies one of the most challenging problems in AI security: How can we design defenses with formal robustness guarantees that remain effective against future adaptive attacks? We target the concept of certifiable robustness, which aims to establish a provable lower bound on model robustness against all possible attacks within a given threat model.
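
As a rough formalization (the notation here is introduced for illustration and is not taken from the dissertation): for a model f, an input x with correct label y, and a threat model T(x) containing every corrupted version of x the attacker is allowed to produce, certifying robustness at x means proving

    \forall\, x' \in T(x): \quad f(x') = y.

The fraction of test inputs for which this condition can be proven (the certified accuracy) is then a provable lower bound on accuracy under any attack that stays within the threat model, including adaptive attacks designed with full knowledge of the defense.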

Specifically, we study the threat of localized corruption attacks, in which the attacker arbitrarily corrupts part of the input to induce inaccurate model predictions at inference time. It is one of the most practical and common threats to AI models across a wide range of tasks, architectures, and data modalities. To mitigate localized attacks across different settings, we develop six certifiably robust algorithms, designed under two defense principles.
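
To make the threat model concrete, below is a minimal sketch of a localized (patch) corruption on an image; the array shapes, function name, and values are illustrative assumptions rather than details from the talk.

    import numpy as np

    def apply_localized_corruption(image, patch, top, left):
        """Overwrite a contiguous region of the image with attacker-chosen patch values."""
        corrupted = image.copy()
        h, w, _ = patch.shape
        corrupted[top:top + h, left:left + w, :] = patch  # only this region is modified
        return corrupted

    # Example: a 32x32 patch with arbitrary content placed on a 224x224 image
    image = np.random.rand(224, 224, 3)  # stand-in for a benign input in [0, 1]
    patch = np.random.rand(32, 32, 3)    # the attacker may set these pixels to anything
    x_adv = apply_localized_corruption(image, patch, top=50, left=80)

The defining constraint is that the corruption is confined to a bounded region while its content is unrestricted; certifiable defenses must therefore reason over all possible patch contents and locations within the allowed budget.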

We structure this dissertation into three parts. The first part studies robust image classification and introduces three algorithms: PatchGuard, PatchCleanser, and PatchCURE. The second part studies robust object detection, presenting two algorithms: DetectorGuard and ObjectSeeker. The final part examines text generation with large language models, detailing a robust retrieval-augmented generation algorithm named RobustRAG.
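
To give a sense of how such guarantees can be checked, here is a simplified sketch of a two-mask agreement test in the spirit of PatchCleanser-style double masking; the function and variable names are ours, and the published algorithms include inference-time procedures and analyses omitted here.

    from itertools import combinations_with_replacement

    def certify_two_mask_agreement(model, image, label, masks):
        # model: callable mapping an image to a predicted label
        # masks: arrays valued 1 where pixels are kept and 0 where they are blanked out;
        #        the mask set must be built so that every allowed patch location is
        #        fully covered by at least one mask
        for m_i, m_j in combinations_with_replacement(masks, 2):
            masked = image * m_i * m_j  # blank out the union of both masked regions
            if model(masked) != label:
                return False            # a double-masked prediction disagrees; not certified
        return True                     # all double-masked predictions agree with the label

The intuition: if some mask in the set fully covers any allowed patch location, and predictions remain correct under every pair of masks, then a masking-based inference procedure can be argued robust on that input; the certification arguments in the dissertation are more involved than this sketch.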

Notably, the algorithms presented in this dissertation scale effectively to large AI models like ViT and Llama, and are evaluated on large realistic datasets like ImageNet and Natural Questions. Several defenses achieve high certifiable robustness while maintaining benign model utility close to that of undefended models (e.g., 1% difference). These results represent one of the few notable advancements in AI security over the past few years, and we hope they inspire researchers to reflect on how we approach the challenges of securing AI systems.

Adviser: Prateek Mittal

Zoom Mtg.: https://princeton.zoom.us/j/9454515118