Machine learning (ML) has been widely deployed in real-world applications such as face authentication, autonomous driving, and augmented reality. The functionality of these ML systems relies heavily on their ability to perceive their physical surroundings. Unfortunately, ML perception models are vulnerable to “physical-world” adversarial attacks: an attacker can attach a maliciously crafted “patch” to a physical scene, so that images taken of that scene contain malicious content and induce incorrect model predictions. Such attacks pose an urgent threat to ML systems that interact with the physical world and have thus motivated the design of robust perception models.
However, it has been notoriously hard to make formal claims of security and robustness: many defenses have been broken by smarter attackers within only a few months of release. To overcome this challenge, this talk focuses on the concept of provable/certifiable robustness. The objective is to design defenses in such a way that we can prove/certify the correctness of predictions against any future attack within a given threat model. This robustness guarantee holds even against attackers with perfect knowledge of the defense system.
In this talk, I will first introduce the concepts of adversarial patch attacks and certifiable robustness. Then, I will discuss PatchGuard, an extensible framework for developing certifiably robust image classification algorithms. Next, I will introduce PatchCleanser, a different image classification defense that achieves state-of-the-art performance. Finally, I will briefly discuss defense strategies (DetectorGuard and ObjectSeeker) for the more challenging task of object detection and conclude with my ongoing research efforts.
Adviser: Prateek Mittal