As the capabilities of large-scale machine learning models expand, so too do their associated risks. There is an increasing push for policies that mandate these models be safe, preserve privacy, and maintain transparency regarding data usage. A significant challenge, however, lies in translating these qualitative mandates into quantitative, auditable, and actionable criteria. In this talk, I will trace my work toward a quantitative, auditable framework for identifying and mitigating these risks.
The talk begins by examining methods to quantify and assess the fragility of safety alignment in Large Language Models (LLMs). We will then explore strategies for auditing compliance with data transparency regulations. I will also briefly present my work on privacy leakage and its mitigation in distributed training. Finally, I will look ahead to several emerging risks I am eager to tackle, discussing how these risks can be identified technically and outlining a roadmap for collaboration with policy researchers and policymakers.
Advisers: Kai Li, Sanjeev Arora