Effectively learning from data and generating data in differentially private machine learning

Date
Aug 29, 2024, 3:00 pm – 4:30 pm
Location
EQUAD B327 & Zoom (see link below)

Event Description

Machine learning models are susceptible to a range of attacks that exploit data leakage from trained models. Differential Privacy (DP) is the gold standard for quantifying privacy risks and providing provable guarantees against attacks. However, training machine learning models with differential privacy often incurs a significant utility drop.
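For reference, the (ε, δ)-differential-privacy guarantee that the abstract appeals to is the standard textbook definition below (stated generically; it is not specific to the mechanisms in this talk):

```latex
% A randomized mechanism M is (\varepsilon, \delta)-differentially private
% if, for all neighboring datasets D, D' (differing in one record) and all
% measurable output sets S:
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
```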

In this dissertation, we investigate how to effectively learn from data and generate data in differentially private machine learning. To learn from data effectively in a privacy-preserving way, it is important to identify what prior information we can leverage.

First, we study the label-DP setting, where features are public and only labels are private. We investigate how to improve model utility under label-DP by leveraging the public features to add less noise and to reduce the effect of the noise that is added (a baseline label-DP mechanism is sketched below).

Second, we study how to leverage synthetic images to improve differentially private image classification. Although such synthetic images are generated without access to real-world images and are only marginally helpful in non-private training, we find that they provide a strong prior for differentially private image classification. We further study how to maximize the use of such synthetic priors and unlock their full potential for private training.

Third, we study the privatization of zeroth-order optimization and propose DP-ZO. Our key insight is that zeroth-order optimization extracts only a single scalar from the data per update, yet achieves performance competitive with SGD when fine-tuning large language models; we therefore only need to privatize that scalar. This is privacy-friendly because we add noise to one scalar instead of to a high-dimensional gradient (see the second sketch below).

Fourth, for differentially private synthetic data generation, we study privately generating data with only API access to large language models, without any fine-tuning. Our proposed method provides privacy protection for in-context learning in large language models while supporting an unlimited number of queries.
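As context for the label-DP setting, the classical baseline is k-ary randomized response applied to the label alone. A minimal sketch follows; the function name and signature are illustrative, and the dissertation's methods improve on this baseline by exploiting the public features:

```python
import numpy as np

def randomized_response(label: int, num_classes: int, epsilon: float,
                        rng=np.random.default_rng()) -> int:
    """k-ary randomized response: an epsilon-label-DP baseline.

    Keeps the true label with the highest probability consistent with
    epsilon-DP; otherwise flips it uniformly to another class.
    """
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + num_classes - 1)
    if rng.random() < p_keep:
        return label
    # Flip uniformly among the remaining classes.
    others = [c for c in range(num_classes) if c != label]
    return int(rng.choice(others))
```

Training on labels privatized this way satisfies label-DP by post-processing; the utility question the talk addresses is how much of the resulting label noise the learner can absorb or correct.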
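The DP-ZO insight can be illustrated with a two-point (SPSA-style) finite-difference estimate. The sketch below is one reading of the approach under stated assumptions, not the authors' code, and every name and hyperparameter in it is illustrative:

```python
import numpy as np

def dp_zo_step(theta, loss_fn, batch, lr=1e-4, fd_scale=1e-3,
               clip=1.0, noise_mult=1.0, rng=np.random.default_rng()):
    """One differentially private zeroth-order update (sketch).

    loss_fn(theta, example) -> scalar loss. The perturbation direction z
    is data-independent, so only the scalar loss difference is private.
    """
    # Shared random perturbation direction (public randomness).
    z = rng.standard_normal(theta.shape)
    # Per-example scalar: finite-difference estimate of the directional derivative.
    diffs = np.array([
        (loss_fn(theta + fd_scale * z, x) - loss_fn(theta - fd_scale * z, x))
        / (2 * fd_scale)
        for x in batch
    ])
    # Privatize: clip each per-example scalar, then add Gaussian noise
    # calibrated to the clipping bound (the sum has sensitivity `clip`).
    clipped = np.clip(diffs, -clip, clip)
    noisy_sum = clipped.sum() + rng.normal(0.0, noise_mult * clip)
    grad_scalar = noisy_sum / len(batch)
    # The update direction z is public; only the scalar carried private data.
    return theta - lr * grad_scalar * z
```

Because only the scalar touches the private data, the Gaussian-mechanism analysis applies to a one-dimensional quantity rather than a full gradient, which is the source of the favorable noise scaling the abstract describes.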

Adviser: Prateek Mittal

Join Zoom Meeting
https://princeton.zoom.us/j/91917430026?pwd=Hbuh3VVnk9EVcjaFZmRP9iRdcqk3Rl.1