Details
In recent years, the landscape of artificial intelligence has witnessed remarkable development across many domains. Deep learning models continue to grow in size in pursuit of better generalization and superior performance on downstream tasks. However, the ever-growing scale of these models poses challenges, particularly in compute-constrained environments.
This presentation addresses the need to improve the efficiency of deep learning models in both the training and inference phases. First, I will introduce Chain of LoRA (COLA), a novel iterative optimization framework inspired by the Frank-Wolfe algorithm. The framework employs a residual learning procedure and bridges the gap in generalization error between full-parameter fine-tuning and parameter-efficient fine-tuning, without imposing additional memory costs.
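To make the residual learning idea concrete, the sketch below illustrates one plausible reading of it: train a low-rank adapter, merge it into the frozen weights, then restart from a fresh adapter so each new adapter fits what remains. The class and function names (ChainedLoRALinear, chain_of_lora) and all hyperparameters are illustrative assumptions, not the actual COLA implementation.

```python
# Minimal sketch of an iterative "chain of LoRA" loop (assumed structure, not the paper's code).
import torch
import torch.nn as nn

class ChainedLoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 4):
        super().__init__()
        # Frozen base weight; only the low-rank factors are trained.
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        self.rank = rank
        self.reset_adapter()

    def reset_adapter(self):
        # Fresh low-rank factors; B starts at zero so merging leaves the function unchanged.
        self.lora_A = nn.Parameter(torch.randn(self.rank, self.weight.shape[1]) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(self.weight.shape[0], self.rank))

    def merge_adapter(self):
        # Fold the learned low-rank update B @ A into the frozen base weight.
        with torch.no_grad():
            self.weight += self.lora_B @ self.lora_A

    def forward(self, x):
        return x @ (self.weight + self.lora_B @ self.lora_A).T


def chain_of_lora(layer, data, targets, chain_length=3, steps=100, lr=1e-2):
    loss_fn = nn.MSELoss()
    for _ in range(chain_length):
        opt = torch.optim.SGD([layer.lora_A, layer.lora_B], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = loss_fn(layer(data), targets)
            loss.backward()
            opt.step()
        layer.merge_adapter()   # absorb the trained adapter into the base weights
        layer.reset_adapter()   # the next adapter in the chain fits the residual
    return layer


# Toy usage on random data.
layer = ChainedLoRALinear(16, 8, rank=2)
x, y = torch.randn(32, 16), torch.randn(32, 8)
chain_of_lora(layer, x, y)
```

Because each adapter is merged before the next one is trained, the peak number of trainable parameters never exceeds that of a single LoRA adapter, which is why no extra memory is needed.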
Second, I will delve into an adaptive gradient method called SAMUEL that automates the learning rate schedule, thereby reducing the computational cost of hyperparameter selection in training recipes. Our proposed method is built upon the multiplicative weights framework. Empirically, we demonstrate the robustness of our method in automatically selecting the optimal learning rate to form a learning rate schedule in both online and offline settings.
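The following sketch shows the general flavor of a multiplicative-weights selection over a grid of candidate learning rates: rates that would have reduced the loss keep high weight, while poorly performing rates are shrunk. The grid, the quadratic toy objective, and the update details are assumptions for illustration and do not reproduce the SAMUEL algorithm itself.

```python
# Minimal multiplicative-weights sketch for per-step learning rate selection (illustrative only).
import numpy as np

def mw_lr_schedule(loss_fn, grad_fn, x0, lrs, steps=200, eta=0.5, rng=None):
    rng = rng or np.random.default_rng(0)
    weights = np.ones(len(lrs))            # one "expert" per candidate learning rate
    x = np.asarray(x0, dtype=float).copy()
    schedule = []
    for _ in range(steps):
        probs = weights / weights.sum()
        lr = rng.choice(lrs, p=probs)      # sample a learning rate from the current weights
        g = grad_fn(x)
        # Counterfactual loss each candidate would have achieved from the same point.
        losses = np.array([loss_fn(x - c * g) for c in lrs])
        spread = losses.max() - losses.min() + 1e-12
        # Multiplicative-weights update: shrink weights of poorly performing rates.
        weights *= np.exp(-eta * (losses - losses.min()) / spread)
        x = x - lr * g
        schedule.append(lr)
    return x, schedule

# Toy usage on a quadratic: the schedule concentrates on the best step size.
loss = lambda v: 0.5 * np.dot(v, v)
grad = lambda v: v
x_final, lr_schedule = mw_lr_schedule(loss, grad, np.ones(10), lrs=[1e-3, 1e-2, 1e-1, 1.0])
```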
Adviser: Elad Hazan