Efficient inference of large models

ECE PRE FPO PRESENTATION
Date
Oct 30, 2024, 11:00 am – 12:00 pm
Location
Zoom Mtg: see abstract

Event Description

Recent breakthroughs in AI have been predominantly driven by scaling neural networks to unprecedented sizes. However, as these models grow larger, efficient inference becomes crucial for democratizing AI access. This talk explores systematic approaches to co-designing the model inference stack for enhanced efficiency. I will present our recent advances in three key areas: (1) accelerating language model inference through multi-head decoding architectures, (2) optimizing vision model performance via parallel computation and low-precision quantization techniques, and (3) developing coordinated multi-model inference systems.
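As a rough illustration of the low-precision quantization idea mentioned in area (2), here is a minimal sketch (not the speaker's code) of symmetric int8 weight quantization, where a float tensor is mapped to 8-bit integers plus a single scale factor; the function names are hypothetical.

```python
import numpy as np

# Hypothetical sketch of symmetric per-tensor int8 quantization:
# map floats into [-127, 127] using a single scale, then recover
# an approximation by multiplying back by that scale.
def quantize_int8(w):
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.0, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print(np.max(np.abs(w - w_hat)))  # reconstruction error bounded by scale/2
```

Storing weights as int8 with one scale per tensor cuts memory traffic roughly 4x versus float32, which is one reason such techniques speed up inference on memory-bound hardware.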

Advisers: Jason Lee and Kai Li

Zoom Meeting: https://princeton.zoom.us/my/tianle