Efficient Large Vision Models

ECE Pre-FPO Presentation
Date
Feb 6, 2025, 10:00 am - 11:00 am
Location
EQUAD J323

Event Description

The rapid advancements in large vision models have significantly improved performance across various tasks, such as classification, segmentation, and generation. However, their efficiency remains a major bottleneck, limiting their deployment in resource-constrained environments. Enhancing the efficiency of these models is crucial to enable broader accessibility, reduce computational costs, and promote sustainability in AI research.

I will mainly introduce two training-free methods designed for large vision models: Zero-TPrune and AT-EDM. Zero-TPrune applies token pruning to Vision Transformers by leveraging both semantic importance and token similarity. Semantic importance is determined using a graph-based Weighted PageRank (WPR) algorithm, while token similarity is measured after grouping tokens based on their semantic importance. AT-EDM generalizes the Weighted PageRank algorithm to cross-attention and deploys it in diffusion models for text-to-image generation: it prunes less-important tokens based on attention maps and strategically recovers them based on similarity. AT-EDM achieves up to a 50% speed-up without any training or fine-tuning and can be combined with sampling distillation for further efficiency gains. Additionally, I will give a brief overview of LinGen, our latest work, which achieves linear computational complexity for text-to-video generation, enabling high-resolution, minute-length video generation on a single GPU at 15x lower cost.
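
To make the pruning idea concrete, the following is a minimal NumPy sketch of the general recipe described above: score tokens with a weighted-PageRank-style power iteration over an attention map, then keep only the top-ranked tokens. The function names, damping factor, iteration count, and keep ratio are illustrative assumptions for this sketch, not the actual Zero-TPrune or AT-EDM implementations.

import numpy as np

def token_importance_wpr(attn: np.ndarray, d: float = 0.85, iters: int = 30) -> np.ndarray:
    """Weighted-PageRank-style importance over an attention map.

    attn has shape (N, N); attn[i, j] is the attention token i pays to
    token j (rows sum to 1 after softmax).
    """
    n = attn.shape[0]
    score = np.full(n, 1.0 / n)              # uniform initial importance
    for _ in range(iters):
        # a token is important if important tokens attend to it
        score = (1.0 - d) / n + d * (attn.T @ score)
        score /= score.sum()                 # keep scores normalized
    return score

def prune_tokens(tokens: np.ndarray, attn: np.ndarray, keep_ratio: float = 0.5):
    """Keep the top `keep_ratio` fraction of tokens ranked by importance."""
    scores = token_importance_wpr(attn)
    k = max(1, int(round(keep_ratio * len(scores))))
    keep_idx = np.sort(np.argsort(scores)[-k:])   # preserve original token order
    return tokens[keep_idx], keep_idx

# Toy usage: 6 tokens with 8-dim embeddings and a random row-stochastic attention map.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))
logits = rng.normal(size=(6, 6))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
kept, idx = prune_tokens(tokens, attn, keep_ratio=0.5)
print("kept token indices:", idx)

In the methods covered in the talk, the attention maps come from the model itself (self-attention in Vision Transformers for Zero-TPrune, cross-attention in diffusion models for AT-EDM) rather than being randomly generated as in this toy example, and AT-EDM additionally recovers some pruned tokens based on similarity.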

Adviser: Niraj Jha