Compiler Support for Deep Learning Accelerators: End-to-End Evaluation and Data Access Optimization

Date
Oct 25, 2024, 4:00 pm – 5:30 pm
Location
EQUAD B418

Event Description

Specialized hardware accelerators have been developed to enhance power-performance efficiency for Deep Neural Network (DNN) applications. A primary challenge in DNN accelerator development is the early-stage evaluation of design prototypes on real-world applications. Such evaluations are crucial: modern DNN accelerators employ a range of techniques to boost power-performance, but these techniques can introduce numerical discrepancies, such as data quantization with customized numerical representations or reformulated operators. Given the deeply layered structure of DNN applications, these numerical errors can accumulate and result in significant deviations from reference results. Additionally, the energy and performance costs of data movement between the host machine and the accelerator’s on-chip memory are substantial, making the reduction of data transfer a critical optimization focus when mapping DNN applications to accelerators.
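To illustrate how per-layer quantization noise can compound, here is a minimal NumPy sketch (purely illustrative, not drawn from the thesis) that compares a full-precision forward pass through a stack of random layers against the same pass with a simple symmetric uniform quantizer standing in for an accelerator’s custom numeric format; the relative deviation from the reference output grows with depth:

    import numpy as np

    def quantize(x, bits=8):
        # Symmetric uniform quantization; an illustrative stand-in for an
        # accelerator's custom numeric representation.
        scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
        return np.round(x / scale) * scale

    rng = np.random.default_rng(0)
    depth, width = 20, 256
    weights = [rng.standard_normal((width, width)) / np.sqrt(width)
               for _ in range(depth)]

    x_ref = x_q = rng.standard_normal(width)
    for w in weights:
        x_ref = np.tanh(w @ x_ref)                   # full-precision reference
        x_q = np.tanh(quantize(w) @ quantize(x_q))   # quantized datapath

    # Relative deviation of the quantized path after all layers.
    print(np.linalg.norm(x_q - x_ref) / np.linalg.norm(x_ref))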

To address these challenges, this thesis proposes several solutions. First, we introduce “3LA,” an end-to-end compiler pipeline that enables application-level testing of hardware accelerator prototypes on unmodified DNN applications. Built on a recently proposed formal hardware specification, the Instruction-Level Abstraction (ILA), 3LA automates application-level simulation, providing crucial development feedback with greatly reduced manual engineering effort.
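As a rough, self-contained sketch of the kind of flow 3LA automates (all names and the toy fixed-point model below are hypothetical stand-ins, not the actual 3LA or ILA interfaces), application-level testing can be pictured as rewriting matched operators in a model graph to call a simulated accelerator, then comparing end-to-end outputs against the unmodified application:

    import numpy as np

    class SimulatedAccelMatmul:
        # Stand-in for an ILA-derived operator simulator; a real ILA simulator
        # executes the accelerator's instruction-level semantics, while this
        # toy version only mimics a fixed-point datapath.
        def __call__(self, a, b):
            q = lambda x: np.round(np.clip(x * 16, -128, 127)) / 16
            return q(a) @ q(b)

    def offload(graph, pattern, accel_op):
        # Replace every occurrence of the matched operator with the simulated
        # accelerator op; 3LA performs this kind of rewriting automatically.
        return [accel_op if op is pattern else op for op in graph]

    graph = [np.matmul, np.tanh, np.matmul]        # toy three-op "application"
    graph_acc = offload(graph, np.matmul, SimulatedAccelMatmul())

    rng = np.random.default_rng(0)
    x, w1, w2 = rng.standard_normal((3, 4, 4))
    run = lambda g: g[2](g[1](g[0](x, w1)), w2)
    print(np.max(np.abs(run(graph_acc) - run(graph))))  # end-to-end deviation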

Second, we propose Shoehorn, a scheduler for mapping DNN operators to hardware accelerators that co-optimizes loop tiling, loop ordering, and on-chip memory partitioning decisions. For a single application-level operator on a given accelerator, Shoehorn produces an optimal mapping schedule that minimizes off-chip memory access.
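To give a flavor of the underlying search problem, the toy sketch below brute-forces tile sizes for a matrix multiplication C = A × B under an on-chip capacity constraint, using a deliberately simplified traffic model with a fixed loop order; Shoehorn’s actual formulation, which also co-optimizes loop ordering and memory partitioning, is considerably more sophisticated:

    import itertools, math

    def traffic(M, N, K, Ti, Tj, Tk):
        # Simplified off-chip traffic model for tiled matmul under one fixed
        # loop order: A tiles are re-fetched once per column tile of B, B
        # tiles once per row tile of A, and C partial sums spill per K tile.
        return (M * K * math.ceil(N / Tj)
                + K * N * math.ceil(M / Ti)
                + 2 * M * N * math.ceil(K / Tk))

    def fits(Ti, Tj, Tk, capacity):
        # The on-chip memory partition must hold one tile each of A, B, C.
        return Ti * Tk + Tk * Tj + Ti * Tj <= capacity

    def best_schedule(M, N, K, capacity, sizes=(8, 16, 32, 64, 128)):
        candidates = [(traffic(M, N, K, *t), t)
                      for t in itertools.product(sizes, repeat=3)
                      if fits(*t, capacity)]
        return min(candidates)    # (off-chip accesses, (Ti, Tj, Tk))

    print(best_schedule(M=512, N=512, K=512, capacity=16384))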

Lastly, this thesis introduces “COSMA,” an optimization framework that minimizes total off-chip data access when deploying entire DNN applications, or segments of them, to the target accelerator. COSMA jointly optimizes operator scheduling, memory allocation, and tensor replacement strategies, providing a comprehensive solution to data-movement minimization.
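To isolate just one of these decisions, the hypothetical sketch below estimates off-chip traffic for a fixed operator schedule using a classic Belady-style replacement policy (evict the resident tensor whose next use is furthest away); COSMA’s joint optimization of scheduling, allocation, and replacement is, again, far more general:

    def offchip_traffic(schedule, sizes, capacity):
        # schedule: list of (output_tensor, input_tensors) in execution
        # order. Assumes each single step's working set fits in `capacity`.
        def next_use(t, step):
            # Index of the first future step that reads tensor t
            # (len(schedule) if t is never read again).
            for s in range(step + 1, len(schedule)):
                if t in schedule[s][1]:
                    return s
            return len(schedule)

        onchip, traffic = {}, 0
        for step, (out, ins) in enumerate(schedule):
            for t in ins:
                if t not in onchip:
                    traffic += sizes[t]          # reload a spilled tensor
                    onchip[t] = sizes[t]
            onchip[out] = sizes[out]
            while sum(onchip.values()) > capacity:
                # Belady-style: evict the tensor whose next use is furthest.
                victim = max((t for t in onchip if t != out),
                             key=lambda t: next_use(t, step))
                if next_use(victim, step) < len(schedule):
                    traffic += sizes[victim]     # write back a live tensor
                del onchip[victim]
        return traffic

    # Toy chain where long-lived tensor "a" must be spilled and reloaded:
    sched = [("a", []), ("b", ["a"]), ("c", ["b"]), ("d", ["a", "c"])]
    sizes = {"a": 4, "b": 4, "c": 4, "d": 4}
    print(offchip_traffic(sched, sizes, capacity=4))  # prints 8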

Together, these contributions streamline DNN accelerator development from early-stage design to final application deployment, improving both development efficiency and deployment quality.

Adviser: Sharad Malik