Over the past decade, artificial intelligence (AI) has attracted significant interest in both industry and academia. Deep neural network (DNN) models have grown explosively in size, fueled by wider access to graphics processing units (GPUs) and machine learning (ML) accelerators, along with increasing dataset sizes. However, efficient evaluation of such models on hardware, in terms of accuracy, peak power draw, energy consumption, and integrated-circuit chip area, remains challenging, requiring long design cycles and domain expertise. Increasing model sizes only exacerbate this problem. In this thesis, we propose a set of frameworks that target this challenge from various perspectives. We propose FlexiBERT, the first wide-scale design space for heterogeneous and flexible transformer architectures. We then propose AccelTran, a state-of-the-art transformer accelerator. Motivated by AccelTran, we propose ELECTOR, a design space of transformer accelerators, and implement transformer-accelerator co-design using our co-design technique, BOSHCODE. We also propose EdgeTran, a co-search technique that finds the best-performing pair of transformer model and edge-AI device, and we apply this framework to convolutional neural networks (CNNs) as well (CODEBench). Finally, we discuss two extensions of BOSHCODE: DINI for data imputation and BREATHE for generic multi-objective optimization in vector and graphical search spaces. These works extend the proposed approach to a much more diverse set of applications.
Adviser: Niraj Jha