Deep neural networks (DNNs) have enabled a wide range of artificial intelligence (AI) applications. The prevalent adoption of DNNs can be attributed to their high customizability for different tasks. Indeed, researchers have designed variants of DNNs for different applications, e.g., convolutional neural networks (CNNs) for visual recognition, generative adversarial networks (GANs) for image synthesis, and recurrent neural networks (RNNs) for time-series processing. These variants differ markedly in network topology and training objective.
Despite the success of DNNs, there is growing concern about their efficiency. Current DNNs are resource-hungry, which sets a hard barrier to deploying them on resource-limited edge devices. Moreover, the breadth of applications to which DNNs are applied increases the difficulty of discovering efficient DNN designs for the different variants. Because of this diversity, it is hard to devise a generic approach that attains efficient DNNs with satisfactory performance across different applications.
In this dissertation, we address the challenge of designing efficient DNNs in different domains with a simple, intuitive, and effective notion: DNNs themselves are customized for different learning objectives, and so should be the approaches to enhancing their efficiency. With this notion, we present methodologies for designing efficient CNNs, GANs, and RNNs. We first introduce a CNN compression algorithm, class-discriminative compression (CDC), which fits seamlessly with the class-discriminative training objective of CNNs and provides a 1.8x acceleration for ResNet50 on ImageNet without accuracy loss. We then perform an in-depth study of channel pruning for CNN compression. Driven by the objective of classification accuracy, we propose an evolutionary framework that automatically discovers transferable pruning functions that outperform manual designs. We further investigate a different application, image synthesis with GANs. Observing that a GAN is trained to synthesize realistic content, we pioneer a content-aware GAN compression method that accelerates state-of-the-art models by 11x with negligible loss in image quality. Finally, we expand our study to the domain of system design, where we aim to mitigate the memory wall by building an efficient RNN data prefetcher. We develop an ML-architecture co-design strategy that speeds up state-of-the-art neural prefetchers by 15x while achieving even better performance.