Putting AI on a Diet: TinyML and Efficient Deep Learning

Wed, May 19, 2021, 10:30 am to 11:30 am
Electrical and Computer Engineering
Computer Science

Please register here

Talk Recording

Abstract: Today’s AI is too big. Deep neural networks demand extraordinary levels of data and computation, and therefore power, for training and inference. This severely limits the practical deployment of AI on edge devices. We aim to improve the efficiency of neural network design. First, I’ll present MCUNet [1], which brings deep learning to IoT devices. MCUNet is a framework that jointly designs an efficient neural architecture (TinyNAS) and a lightweight inference engine (TinyEngine), enabling ImageNet-scale inference on microcontrollers that have only 1 MB of flash memory. Next, I will introduce the Once-for-All Network [2], an efficient neural architecture search approach that can elastically grow and shrink model capacity according to the target hardware resources and latency constraints. Moving from inference to training, I’ll present TinyTL [3], which enables tiny transfer learning on-device and reduces the memory footprint by 7-13x. Finally, I will describe data-efficient GAN training techniques [4] that can generate photo-realistic images from only 100 training images, a task that used to require tens of thousands of images. We hope such TinyML techniques can make AI greener, faster, more efficient, and more sustainable.

[1] MCUNet: Tiny Deep Learning on IoT Devices (NeurIPS’20 spotlight)

[2] Once-for-All: Train One Network and Specialize it for Efficient Deployment (ICLR’20)

[3] Tiny Transfer Learning: Reduce Memory, not Parameters for Efficient On-Device Learning (NeurIPS’20)

[4] Differentiable Augmentation for Data-Efficient GAN Training (NeurIPS’20)

Bio: Song Han is an assistant professor in MIT’s EECS department. He received his PhD degree from Stanford University. His research focuses on efficient deep learning computing. He proposed the “deep compression” technique, which can reduce neural network size by an order of magnitude without losing accuracy, and its hardware implementation, the “efficient inference engine,” which first exploited pruning and weight sparsity in deep learning accelerators. His team’s work on hardware-aware neural architecture search that brings deep learning to IoT devices was highlighted by MIT News, Wired, Qualcomm News, VentureBeat, and IEEE Spectrum, was integrated into PyTorch and AutoGluon, and received many low-power computer vision contest awards at flagship AI conferences (CVPR’19, ICCV’19 and NeurIPS’19). Song received Best Paper awards at ICLR’16 and FPGA’17, as well as the Amazon Machine Learning Research Award, the SONY Faculty Award, the Facebook Faculty Award, and the NVIDIA Academic Partnership Award. Song was named to the “35 Innovators Under 35” list by MIT Technology Review for his contribution to the “deep compression” technique that “lets powerful artificial intelligence (AI) programs run more efficiently on low-power mobile devices.” Song received the NSF CAREER Award for “efficient algorithms and hardware for accelerated machine learning” and the IEEE “AIs 10 to Watch: The Future of AI” award.