Wed, Mar 02, 2022 @ 10:00 AM - 11:00 AM
Ming Hsieh Department of Electrical and Computer Engineering
Conferences, Lectures, & Seminars
Speaker: Dr. Amir Gholami, Research Scientist, RiseLab and BAIR at UC Berkeley
Talk Title: Full Stack Deep Learning at the Edge
Abstract: An important next milestone in machine learning is to bring intelligence to the edge without relying on the computational power of the cloud. This could lead to more reliable, lower-latency, and privacy-preserving AI for a wide range of applications. However, state-of-the-art NN models require prohibitive amounts of compute, memory, and energy, resources that are often not available at the edge. Addressing these challenges without compromising accuracy requires a multi-faceted approach, including hardware-aware model compression and accelerator co-design.
In this talk, I will first discuss a novel hardware-aware method for neural network quantization and pruning that achieves an optimal trade-off among accuracy, latency, and model size. In particular, I will discuss a new Hessian Aware Quantization (HAWQ) method that relies on second-order information to perform low-precision quantization of the model with minimal loss of generalization. I will present extensive testing of the method on different learning tasks, including various models for image classification, object detection, natural language processing, and speech recognition, showing that HAWQ exceeds previous baselines. I will then present a recent extension of this method that allows integer-only inference for the end-to-end computation, enabling efficient deployment on fixed-point hardware. Finally, I will discuss a full-stack, hardware-aware neural network architecture and accelerator design that adapts the model architecture and the accelerator parameters to achieve optimal performance.
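The core idea behind Hessian-aware quantization can be sketched in a few lines: estimate each layer's sensitivity from second-order (Hessian) information, then give more bits to more sensitive layers. The toy below is not the speaker's actual HAWQ implementation; it uses randomly generated per-layer Hessians, a Hutchinson trace estimator as the sensitivity measure, and an assumed three-entry bit budget purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def hutchinson_trace(H, num_samples=200):
    # Estimate tr(H) as the average of v^T H v over random
    # Rademacher vectors v (Hutchinson's estimator), so the
    # full Hessian never needs to be materialized in practice.
    n = H.shape[0]
    total = 0.0
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=n)
        total += v @ H @ v
    return total / num_samples

# Toy stand-ins for per-layer loss Hessians (PSD by construction).
layers = {}
for name in ("conv1", "conv2", "fc"):
    A = rng.standard_normal((8, 8))
    layers[name] = A @ A.T

# Rank layers by estimated Hessian trace: a flatter loss surface
# (smaller trace) tolerates coarser quantization.
traces = {name: hutchinson_trace(H) for name, H in layers.items()}
order = sorted(traces, key=traces.get)  # least sensitive first

bit_menu = [4, 8, 8]  # assumed budget: one 4-bit layer, rest 8-bit
assignment = dict(zip(order, bit_menu))
```

The actual HAWQ method works with the Hessian of the training loss with respect to each layer's weights and solves a more careful trade-off against latency and model size; this sketch only shows the sensitivity-ranking step.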
ICML'21: HAWQ-V3: Dyadic Neural Network Quantization
ICML'21: I-BERT: Integer-only BERT Quantization
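The "dyadic" and "integer-only" ideas in the two papers above can be illustrated with a toy quantized matrix multiply: inputs and weights are quantized to int8, accumulation happens in int32, and the floating-point rescale factor is replaced by a dyadic number m / 2^s (an integer multiply followed by a bit shift), so no floating-point arithmetic is needed at inference time. This is a simplified sketch, not code from the papers; the tensor shapes and the 15-bit dyadic precision are assumptions.

```python
import numpy as np

def quantize(x, scale):
    # Symmetric int8 quantization: real value ≈ q * scale.
    q = np.round(x / scale)
    return np.clip(q, -127, 127).astype(np.int8)

def dyadic(ratio, bits=15):
    # Approximate a real-valued rescale ratio by m / 2^s with
    # integer m, so the rescale becomes multiply-then-shift.
    m = int(round(ratio * (1 << bits)))
    return m, bits

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 16)).astype(np.float32)
w = rng.standard_normal((16, 8)).astype(np.float32)

# Per-tensor scales chosen so the max magnitude maps to 127.
s_x = np.abs(x).max() / 127
s_w = np.abs(w).max() / 127
y_fp = x @ w                       # float reference
s_y = np.abs(y_fp).max() / 127

qx, qw = quantize(x, s_x), quantize(w, s_w)
acc = qx.astype(np.int32) @ qw.astype(np.int32)  # int32 accumulation

# Integer-only requantization: acc * (s_x * s_w / s_y) via a
# dyadic multiplier, with rounding before the right shift.
m, s = dyadic(s_x * s_w / s_y)
qy = (acc * m + (1 << (s - 1))) >> s

y_int = qy * s_y  # dequantize only to compare against the reference
```

The key property is that everything from `qx` to `qy` uses integer arithmetic only, which is what makes deployment on fixed-point hardware possible.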
Biography: Amir Gholami is a research scientist in RiseLab and BAIR at UC Berkeley. He received his PhD from UT Austin, working on large-scale 3D image segmentation, research that received UT Austin's best doctoral dissertation award in 2018. He is a Melosh Medal finalist, a recipient of the Best Student Paper Award at SC'17 and a Gold Medal in the ACM Student Research Competition, a Best Student Paper finalist at SC'14, and a recipient of the Amazon Machine Learning Research Award in 2020. He was also part of the NVIDIA team that first made low-precision (FP16) neural network training possible, enabling a more than 10x increase in compute power through tensor cores; that technology is now widely adopted in GPUs. Amir's current research focuses on efficient AI, AutoML, and scalable training of neural network models.
Host: Dr. Massoud Pedram, firstname.lastname@example.org
Audiences: Everyone Is Invited
Contact: Mayumi Thrasher