ECE Seminar: A plug-and-play acceleration framework for generative AI models on the edge
Tue, Oct 22, 2024 @ 10:00 AM - 11:00 AM
Ming Hsieh Department of Electrical and Computer Engineering
Conferences, Lectures, & Seminars
Speaker: Dr. Yanzhi Wang, Associate Professor and Faculty Fellow, Dept. of ECE, Northeastern University
Talk Title: A plug-and-play acceleration framework for generative AI models on the edge
Abstract: In the generative AI era, general users need to apply different base models, fine-tuned checkpoints, and LoRAs. Data privacy and real-time requirements also favor on-device, local deployment of large-scale generative AI models. It is desirable to develop a "plug-and-play" framework in which users can download any generative AI model, then click and run it on their own device. This poses a significant challenge to current AI deployment frameworks, which are typically time-consuming and require human expertise in hardware and code generation. We present our work on OminiX, a first step towards a unified library and acceleration of generative AI models across various hardware platforms. By integrating our unique front-end library and back-end instantaneous acceleration techniques, which will be open-sourced soon, we demonstrate plug-and-play deployment and state-of-the-art acceleration of various generative AI models, including image generation, large language models, multimodal language models, speech generation and voice cloning, real-time chat engines, real-time translation, video generation, and real-time avatars. This can be achieved on everyone's own platform.
Biography: Yanzhi Wang is an Associate Professor in the Department of Electrical and Computer Engineering at Northeastern University and a senior member of IEEE. His research interests focus on real-time and energy-efficient deep learning and artificial intelligence systems, especially efficient large language models and large-scale generative AI systems. His research has been published widely in (i) machine learning conferences such as AAAI, CVPR, NeurIPS, ICML, ICCV, ICLR, IJCAI, ECCV, KDD, ICRA, ACM MM, ICDM, etc., (ii) architecture and system conferences such as ASPLOS, ISCA, MICRO, HPCA, CCS, VLDB, PLDI, WWW, ICS, PACT, CGO, IPDPS, INFOCOM, ICDCS, DAC, ICCAD, FPGA, FCCM, ISSCC, CICC, RTAS, RTSS, etc., and (iii) IEEE and ACM transactions. His work has been cited over 20,500 times. He has received six Best Paper Awards and another 12 Best Paper Nominations. He has received the U.S. Army Research Office Young Investigator Program Award (YIP), the IEEE TC-SDM Early Career Award, the Asia Pacific Signal and Information Processing Association Distinguished Leader Award, the Massachusetts Acorn Innovation Award, design contest awards from multiple conferences, and other research awards from Google, MathWorks, etc. His research has been reported and cited by around 500 media outlets. Thirteen of his academic descendants are tenure-track faculty members at University of Minnesota, Michigan State University, University of Georgia, Clemson University, etc.
Host: Dr. Sandeep Gupta, sandeep@usc.edu
Webcast: https://usc.zoom.us/j/98817797740?pwd=OfzLgQ5S1Gbb7b7mxxXe9FgST9u99L.1 (USC NetID Login Required)
Location: Hughes Aircraft Electrical Engineering Center (EEB) - 248
Audiences: Everyone Is Invited
Contact: Mayumi Thrasher