-
CS Colloquium: Xia (Ben) Hu (Rice University) - Efficient LLM Serving via Lossy Computation
Wed, Apr 09, 2025 @ 10:00 AM - 11:00 AM
Thomas Lord Department of Computer Science
Conferences, Lectures, & Seminars
Speaker: Xia (Ben) Hu, Rice University
Talk Title: Efficient LLM Serving via Lossy Computation
Series: Computer Science Colloquium
Abstract: Large language models (LLMs) have exhibited human-like conversational abilities. Yet, scaling LLMs to longer contexts, such as extracting information from lengthy articles, one of the most fundamental tasks in healthcare applications, poses significant challenges. The primary issues are their inability to handle contexts beyond pre-training lengths and system constraints that make deployment difficult, as memory requirements for inference increase with context length. The key idea to overcome these challenges is that LLMs are extremely robust to noise from lossy computation, such as low-precision computation. Following this insight, we will discuss recent advancements in serving LLMs at scale, particularly in handling longer contexts. To address the algorithmic challenge, I will share our recent work on extending LLM context length to at least 8× longer by coarsening the positional information of distant tokens. To address the system challenge, I will discuss our recent efforts in quantizing the intermediate states of past tokens to 2-bit numbers, leading to a 8x memory efficiency and 3.5x wall-clock time speedup without harming performance. Finally, I will highlight our latest projects applying LLMs in healthcare, particularly how we utilize retrieval techniques for long contexts to mitigate the hallucination problem in healthcare chatbots. This lecture satisfies requirements for CSCI 591: Research Colloquium
Biography: Dr. Xia “Ben” Hu is an Associate Professor at Rice University in the Department of Computer Science. Dr. Hu has published over 200 papers in several major academic venues, including NeurIPS, ICLR, ICML, KDD, IJCAI, etc. An open-source package developed by his group, namely AutoKeras, has become the most used automated deep learning system on GitHub (with over 9,000 stars and 1,000 forks). Additionally, his work on LLM efficiency, deep collaborative filtering, anomaly detection, knowledge graphs, and fast interpretation has been incorporated into production systems at Hugging Face, TensorFlow, Apple, Bing, and Meta, respectively. His papers have received several Best Paper (Candidate) awards from venues such as ICML, WWW, WSDM, ICDM, AMIA, and INFORMS. He is the recipient of the NSF CAREER Award and the ACM SIGKDD Rising Star Award. His work has been cited more than 30,000 times with an h-index of 76. He served as General Co-Chair for WSDM 2020 and ICHI 2023, as well as Program Co-Chair for AIHC 2024 and CHASE 2025.
Host: Yan Liu
Location: Olin Hall of Engineering (OHE) - 132
Audiences: Everyone (USC) is invited
Contact: CS Faculty Affairs
This event is open to all eligible individuals. USC Viterbi operates all of its activities consistent with the University's Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation, or any other prohibited factor.