Learning to Live with Errors: The Challenges and Solutions for Memory Reliability in the Sub-20nm Regime
Wed, Oct 12, 2016 @ 10:00 AM - 11:30 AM
Ming Hsieh Department of Electrical and Computer Engineering
Conferences, Lectures, & Seminars
Speaker: Prashant J. Nair, Ph.D. Candidate, Georgia Institute of Technology
Talk Title: Learning to Live with Errors: The Challenges and Solutions for Memory Reliability in the Sub-20nm Regime
Abstract: Technology scaling, the prime driver for high-capacity memory systems, continues to be critical for current applications and acts as a key enabler for future applications. Unfortunately, scaling DRAM below 20nm is already becoming a challenge due to small feature sizes and flaky cells. Designers are also investigating alternative technologies such as die-stacking and Non-Volatile Memories (NVM), which make the memory system susceptible to new failure modes (such as through-silicon via (TSV) failures). With these high error rates and new failure modes, memory reliability challenges pose a serious threat to scaling. Furthermore, the cost of mitigating these failures with traditional solutions becomes impractically high. The goal of my thesis is to investigate architectural techniques that enable reliable and scalable memory systems at negligible overheads. In this talk, I will discuss three low-cost techniques to mitigate memory failures.
First, I will advocate a cross-layer approach to tolerating memory failures, whereby scaling faults are exposed to the architecture level and a simple error-correction code is used to tolerate them. Such a scheme can tolerate error rates as high as 100 parts per million with negligible storage overhead. Second, I will discuss the challenges of TSV failures for die-stacked memory systems and present a RAID-based technique that can mitigate TSV and other large failures at runtime. Finally, I will discuss a scheme called XED that can obtain Chipkill-level reliability using 2x fewer chips for memory systems with On-Die ECC. XED mitigates the performance and power overheads of Chipkill without requiring any changes to the memory interface, and it does so by transparently exposing the error detection information from each chip to the memory controller. Overall, this talk aims to showcase techniques that will enable dense, efficient, and reliable memories that are robust to the pitfalls of technology scaling and die-stacking.
Biography: Prashant J. Nair is a Ph.D. candidate at the Georgia Institute of Technology, where he is advised by Professor Moinuddin Qureshi. He received his MS (2011-2013) from the Georgia Institute of Technology and his BE in Electronics Engineering (2005-2009) from the University of Mumbai. His research interests include reliability, performance, power, and refresh optimizations for current and upcoming memory systems. In these areas, he has authored or co-authored 9 papers in top-tier venues such as ISCA, MICRO, HPCA, ASPLOS, and DSN. Prashant organized the "Memory Reliability Forum" (co-located with HPCA 2016) to highlight the importance of memory reliability to the wider architecture community. He served as the Submissions Co-chair of MICRO 2015 and on the ERC of ISCA 2016. During his Ph.D., he interned at several leading industrial labs, including AMD, Samsung, Intel, and IBM.
Host: Murali Annavaram
Location: Hughes Aircraft Electrical Engineering Center (EEB) - 248
Audiences: Everyone Is Invited
Contact: Estela Lopez