The Case for Data-Driven Multithreading: Scaling the Memory Wall
Fri, Mar 09, 2007 @ 10:00 AM - 11:00 AM
Ming Hsieh Department of Electrical and Computer Engineering
Conferences, Lectures, & Seminars
CENG SEMINAR SERIES
"The Case for Data-Driven Multithreading: Scaling the Memory Wall"
Dr. Paraskevas (Skevos) Evripidou
Department of Computer Science
University of Cyprus

Abstract: Over the last five decades, computer designers have been able to build faster and faster computers by relying on improvements in fabrication technologies and architectural/organizational optimizations. Over the last five years, however, the most severe limitation of the sequential model, namely its inability to tolerate long latencies, has slowed these performance gains and driven the industry into the Memory Wall. As a result, the entire industry has switched to multiple cores per chip and thus moved into the concurrency era. New concurrent models and paradigms are needed to fully utilize the potential of multi-core chips. The dataflow model is a formal model that can handle concurrency and tolerate memory and synchronization latency. We therefore propose Data-Driven Multithreading (DDM), which does not suffer from the above-mentioned limitations. DDM is not based on the von Neumann model of execution but on the dataflow model of execution, which is side-effect free. Memory latencies can thus be tolerated without the large performance penalties of the von Neumann model. Furthermore, data-driven scheduling does not require the complexity of multiple-issue and out-of-order mechanisms.

Data-Driven Multithreading is a non-blocking multithreading model based on the Decoupled Data-Driven model of execution. This model decouples the synchronization and computation portions of a program, allowing them to execute asynchronously. A thread is scheduled for execution in a dataflow manner, i.e., whenever all of its required data have been produced. As a consequence, no synchronization or communication latencies are experienced. We have demonstrated that DDM can be implemented with regular off-the-shelf microprocessors, so a system may combine DDM with the latest microprocessor technology. The core of the DDM implementation is the Thread Synchronization Unit (TSU), a memory-mapped device attached directly to the processor's bus that provides data-driven thread scheduling to the conventional microprocessor. Data-driven prefetching drastically improves the cache hit ratio while requiring much smaller cache memories, thus limiting power consumption and further reducing the effect of long memory latencies. Simulation experiments have shown that DDM achieves very respectable speedups.

Bio: Skevos Evripidou is a Professor in the Department of Computer Science at the University of Cyprus. From 1990 to 1994 he was on the faculty of the Department of Computer Science and Engineering at Southern Methodist University. He received his PhD in Computer Engineering from the University of Southern California in 1990. His current research interests are in parallel processing, computer architecture, and pervasive and mobile computing. Dr. Evripidou has participated in several projects funded by the European Union, the USA (NSF, DARPA, and DOE), and the Cyprus Research Promotion Foundation. He is a member of IFIP Working Group 10.3, the IEEE Computer Society, and ACM SIGARCH. He is also a member of the Phi Kappa Phi and Tau Beta Pi honor societies.

Host: Prof. Viktor Prasanna, prasanna@usc.edu
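To make the dataflow firing rule in the abstract concrete, the following is a minimal software sketch: each thread carries a count of the inputs it still waits for, producers decrement the counts of their consumers, and a thread is enqueued for execution once its count reaches zero. In DDM this bookkeeping is performed in hardware by the TSU on the processor's bus; the names below (ddm_thread, tsu_update, etc.) are purely illustrative and not the actual DDM/TSU interface.

    /* Illustrative sketch of data-driven (dataflow) thread scheduling.
     * A thread fires only when all of its required data have been produced. */
    #include <stdio.h>

    #define MAX_THREADS   8
    #define MAX_CONSUMERS 4

    typedef struct ddm_thread {
        const char *name;
        int ready_count;               /* inputs still missing            */
        int consumers[MAX_CONSUMERS];  /* indices of dependent threads    */
        int num_consumers;
        void (*body)(void);            /* computation portion             */
    } ddm_thread;

    static ddm_thread threads[MAX_THREADS];
    static int ready_queue[MAX_THREADS];
    static int rq_head, rq_tail;

    static void enqueue_ready(int t) { ready_queue[rq_tail++] = t; }

    /* Synchronization portion (done by the TSU in hardware): a finished
     * producer removes one pending input from each of its consumers. */
    static void tsu_update(int producer)
    {
        for (int i = 0; i < threads[producer].num_consumers; i++) {
            int c = threads[producer].consumers[i];
            if (--threads[c].ready_count == 0)
                enqueue_ready(c);      /* firing rule: all inputs present */
        }
    }

    static void body_a(void) { puts("A: produce x"); }
    static void body_b(void) { puts("B: produce y"); }
    static void body_c(void) { puts("C: consume x and y"); }

    int main(void)
    {
        /* C waits on two inputs, produced by A and B. */
        threads[0] = (ddm_thread){"A", 0, {2}, 1, body_a};
        threads[1] = (ddm_thread){"B", 0, {2}, 1, body_b};
        threads[2] = (ddm_thread){"C", 2, {0}, 0, body_c};

        enqueue_ready(0);
        enqueue_ready(1);

        while (rq_head < rq_tail) {    /* run whatever is ready */
            int t = ready_queue[rq_head++];
            threads[t].body();
            tsu_update(t);
        }
        return 0;
    }

Because threads become runnable only when their data are already available, the computation portions never block on memory or synchronization, which is the property the abstract credits for tolerating long latencies.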
Location: Hughes Aircraft Electrical Engineering Center (EEB) - 248
Audiences: Everyone Is Invited
Contact: Rosine Sarafian