COMPUTER ENGINEERING SEMINAR SERIES
Thu, Apr 06, 2006 @ 02:00 PM - 03:20 PM
Ming Hsieh Department of Electrical and Computer Engineering
Conferences, Lectures, & Seminars
CENG SEMINAR SERIES

"Optimized Compiler Generated Code Accelerators For FPGAs"

Prof. Walid Najjar
Computer Science & Engineering
University of California, Riverside

ABSTRACT:
Using FPGA devices to accelerate codes might have seemed an esoteric idea a few years ago. It is quickly moving into the mainstream, not only for embedded but also for supercomputer applications. Speedups ranging from 10x to 1000x have commonly been reported. FPGAs are commonly programmed using hardware description languages (HDLs). HDLs are behavioral in nature and not easily amenable to high-level compiler transformations.

In this paper we describe ROCCC (Riverside Optimizing Configurable Computing Compiler), a C to VHDL compiler that targets the automatic generation of FPGA-based accelerators. ROCCC optimizes and parallelizes the most frequently executed kernel loops in applications such as multimedia and scientific computing. Its objectives are to (1) bridge the performance gap between compiled and hand-written code and (2) apply extensive compile-time transformations on multi-dimensional arrays and non-trivial loop nests. Such transformations would be too complex for a human programmer to handle in a reasonable time. The objectives of the ROCCC optimizations are to (1) maximize the parallelism in the circuit as well as the clock rate at which it operates and (2) minimize the number of off-chip memory accesses as well as the area of the circuit.

The main challenge facing HLL to HDL translation is the paradigm shift from the stored program model to a value-based, data-driven execution, that is, from a temporal to a spatial execution. The task of an FPGA compiler is to generate both the data path and the sequence of operations (control flow) on that data path. The lack of architectural structure on the FPGA presents a number of opportunities for the compiler: (1) The parallelism is very high and limited only by the size of the FPGA or the bandwidth in or out of it. (2) On-chip storage can be configured at will. (3) Circuit customization allows the compiler to reduce the circuit size as well as the clock duration.

We use dynamic programming applications, for DNA and protein string matching, to demonstrate the potential of ROCCC. A relatively small C code that is mapped to the FPGA available on the Cray XD1 can achieve 1 to 100 giga cell updates per second. This translates to a speedup of two to four orders of magnitude compared to a 2 GHz CPU with an ideal cache and no pipeline stalls.

BIO:
Walid A. Najjar is a Professor in the Department of Computer Science and Engineering at the University of California, Riverside. He received a B.E. in Electrical Engineering from the American University of Beirut in 1979 and the M.S. and Ph.D. in Computer Engineering from the University of Southern California in 1985 and 1988, respectively. He was on the faculty of the Department of Computer Science at Colorado State University from 1989 to 2000; before that he was with the USC Information Sciences Institute. His research is in computer architecture, reconfigurable and embedded systems, and compiler optimizations, and has been supported by NSF, DARPA, and various companies. He has served on the program committees for a number of leading conferences in this area, including CASES, ISSS-CODES, DATE, HPCA, and MICRO.

Host: Prof. Viktor Prasanna, x04483
Location: Olin Hall of Engineering (OHE) - 136
Audiences: Everyone Is Invited
Contact: Rosine Sarafian