University of Southern California
The USC Andrew and Erna Viterbi School of Engineering
Between Moore's Law and a Hard Place  
John Hennessy Says Chips Have "Come to the End of the ILP Road"

February 27, 2006 —

John Hennessy, with former student Timothy Pinkston in the background. "You just can't predict branching perfectly. If you could, the program wouldn't need a branch."

Moore's Law has hit a slippery patch, Stanford University's charismatic electrical engineer, computer scientist and president told a full house at the fourth Viterbi centennial lecture Feb. 16. Ever-denser chips, packed with ever-faster transistors, are coming on line as predicted, but it is becoming harder and harder to make efficient use of them for computation, as opposed to simple memory storage. The problem is fundamental, embedded in the way programmers structure computational tasks, and John Hennessy said the answer looks like it will have to come from the software side, and very likely from academia, not industry.

Timothy Pinkston, professor of electrical engineering (PhD Stanford, '93), introduced his former professor, hitting the highlights of one of the most impressive resumes in American academia. In addition to leadership of Stanford, it includes gilt-edged research (establishing the parameters of now-standard RISC chips), entrepreneurship (co-founder of MIPS Technologies) and honors (membership in the National Academy of Engineering, the National Academy of Sciences and the American Association for the Advancement of Science, plus the Von Neumann and Lamme Medals and the Mauchly and Cray Awards).

Speaking energetically and clearly, Hennessy began with a historical account of the problem as it has emerged and developed over the past 15 years. His audience was drawn heavily from the Viterbi School computer science department, which sponsored the talk, with numerous researchers from the Information Sciences Institute in Marina del Rey making the pilgrimage to campus.

What the computer industry is looking at, Hennessy told this group, is "the end of the ILP road." ILP, or "instruction-level parallelism," is a strategy for getting multiple execution units within a processor to work on a problem at once. Instead of moving the computation forward one step at each tick of the clock, two, four, six or more units work in parallel. If a computation requires solving A+D=R and B+C=S, both steps can be done simultaneously to arrive more quickly at R+S=T.
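The three additions above can be sketched in a few lines of Python. This is only an illustration, not anything shown in the lecture: the `schedule` helper is hypothetical, and it assumes unlimited execution units and one clock step per addition. It assigns each operation the earliest step at which its operands are available.

```python
# Each result maps to the operands it depends on (the example from the text).
ops = {
    "R": ("A", "D"),
    "S": ("B", "C"),
    "T": ("R", "S"),
}

def schedule(ops, inputs):
    """Return {result: earliest step}, assuming unlimited execution units
    and one clock step per operation (both simplifying assumptions)."""
    ready_at = {name: 0 for name in inputs}  # inputs are available at step 0
    remaining = dict(ops)
    while remaining:
        progressed = False
        for result, operands in list(remaining.items()):
            if all(o in ready_at for o in operands):
                # An operation finishes one step after its last operand is ready.
                ready_at[result] = 1 + max(ready_at[o] for o in operands)
                del remaining[result]
                progressed = True
        if not progressed:
            raise ValueError("cyclic or unsatisfied dependency")
    return {r: ready_at[r] for r in ops}

steps = schedule(ops, ["A", "B", "C", "D"])
serial_steps = len(ops)               # one operation per clock tick
parallel_steps = max(steps.values())  # independent operations share a tick
print(steps, serial_steps, parallel_steps)
```

R and S do not depend on each other, so they land in the same step, while T has to wait for both. The saving comes entirely from finding operations that are independent.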

Most examples are more complex than this. Programs frequently branch off in different directions depending on results: in typical programs, once every three or four instructions, with the main computation left hanging while it awaits branch results. "So the problem is finding enough instructions that can be executed in parallel."

Remarkably ingenious "speculative" strategies have been devised by programmers that essentially guess at the way the problem should proceed, and then use the guesses to steer resources between branches.

For 20 years, Hennessy said, this strategy has worked brilliantly, creating faster and faster solutions. But now the latest generation of chips is starting to come up against an intrinsic ILP problem: the guesses remain guesses. "You just can't predict branching perfectly," Hennessy noted. "If you could, the program wouldn't need a branch."

The workaround is "fix-up" code that can go back and redirect when the guesses go wrong. But the cost for the fix is delay in executing the program — delay that ramifies as steps pile up waiting for answers that are needed to proceed, "losses at every stage," as Hennessy put it.
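To make the cost of wrong guesses concrete, here is a minimal sketch of one textbook guessing scheme, a 2-bit saturating counter, together with an idealized cycle count. Neither function comes from the talk. The predictor choice, the starting state and the penalty value used in the example are all assumptions chosen for illustration.

```python
def simulate_two_bit(outcomes):
    """Count mispredictions of a single 2-bit saturating counter.
    States 0-1 predict 'not taken'; states 2-3 predict 'taken'."""
    state = 2  # start at weakly 'taken' (an arbitrary choice)
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses

def cycles(n_instructions, misses, penalty):
    """Idealized cost: one cycle per instruction plus a fixed penalty
    (hypothetical) for every misprediction that must be undone."""
    return n_instructions + misses * penalty

# A loop branch: taken nine times, then not taken once, repeated ten times.
loop_branch = ([True] * 9 + [False]) * 10
# A branch that alternates between taken and not taken.
alternating = [True, False] * 50

print(simulate_two_bit(loop_branch), simulate_two_bit(alternating))
print(cycles(400, simulate_two_bit(loop_branch), penalty=15))
```

Even on the well-behaved loop, the predictor misses every exit, and on the alternating pattern it is wrong half the time. Each miss adds the penalty on top of the useful work, which is the "loss at every stage" the fix-up has to pay for.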

Hennessy said the bottom line can be read in the comparison between the raw computing power of the chips and the results of running actual programs. Presenting graphs, he noted that the peak performance of current machines (what they can achieve without branch-caused waits and backtracking after failed predictions) "is three, five and even ten times higher than their actual ability to run programs," that is, their sustained performance.

"There are more transistors," he said, "but they are less efficiently used." Even worse, as the efficiency goes down, the cost of producing the jam-packed hyperchips goes up.

Hennessy enthusiastically illustrated his point with an Intel case study, the contrast between the company's Pentium III and Pentium 4 processors. Compared to its predecessor, Hennessy noted, the Pentium 4 represents a 3-fold increase in power consumption and a 4-fold increase in the number of transistors. But most programs don't run faster on the newer chip.

"It's not," Hennessy deadpanned, "that Intel didn't know what it was doing" when it created the Pentium 4. The problem, he said, is typical. "People aren't finding more efficient ways to use the power budget or the transistor budget." And, he said, "there are really no tradeoffs left to make. The harder we try to push it, the more inefficient it becomes. Speculation [the program-guessing formulas] just make it worse."

Hennessy continued into a discussion of the very latest generation of chips, noting that all demonstrated the problem. He did find some hope in one radically new chip, the Sun Niagara, which offered a drastic simplification of the problem. In some ways a return to the 80s, he said, it spreads 32 hardware threads across eight simple cores, four threads to a core, while cutting back on speculation. The result is that at any given time, many of the threads are hung up for one reason or another, but the redundancy of four threads per core is able to deliver more of parallelism's benefits, outperforming more traditional designs.
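The idea of keeping a core busy by switching among several instruction streams while some of them are stalled can be illustrated with a toy model. This is a rough sketch under stated assumptions, not a description of how Niagara actually schedules its work: the `core_utilization` function, the fixed work and stall lengths, and the pick-the-first-ready-stream rule are all hypothetical.

```python
def core_utilization(n_threads, work, stall, total_cycles=10_000):
    """Fraction of cycles in which a single-issue core does useful work.

    Toy model: each thread executes `work` instructions, then waits `stall`
    cycles (for example on a memory access), and repeats. Each cycle the
    core issues one instruction from the first thread that is ready.
    """
    ready_at = [0] * n_threads  # cycle at which each thread can issue again
    left = [work] * n_threads   # instructions left before the next stall
    busy = 0
    for cycle in range(total_cycles):
        for t in range(n_threads):
            if ready_at[t] <= cycle:
                busy += 1
                left[t] -= 1
                if left[t] == 0:  # this thread now stalls
                    ready_at[t] = cycle + 1 + stall
                    left[t] = work
                break  # at most one instruction issued per cycle
    return busy / total_cycles

# One stream that stalls three cycles after every instruction,
# versus four such streams sharing the same core.
print(core_utilization(1, work=1, stall=3))
print(core_utilization(4, work=1, stall=3))
```

In this model a single stream leaves the core idle most of the time, while four streams with the same stall behavior can fill every cycle. No guessing is involved: the core simply does other useful work while a stalled stream waits.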

Hennessy said it's not clear how much further this idea can go, "but if I were IBM or Intel now and I looked at Niagara, I'd be worried. I don't think there will be another single-core architecture."

For the future, Hennessy indicated that some intellectual reordering of the chicken/egg hardware/software relationship might be coming up. For decades, he noted, "Programmers have ignored the details of how the machines work," developing software to be compiled into varying machines indifferently. "The machines needed to be built as targets of compilers."

The way around the stall, he suggested, might come from backing up and finding another way, which would not be easy.

While the 'how' is uncertain, Hennessy was clear about the 'where.' He noted that despite company investments, the basic ideas that have driven computer science and design have come, and he thinks will continue to come, from the universities. "And we need to continue engagement between the research and corporate communities."

After his talk Hennessy offered a series of observations, including:

  • It's at least possible that a changeover from silicon to next-generation nanotube, quantum, or other computing media might offer some ways around the problem. "At the very least, any changeover from silicon will lead to a re-examination of design," perhaps because the familiar silicon meanings of 1 and 0 may be transformed.
  • The flow of government dollars into research remains critical to the industry and the economy as a whole, which makes it alarming that R&D expenditures have been declining as a percentage of GDP for 20 years. He referred his listeners to the recent National Academies report, "Rising Above the Gathering Storm: Energizing and Employing America for a Brighter Economic Future," which calls for a 4-part program to improve K-12 mathematics and science education; sustain and strengthen the nation's commitment to long-term basic research; develop, recruit, and retain top students, scientists, and engineers from both the U.S. and abroad; and ensure that the United States is the premier place in the world for innovation.
  • Future fixes may involve much more vigorous diversification of computer architecture for different uses, so desktop machines will run completely differently from servers, a trend that may be reinforced by a changing relationship between individual desktop computers and servers.


