
John Hennessy, with former student Timothy Pinkston in the background. "You just can't predict
branching perfectly. If you could, the program wouldn't need a branch."
Moore's Law has hit a slippery
patch, Stanford University's charismatic electrical engineer, computer scientist and president
told a full house at the fourth Viterbi centennial lecture Feb. 16.
Denser and denser chips, packed with faster and faster transistors, are
coming on line as predicted, but it's becoming harder and harder to
make efficient use of them for computation, as opposed to simple
memory storage. The problem is fundamental, embedded in the
way programmers structure computational tasks, and John Hennessy said
the answer looks like it will have to come from the software side,
and very likely from academia, not industry.
Timothy Pinkston, professor of electrical engineering (PhD Stanford,
'93), introduced his former professor, hitting the highlights of one of
the most impressive resumes in American academia, one that in addition
to leadership of Stanford includes gilt-edged research (establishing the
parameters of now-standard RISC chips), entrepreneurship (co-founder of
MIPS Technologies) and honors (membership in the National Academy of
Engineering, the National Academy of Sciences and the American
Association for the Advancement of Science, plus the Von Neumann and
Lamme Medals and the Mauchly and Cray Awards).
Speaking energetically and clearly, Hennessy began with a historical
account of how the problem has emerged and developed over the past 15
years. His audience was drawn heavily from the Viterbi School computer
science department, which sponsored the talk, with numerous researchers
from the Information Sciences Institute in Marina del Rey making the
pilgrimage to campus.
What the computer industry is looking at, Hennessy told this group, is
"the end of the ILP road." ILP, or "instruction level parallelism,"
is a strategy for getting a processor to work on several instructions
of a program at once. Instead of moving the computation forward one step
at each tick of the clock, two, four, six or more execution units work in
parallel. If a computation requires solving A+D=R and B+C=S, both steps
can be done simultaneously to arrive more quickly at R+S=T.
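The A+D=R, B+C=S, R+S=T example can be sketched in a few lines of Python. This is only an illustration of the idea, not how any real chip schedules work: the function name `schedule` and the dictionary layout are assumptions made for the sketch, and it assumes instructions are listed in program order, so each result appears before anything that uses it.

```python
def schedule(instructions):
    """Group instructions into steps; everything in one step can run in parallel.

    `instructions` maps each result name to the operands it adds together.
    Assumes program order: a result is listed before any instruction that uses it.
    """
    ready_at = {}  # result name -> step in which that result is computed
    steps = []
    for result, operands in instructions.items():
        # Plain inputs (A, B, ...) are available from the start (-1), so an
        # instruction runs one step after the latest result it depends on.
        step = 1 + max(ready_at.get(op, -1) for op in operands)
        ready_at[result] = step
        while len(steps) <= step:
            steps.append([])
        steps[step].append(result)
    return steps


program = {"R": ("A", "D"), "S": ("B", "C"), "T": ("R", "S")}
steps = schedule(program)
print(steps)  # R and S share the first step; T has to wait for both
```

Three additions finish in two steps instead of three. A chain in which every instruction needs the previous result would gain nothing, which is why the amount of parallelism available depends on the program rather than on the hardware alone.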
Most examples are more complex than this. Programs branch off in
different directions frequently depending on results — in typical
programs, once every three or four instructions, with the main
computation left hanging awaiting branch results. "So the
problem is finding enough instructions that can be executed in
parallel."
Remarkably ingenious "speculative" strategies have been devised by
programmers that essentially guess at the way the problem should
proceed, and then use the guesses to steer resources between branches.
For 20 years, Hennessy said, this strategy has worked brilliantly,
creating faster and faster solutions. But now the latest generation of
chips is starting to come up against an intrinsic ILP problem: the
guesses remain guesses. "You just can't predict branching perfectly,"
Hennessy noted. "If you could, the program wouldn't need a branch."
The workaround is "fix-up" code that can go back and redirect when the
guesses go wrong. But the cost for the fix is delay in executing the
program — delay that ramifies as steps pile up waiting for answers that
are needed to proceed, "losses at every stage," as Hennessy put it.
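The cost of wrong guesses can be made concrete with a toy model. The sketch below uses a two-bit saturating counter, a standard textbook prediction scheme chosen here purely for illustration (the talk as reported does not name a specific predictor). The function names, the loop-shaped branch pattern and the 20-cycle penalty are all assumptions.

```python
def count_mispredictions(outcomes):
    """Run a 2-bit saturating counter over branch outcomes (True = taken)."""
    counter = 2  # 0-1 predict "not taken", 2-3 predict "taken"; start at weakly taken
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Nudge the counter toward what actually happened, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


def total_cycles(instructions, mispredictions, penalty):
    """One cycle per instruction, plus a fixed fix-up penalty per wrong guess."""
    return instructions + mispredictions * penalty


# A loop branch: taken nine times, then falls through, repeated ten times.
outcomes = ([True] * 9 + [False]) * 10
misses = count_mispredictions(outcomes)

# 100 branches at one branch per four instructions is 400 instructions.
cycles = total_cycles(instructions=400, mispredictions=misses, penalty=20)
print(misses, cycles)
```

In this model the predictor is right 90 percent of the time, yet the program still takes 600 cycles instead of 400. The longer the fix-up penalty, the more each remaining wrong guess costs.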
Hennessy said the bottom line can be read in the comparison between the
raw computing power of the multiple processors and the results of
running actual programs. Presenting graphs, Hennessy noted that the peak
performance of today's machines (what they can achieve without
branch-caused waits and backtracking after failed predictions) "is
three, five and even ten times higher than their actual ability
to run programs," that is, their sustained performance.
"There are more transistors," he said, "but they are less efficiently
used." Even worse, as the efficiency goes down, the cost of
producing the jam-packed hyperchips goes up.
Hennessy enthusiastically illustrated his point with an Intel case
study, the contrast between the company's Pentium III and Pentium 4
processors. Compared to its predecessor, Hennessy noted, the Pentium 4
represents a 3-fold increase in power consumption and a 4-fold increase
in the number of transistors. But most programs don't run faster on the
newer chip.
"It's not," Hennessy deadpanned, "that Intel didn't know what it was
doing" when it created the Pentium 4. The problem, he said, is typical.
"People aren't finding more efficient ways to use the power budget or
the transistor budget." And, he said, "there are really no tradeoffs
left to make. The harder we try to push it, the more inefficient it
becomes. Speculation [the program-guessing formulas] just makes it
worse."
Hennessy continued into a discussion of the very latest generation of
chips, noting that all demonstrated the problem. He did find some hope
in one radically new chip, the Sun Niagara, which offered a drastic
simplification of the problem. In some ways it is a return to the 80s, he
said: the chip runs 32 hardware threads, four on each of eight simple
cores, while cutting back on speculation. The result is that at any
given time, many of the threads are hung up for one reason or another,
but with so many threads to choose from, the cores stay busy and deliver
more of parallelism's benefits, outperforming more traditional designs.
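Why a chip can come out ahead even when much of its work is stalled can be shown with a deliberately simple model. This is a sketch under stated assumptions, not a description of Niagara's actual pipeline: the function name `core_utilization`, the rule that each unit of work stalls for a fixed number of cycles after every instruction, and the three-cycle stall are all hypothetical.

```python
def core_utilization(threads, stall_cycles, total_cycles=10_000):
    """Fraction of cycles in which a core issues an instruction.

    Toy model (an assumption): each thread issues one instruction, then waits
    `stall_cycles` cycles before it can issue again. The core issues at most
    one instruction per cycle, from the first thread that is ready.
    """
    ready_at = [0] * threads  # cycle at which each thread may issue again
    issued = 0
    for cycle in range(total_cycles):
        for t in range(threads):
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + 1 + stall_cycles
                issued += 1
                break  # only one instruction issues per cycle
    return issued / total_cycles


for n in (1, 2, 4):
    print(n, core_utilization(threads=n, stall_cycles=3))
```

With a single thread the core sits idle three cycles out of four. With four threads, three of them are stalled at any moment, yet the core issues an instruction every cycle, and it does so without guessing at branches.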
Hennessy said it's not clear how much farther this idea can go,
"but if I were IBM or Intel now and I looked at Niagara, I'd be
worried. I don't think there will be another single-core architecture."
For the future, Hennessy indicated that some intellectual reordering of
the chicken/egg hardware/software relationship might be coming up. For
decades, he noted, "Programmers have ignored the details of how the
machines work," developing software that could be compiled for
different machines without regard to their internals. "The machines needed to be built as targets of
compilers."
The way around the stall, he suggested, might come from backing up and finding another route, which would not be easy.
While the 'how' is uncertain, Hennessy was clear about the 'where.' He
noted that despite corporate investment, the basic ideas that have driven
computer science and design have come from the universities and, he
thinks, will continue to do so. "And we need to continue engagement
between the research and corporate communities."
After his talk Hennessy offered a series of observations, including:
- It's at least possible that a changeover from silicon to
next-generation nanotube, quantum, or other computing media might offer
some ways around the problem. "At the very least, any changeover from
silicon will lead to a re-examination of design," possibly because
the familiar silicon meanings of 1 and 0 may be transformed.
- The flow of government dollars into research remains
critical to the industry and the economy as a whole, which makes it
alarming that R&D expenditures have been declining as a percentage of
GDP for 20 years. He referred his listeners to the
recent National Academies report, "Rising Above the Gathering Storm:
Energizing and Employing America for a Brighter Economic Future," which
calls for a 4-part program to improve K-12 mathematics and science
education; sustain and strengthen the nation's commitment to long-term
basic research; develop, recruit, and retain top students, scientists,
and engineers from both the U.S. and abroad; and ensure that the United
States is the premier place in the world for innovation.
- Future fixes may involve much more vigorous diversification of
computer architecture for different uses, so desktop machines will run
completely differently from servers — a trend that may be emphasized by
a changing relationship between individual desktop computers and
servers.