Computer chip manufacturers traditionally have had a single, simple standard for
their product: perfection. But a USC engineer who has spent his career devising
ways to have chips test themselves has found that less than perfect is sometimes good
enough — possibly good enough to save billions of dollars.
"Chips with any flaws at all have always been discarded," said Melvin A. Breuer,
a professor in the University of Southern California Viterbi School of Engineering's
Department of Electrical Engineering. "And this significantly increases the cost
for the good ones."
When manufacturers start making a complex chip, a very large percentage are faulty,
Breuer explained. The percentage goes down as manufacturing techniques improve,
he added, but "by the time the technique is thoroughly mastered, the chip is on
its way to being obsolete."
[Photo: Professor Melvin Breuer holds a tray of defective chips, part of a batch of 1,000
donated by a manufacturer. The specially configured computer behind him allows
the chips to be use-tested without being soldered into a board. Breuer is devising
new test algorithms that will be able to identify potentially usable defective
chips accurately.]
Some chip designers try to cut the losses by designing redundancy into the circuits,
so that when circuitry fails, other circuitry can take its place. Even with these
measures (and they have costs), large numbers of chips wind up as extremely expensive
industrial waste.
Traditionally, the wastage — often half the output or more — is written off as
a business cost. But are all faulty chips useless? Seven years ago, Breuer and
Viterbi School colleague S. K. Gupta began investigating the idea of acceptable
errors produced by defective chips.
For some uses, such as security, accounting and scientific applications,
errors are intolerable, says Breuer. But for many others, including graphics,
there is a surprising amount of leeway for "error tolerance."
"If you have an application where the end user is a person, rather than another
computer, small changes in the output are imperceptible," says the researcher,
giving as an example images created by a chip with a few defects, in which one
or two pixels were out of place.
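As a rough illustration of why a couple of misplaced pixels can go unnoticed, the sketch below corrupts two pixels in a synthetic grayscale image and scores the result with peak signal-to-noise ratio (PSNR). The image, the choice of pixels and the use of PSNR are assumptions made for this example; the article does not say which quality measure the researchers used, and PSNR is only a coarse stand-in for what a viewer perceives.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio (dB) between two equal-sized grayscale images."""
    squared_error = 0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical 64x64 test image: a repeating horizontal gradient.
width = height = 64
clean = [[(x * 4) % 256 for x in range(width)] for _ in range(height)]

# Simulate a defective chip that gets two pixels wrong (positions are arbitrary).
flawed = [row[:] for row in clean]
flawed[10][20] = 255 - flawed[10][20]
flawed[40][5] = 255 - flawed[40][5]

changed = sum(
    1
    for clean_row, flawed_row in zip(clean, flawed)
    for c, f in zip(clean_row, flawed_row)
    if c != f
)
quality_db = psnr(clean, flawed)
print(f"{changed} of {width * height} pixels differ, PSNR = {quality_db:.1f} dB")
```

A higher PSNR means the flawed image is closer to the reference. Whether a given score is acceptable depends on the application and the viewer.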
The critical factor, Breuer says, is being able to test chips cost-efficiently and
to predict accurately whether a defective chip will provide acceptable performance without
having to plug it into the application. Breuer and Gupta have developed simple
built-in test structures for chips that can automatically measure attributes
of their erroneous behavior, such as error rate and error significance.
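The two attributes named above can be made concrete with a small software model. The sketch below is an assumption-laden stand-in, not the researchers' built-in test structures: it models a hypothetical 8-bit adder with one output bit stuck at 0, compares it exhaustively against a fault-free adder, and reports the error rate (the fraction of inputs giving a wrong answer) and the error significance (taken here as the largest numerical deviation). The function names and the fault model are invented for illustration.

```python
def golden_adder(a, b):
    """Fault-free 8-bit adder producing a 9-bit sum."""
    return (a + b) & 0x1FF


def faulty_adder(a, b, stuck_bit):
    """Hypothetical defective adder: one output bit is stuck at 0."""
    return golden_adder(a, b) & ~(1 << stuck_bit)


def error_profile(stuck_bit):
    """Return (error_rate, worst_deviation) over all 8-bit input pairs."""
    wrong = 0
    worst = 0
    total = 0
    for a in range(256):
        for b in range(256):
            total += 1
            deviation = abs(golden_adder(a, b) - faulty_adder(a, b, stuck_bit))
            if deviation:
                wrong += 1
                worst = max(worst, deviation)
    return wrong / total, worst


for bit in (0, 7):
    rate, significance = error_profile(bit)
    print(f"output bit {bit} stuck at 0: error rate {rate:.2f}, "
          f"worst deviation {significance}")
```

In this model both faults produce a wrong answer equally often, but a fault in the lowest bit is off by at most 1 while a fault in bit 7 is off by 128. That is why error rate alone is not enough to judge whether a defective chip is usable.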
Breuer specializes in problems like these. He is the author of several books
on the subject, including Diagnosis and Reliable Design of Digital Systems and
Digital System Testing and Testable Design, and is on the editorial board of the
Journal of Electronic Testing.
In a 2004 paper in IEEE Design and Test Magazine, Breuer, Gupta, and Intel Corp.
Senior Staff Engineer T.M. Mak set forth a framework to analyze errors and
predict usability. One such analysis
indicated that 60 percent of chips with a single defect would nevertheless be
able to decode MPEG video files and play them back with no user-noticeable errors.
Because of this and other work, the National Science Foundation recently awarded
$1.1 million to Breuer, Gupta and two other Viterbi School researchers, Antonio
Ortega and Keith Chugg, to investigate and develop error tolerance. Breuer and
Gupta have also received funding for this work from the Semiconductor Research
Corporation, and Breuer has received additional funding from the Okawa Foundation.
Ortega and his students in the Signal and Image Processing Institute within the
USC Viterbi School have already created simulations of images produced by flawed
chips implementing JPEG and MPEG encoding operations, and the results confirm
that a significant fraction of flawed chips produce only slightly degraded output
that viewers do not notice. This group is also looking into additional
applications for imperfect hardware.
Chugg and his students in the USC Viterbi School Communication
Sciences Institute have demonstrated that turbo decoding chips, which
are being adopted for next-generation wireless communication systems,
are very robust to circuit defects. In fact, such chips can have
a significant number of defects in the memory circuitry with little or
no perceptible degradation in performance.
Industry is also starting to prick up its ears, says Breuer. "When I first started
talking to them," he recalls, "they were very negative. 'We don't want our name
associated in any way with defective product,' was their response." But their
attitude seems to be changing: Breuer says that over the last 12 months he has
been invited to give “keynote” talks at three conferences on the subject of error
tolerance.
“If these ideas catch on, we will see a major paradigm shift in the way chips
are designed, tested and marketed. And these ideas will allow industry to continue
to scale technology according to Moore’s law, while reducing the cost of chips
to the end user,” Breuer notes. He adds that "considering that the net revenues
of chips sold in 2004 was over $210 billion, the annual economic impact of these
ideas could easily amount to billions of dollars."
Mak, Breuer’s co-author on the 2004 paper, admits he was skeptical at first,
his skepticism growing out of earlier experience with chips combining two functions.
“If one of the elements of the chip didn’t work, we thought, we could still use
the other.”
But Mak said this created logistical problems because the supply of half-usable
chips was so unpredictable. He ended up shipping many chips that had no defects
to customers who were paying a lower price for the imperfect chips.
However, he said, as trends in chip manufacturing lead to denser and denser
architectures that produce more and more defects, it is becoming increasingly
difficult to ship chips (or even half-chips) that meet the perfection standard.