June 22, 2005 —
Three years of work by a large interdisciplinary team at the University of Southern
California has created a rudimentary but working two-way voice translation system
that allows an English-speaking doctor to talk to a Persian-speaking patient.
|
Shrikanth Narayanan leads the large, multidisciplinary team that is developing
the Transonics Spoken Dialog Translator, seen on the table. |
The Transonics Spoken Dialog Translator turns a doctor's spoken English questions
into spoken Persian, and translates patients' spoken Persian replies into spoken
English.
Shrikanth Narayanan leads the USC Viterbi School group that developed Transonics.
One member of this team presented a report on the system June 26 at the Association
for Computational Linguistics conference in Ann Arbor, Michigan.
"Fluent two-way machine voice translation is one of the holy grails of engineering,"
said Narayanan, an associate professor of electrical engineering, computer science
and linguistics at the USC Viterbi School of Engineering who directs the Speech
Analysis and Interpretation Laboratory (SAIL) in the Integrated
Media Systems Center.
"We are years away from perfecting it, but we think the choices we have made
about how to go about creating such a system are working. We hope to have something
that will be useful in emergency rooms or ambulances within two years or so."
The existing system, funded by two DARPA grants totaling $3.8 million, is
a result of intensive research in information technology, critically supplemented
by careful observation of patient-doctor dynamics in numerous bilingual interaction
sessions staged for the project.
Narayanan noted that the Transonics approach relies not just on computer code,
but also on the ability of humans to use even imperfect tools. This approach,
he adds, grows directly out of the extraordinary difficulty of the technical problems
involved.
"Two-way voice translation involves combining at least three highly imperfect
existing disciplines, with the errors multiplying at every stage," Narayanan explained.
These include:
- Text translation. Taking a written text in one language, and translating it into another. Machine
translation systems developed by researchers Kevin Knight and Daniel Marcu at
the Viterbi School's Information Sciences Institute consistently rank among the
world's best — but still make frequent grammatical and other errors. Marcu and
Knight developed a specialized system specifically for use in Transonics.
- Spoken word recognition. This is Narayanan's specialty. Just being able to reliably recognize a large
number of different single words, in a variety of regional or foreign accents,
is a difficult problem that is far from solved, as anyone who has tried to use
existing telephone interfaces knows. Recognizing a wide variety of words informally
spoken in a noisy, chaotic environment (emergency room, ambulance) adds another
level of difficulty.
- Extra-verbal communication. Humans speak not just with words, but also with intonations. A rising tone at
the end of a sentence to express a question is one familiar example of this, one
that is extraordinarily difficult for a machine to assess. Nonsense syllables
("um, uh, ah, er"), catchphrases ("you know, like,") and exclamations (Wow! Hey!)
in utterances are easy for humans to decode or ignore, but major stumbling blocks
for machines. The insights of David Traum of the USC Institute for Creative Technologies
in dialog management are aiding in this area by narrowing the range of possibilities
and by bringing context and previous exchanges into the computer's decision-making.
Additionally, teaching computers to detect human emotions in speech is a major
focus of work by researchers at the USC Speech Analysis and Interpretation Laboratory
under the direction of Narayanan and his colleague, USC research assistant professor
Panos Georgiou.
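To make the compounding concrete, here is a minimal sketch, in Python, of how such a three-stage pipeline might be wired together. The function names and confidence figures are purely illustrative and are not drawn from the Transonics code.

```python
# Illustrative sketch of a speech-to-speech translation pipeline.
# None of these functions come from Transonics; they only show how
# per-stage accuracy compounds across the chain.

def recognize_speech(audio: bytes) -> tuple[str, float]:
    """Speech recognition stage: return (transcript, confidence)."""
    return "where does it hurt", 0.85            # placeholder result

def translate_text(text: str) -> tuple[str, float]:
    """Machine translation stage: return (translated text, confidence)."""
    return "[Persian rendering of: " + text + "]", 0.80

def synthesize_speech(text: str) -> bytes:
    """Text-to-speech stage: return audio for the target-language sentence."""
    return text.encode("utf-8")                  # placeholder audio

def translate_utterance(audio: bytes) -> tuple[bytes, float]:
    transcript, asr_confidence = recognize_speech(audio)
    translation, mt_confidence = translate_text(transcript)
    audio_out = synthesize_speech(translation)
    # Errors multiply: 85% recognition times 80% translation already
    # leaves only about 68% end-to-end confidence before synthesis.
    return audio_out, asr_confidence * mt_confidence

if __name__ == "__main__":
    _, confidence = translate_utterance(b"...")
    print(f"end-to-end confidence: {confidence:.2f}")
```

The point of the sketch is the arithmetic in the last function: each imperfect stage scales down whatever accuracy the previous stage delivered.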
The Transonics interface stretches the limits of technology by systematically
taking advantage of the fact that doctor-patient discourse is, by its nature,
highly structured, using a narrow set of concepts. "We can take advantage of using
essentially pre-fabricated sentences in many cases by trying to understand and
paraphrase what is being communicated instead of doing exact word for word translation,"
Narayanan says.
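One way to picture that paraphrase-and-concept approach, as a loose illustration rather than the actual Transonics classifier, is matching a recognized utterance against a small library of pre-translated medical phrases and emitting the canonical translation of the closest match:

```python
# Loose illustration of concept matching against pre-translated phrases;
# not the actual Transonics classifier, and the library entries are
# placeholders rather than real Persian.

PHRASE_LIBRARY = {
    "where does it hurt": "[Persian rendering 1]",
    "how long have you had this pain": "[Persian rendering 2]",
    "are you taking any medication": "[Persian rendering 3]",
}

def word_overlap(a: str, b: str) -> float:
    """Crude similarity score: fraction of shared words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / max(len(words_a | words_b), 1)

def closest_concept(utterance: str) -> tuple[str, str, float]:
    """Return the best-matching canonical phrase, its translation, and the score."""
    best = max(PHRASE_LIBRARY, key=lambda phrase: word_overlap(utterance, phrase))
    return best, PHRASE_LIBRARY[best], word_overlap(utterance, best)

if __name__ == "__main__":
    phrase, translation, score = closest_concept("can you tell me where it hurts")
    print(f"{phrase!r} -> {translation} (match {score:.2f})")
```

A real system would use far richer statistical models than word overlap, but the idea is the same: recover the concept being communicated and speak its prepared translation, rather than translating word for word.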
Additionally, the system uses the human ability to read text as a bridge over
some of the worst problems of speech recognition and machine translation, by allowing
users to select alternate possible messages.
|
Keypad speeds and simplifies frequently used functions. |
The Transonics system runs on a laptop computer using the Linux operating system.
Doctor and patient both wear headphones with attached microphones. A small keypad
connected to the computer speeds and simplifies certain routine commands — switching
from doctor mode to patient mode, for example.
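The article does not describe the keypad's exact layout, so the mapping below is purely hypothetical; it only illustrates the kind of routine mode-switching commands such a keypad might carry.

```python
# Hypothetical keypad-to-command mapping; the real Transonics key
# assignments are not described here.

KEYPAD_COMMANDS = {
    "1": "doctor_mode",    # route the next utterance through English recognition
    "2": "patient_mode",   # route the next utterance through Persian recognition
    "3": "repeat_output",  # replay the last synthesized sentence
}

def handle_keypress(key: str, state: dict) -> dict:
    """Apply a routine keypad command to the translator's state."""
    command = KEYPAD_COMMANDS.get(key)
    if command in ("doctor_mode", "patient_mode"):
        state["mode"] = command
    elif command == "repeat_output":
        state["replay_last"] = True
    return state

if __name__ == "__main__":
    print(handle_keypress("2", {"mode": "doctor_mode"}))
```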
When a doctor asks a question, the speech recognition software captures it —
but hedges its bets by displaying not just its best guess about what was said,
but a range of options. When the doctor chooses the most appropriate one (some of
the most-used phrases can be put in a quick-access "ready menu"), the result
is a spoken Persian question in the earphones of the patient.
The same process takes place in the reverse direction.
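A rough sketch of that doctor-side turn, with illustrative names only (none of this is the Transonics code), might look like the following; the patient-side turn simply runs the same loop with the languages swapped.

```python
# Illustrative sketch of one doctor-side turn: the recognizer hedges its
# bets with several candidate transcriptions, the human picks one (or a
# ready-menu phrase), and the chosen sentence is spoken in Persian.

READY_MENU = ["Where does it hurt?", "Are you taking any medication?"]

def recognize_n_best(audio: bytes) -> list[str]:
    """Return the recognizer's top candidate transcriptions (placeholders)."""
    return ["Where does it hurt?", "Where is the hurt?", "Wear does it hurt?"]

def translate_to_persian(sentence: str) -> str:
    """Placeholder for machine translation of the chosen sentence."""
    return "[Persian rendering of: " + sentence + "]"

def speak(text: str) -> None:
    """Stand-in for text-to-speech playback into the patient's earphones."""
    print("PLAYING:", text)

def doctor_turn(audio: bytes, choose) -> None:
    options = recognize_n_best(audio) + READY_MENU
    chosen = choose(options)   # the human, not the machine, resolves ambiguity
    speak(translate_to_persian(chosen))

if __name__ == "__main__":
    # Simulate the doctor picking the first candidate shown on screen.
    doctor_turn(b"...", choose=lambda options: options[0])
```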
Narayanan says much of the success of the interface grows directly out of analysis
of a large database of some 300 English-speaking-doctor/Persian-speaking-patient
dialogs created by USC medical students and Iranian-heritage USC students and
Los Angeles residents. "Rather than imagining what people might say, we analyzed
what people did say," he explained, adding that recordings of the encounters were
used to train and tune the system.
USC linguistics Ph.D. candidate Shadi Ganjavi played a vital role in setting
up these encounters, said Narayanan. "We are grateful to her and to the large
Persian-speaking community in Los Angeles."
The system contains about 23,000 English and 9,000 Persian words, a disproportion
that exists because relatively little has so far been done in machine translation
of Persian (a language also called Farsi), either written or spoken. "In addition
to our progress in the general problem of the interface," says Narayanan, "we
are also contributing to the specific problems posed by translating between English
and Farsi."
|
Transonics Interface: It offers a choice of messages to translate, based on the
words it has heard.
|
For Narayanan, one of the striking things that has emerged so far is the dependence
of the system, in its current state, on the ability of users to recognize its
limits and weaknesses, and work within them.
The team has created an elaborate user manual, and as with any system, reading
the manual improves performance a great deal. Common sense is also critical. Narayanan
ruefully describes an interaction that both 'doctor' and 'patient' labeled a failure
in follow-up questioning, one that foundered because both expected the system to
translate the name "Excedrin."
The drug name wasn't in the system. It's the same in both languages, and both
sides of the interaction understood it when they heard the other pronounce the
word. But rather than just moving on, both stubbornly kept trying to enter it
into the system — which kept rejecting it.
"We learn from things like this," said Narayanan. He and his colleague Georgiou
estimate that if the system were tagged with the familiar release-number decimal
system, it would be at "three point something" — it has gone through three
radical reconstructions in its three years of development so far.
|
Transonics interface displays a possible message or messages captured from the
doctor's speech. The doctor can choose the one he wants, and the machine will
pronounce a Persian translation. The right-hand column stores heavily used questions. |
The system is in a continuing process of upgrading and improvement. Simultaneously
with the presentation at the ACL conference, use testing was in progress at a military
facility.
In addition to the researchers and institutions already named, Malibu, California-based
HRL Laboratories works with USC on the project. HRL personnel involved include
USC alumni Dr. Robert Belvin and Howard Neely, as well as Cheryl Hein. Usability testing
and interface design contributions were made by Scott Millward, a postdoctoral
scientist at IMSC. Additionally, five USC electrical engineering graduate students
have made large contributions: Emil Ettellaie, Dagen Wang, Ananthakrishnan Shankar,
Murtaza Bulut, and Sudeep Ghande, the presenter of the paper at ACL.