USC - Viterbi School of Engineering - Engineers Use Magnetic Resonance Imaging for Linguistic Studies

Imaging the soft tissues of the vocal tract in real-time could explain some mysteries of human speech production

Claudia Melendez
March 17, 2009

How does your tongue move exactly when you utter a simple “Hello”? Does the tip move faster than the base or vice versa? Does the movement vary if you’re not a native English speaker? If you’re angry, sad or ebullient? Would it move differently if you had suffered a stroke or had any other form of cerebral damage?

Shri Narayanan, left, supervises Ph.D. students Erik Bresch (center looking down at subject) and Yoon-Chul Kim (right with back to camera), who are preparing a subject for MRI imaging. (Photo/Claudia Melendez)

These are some of the questions that Shrikanth Narayanan, director of the Viterbi School’s Signal Analysis and Interpretation Laboratory (SAIL), is trying to answer. In search of answers, Narayanan and a multi-disciplinary team of researchers have spent the past several years refining real-time magnetic resonance imaging, an innovative use of the technology that allows them to go where no researchers have gone before, thanks to funding from the National Institutes of Health.

“It’s very hard to look inside the body to see speech articulation; it’s a very hostile environment,” said linguistics professor Dani Byrd, a member of the research team looking into the mysteries of speech production. “It’s dark, it’s wet, it’s somewhat salty, things move very, very fast, they whack into other things, so a fast-moving tongue can hit up against the palate with quite a lot of force or our two lips can come together with quite a lot of force, and almost none of these events are externally visible.”

“With this technology we can see what were the events that took place in the human body which shaped the vocal fold vibration … in a way that created the speech output,” Byrd said. “We can see the events that cause the speech waveform to have the properties that we see now.”

To peek into the human vocal tract, Narayanan and his team have developed software and hardware that allow them to record body movements while subjects inside an MRI machine read a series of prepared sentences aloud or talk to someone outside the scanner.

During a recent scanning session at the USC Imaging Science Center, a German speaker read out loud a series of statements, first in his native language and then in English. Outside, Narayanan and Ph.D. student Yoon-Chul Kim monitored the movement of the subject’s vocal tract, which could be seen in real time on a computer monitor.

On a tiny square of the monitor, the movement of the tongue, the lips and the velum could be seen as the volunteer inside the loud MRI machine spoke. Ph.D. student Erik Bresch tracked the sound, making sure the loud thumping of the scanner did not clutter the sentences being uttered by the subject.

Yoon-Chul Kim, right, monitors data from real-time speech with project co-investigator Shri Narayanan. (Image/Claudia Melendez)

“It’s a major accomplishment technically,” Bresch said, referring to the contraption he had put together to capture the vocal utterances and diminish the MRI acoustic noise.

The USC team is the first in the nation to use this technology for linguistic research, and even though it has its drawbacks, its use has proven far superior to other tools.

“One of the powerful aspects of our approach is that MRI provides a full picture of the position of soft tissue while speech sounds are being produced,” said Krishna Nayak, assistant professor in the Ming Hsieh Department of Electrical Engineering and a member of the research team. “Compare this with one of the pre-existing modalities to study real-time speech, ultrasound, which only lets you see the tongue. To really understand the shaping of the vocal tract, you need to see both sides. In fact, you need to see all three dimensions.”

Linguists have also used electro-magnetometry, a method that yields high temporal resolution by tracking a few sensors placed on key speech articulators. But the method can produce distorted results, because the sensors are placed on the tongue, and it provides only a partial view of the front part of the vocal tract.

“That’s why real-time MRI is such a powerful technique,” Nayak said.

Real-time MRI in its present form does have its limitations. First, the volunteers have to lie supine while the pictures are being taken, an unnatural position for day-to-day communication. Second, the imaging still doesn’t have the spatial and temporal resolution that the researchers would like.

Shri Narayanan, left; Krishna Nayak, right.

“We’d like to image even faster than we do right now because there are certain sounds that require very rapid motion of the tongue tip and of the lips, … the human vocal tract is very amazing,” Nayak said.

With a grant from the National Science Foundation, Narayanan and his team are now taking the research in new directions, using the technology they’ve worked so hard to put together to explore other aspects of human vocal production, like singing and acting.

“We’ve established quite a bit of foundation for this, now we’re excited to go to the next stage,” Narayanan said.

For more info, visit http://sail.usc.edu/span/.