USC - Viterbi School of Engineering - Viterbi Voice for Kids Interface Wins IEEE Signals Prize

Three-part project teaches machines to understand children's speech

February 02, 2006

Creating a system that lets children talk to computers instead of using conventional mouse or keyboard controls won the 2005 IEEE Signal Processing Society "Best Paper" Award for the USC Viterbi School of Engineering's Shrikanth Narayanan and a collaborator.

Agent Chimp understands children — or at least some of the things they say

Narayanan, who holds appointments in the Viterbi School's departments of electrical engineering and computer science, as well as the USC college department of linguistics, based the paper on research done in 2000-01 with his co-author, Alexandros Potamianos of the University of Crete.

The award will be presented at the annual meeting of the 16,000-member society, to be held this year May 14-19 in Toulouse, France.

The paper, "Creating Conversational Interfaces for Children," addresses three separate problems.

First, it describes at length the particular problems of creating systems that can recognize children's speech, which is acoustically quite distinct from that of adults. Children also have much wider variation in their pronunciation of words than do adults, creating additional difficulties.

The bottom line was that standard methods for Automatic Speech Recognition (ASR) performed four times worse on children's speech than on adults' speech. However, special adjustments made by Narayanan and Potamianos bridged the gap and brought the error rate down into the standard adult range.

But is voice control a useful and effective technique for children? The next part of the study was a controlled "Wizard of Oz" setup in which children played a well-known educational game (Where in the USA is Carmen Sandiego). Half of the children used the standard mouse and keyboard techniques. The other half spoke their commands and choices, which an unseen human observer ("the Wizard") then executed.

Quizzed afterward on how they liked playing by voice rather than by mouse, an overwhelming number of the children said they loved it: "Ninety-three percent rated the interface 4 or 5."

The final element in the paper describes how the researchers built an interface for a simple game using ASR. The prototype was a program that prompts children to play a spelling game, while also casually interacting with them and offering praise. The character was Agent Chimp, and while the game was elementary, it was effective in holding the attention of the eight children (ages 8-14) who played.

Narayanan: "These ideas will be used in some of the advanced virtual learning environments that we are trying to create presently at USC." (photo Abigail Kaun)

"Overall, the prototype represents a successful first effort at building a multimodal system for children with an emphasis on conversational speech," concluded the authors at the time. "We expect the data from such prototypes will help further conversational human-machine interaction."

In fact, according to Narayanan, this is happening: "Some of the work in this paper serves as a basis for a current project on automated literacy assessment [for young children] funded by NSF, and we are hoping that some of these ideas will be used in some of the advanced virtual learning environments that we are trying to create presently at USC," he said.

NSF funded the research described in the paper.

The "Best Paper" honor is only the latest distinction for Narayanan, who was named a fellow of the Acoustical Society of America in November 2005. A lengthening page of media stories chronicles the youthful investigator's efforts in such fields as voice-to-voice speech translation, laughter analysis and production, and answering systems that detect irritation in callers' voices.