People find it easy to pick up on verbal cues that a speaker has high blood alcohol – speech that sounds slowed, slurred, stumbling and often louder than usual. But can a computer learn to do the same?
Winners! Lead author Daniel Bone, left, joins Professor Shri Narayanan and fellow Ph.D. candidate Matthew Black on stage to accept their best paper award at the 2011 Interspeech Conference in Florence, Italy.
The raw material consisted of 39 hours of recorded utterances from 154 German volunteers – 77 male, 77 female, ranging in age from 21 to 75 – first interviewed with high blood alcohol and then again two weeks later when sober. The researchers looked for generic similarities in drunken versus sober speech to develop a test to distinguish the two.
The teams had two months to do so. Then the challenge was to apply these tests to another set of utterances to determine their speakers' state of sobriety.
The six-person team from USC Viterbi Professor Shrikanth Narayanan's Signal Analysis and Interpretation Laboratory (SAIL) came out on top, the second SAIL team to do so in the three-year history of Interspeech competitions. A previous SAIL team, which focused on determining emotion from speech samples, triumphed in the 2009 contest.
The speech samples were a mixed bag, some spontaneous speech, some readings of text material. That all were in German was not a problem, said Matthew Black, an electrical engineering Ph.D. candidate who co-authored the team's paper about the work – the techniques are language independent.
"If all the participants were saying the same thing," said Black, "it might have been easier." But the researchers did have sober and intoxicated speech from the same person to analyze.
Intoxication detectors: (from left) Ming Li, Sungbok Lee, Shri Narayanan, Daniel Bone, Angeliki Metallinou and Matt Black.
The SAIL group’s approach fused a group of computer methods for analyzing speech into a multimodal system. The modes included spectral cues long used for speech recognition; prosodic cues such as rhythm, intonation and pitch; and voice quality cues such as hoarseness, creakiness, breathiness, nasality, quiver and other artifacts of air flow through the human speech apparatus. Their work drew on earlier techniques from other speech analysts, but combined them in new ways. Ordinary computers were used for the computations.
“This winning approach relied on hierarchical organization of speech signal features, the use of novel speaker normalization techniques as well as the fusion of multiple classifier subsystems,” wrote Narayanan.
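The ingredients Narayanan names – per-speaker normalization of features, multiple classifier subsystems, and fusion of their outputs – can be illustrated with a toy sketch. This is not the team's code: the features, the linear scorers and the averaging rule here are all simplified stand-ins chosen only to show how the pieces fit together.

```python
# Hypothetical illustration of the general recipe (not the authors' system):
# 1) z-normalize each speaker's features against that speaker's own statistics,
# 2) score the utterance with several classifier "subsystems",
# 3) fuse the subsystem scores (here, a simple average) into one decision.

def speaker_normalize(features):
    """Z-normalize a list of feature vectors using this speaker's own stats,
    so the classifier sees deviations from the speaker's baseline rather
    than absolute values."""
    n = len(features)
    dim = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dim)]
    stds = []
    for d in range(dim):
        var = sum((f[d] - means[d]) ** 2 for f in features) / n
        stds.append(var ** 0.5 or 1.0)  # guard against zero variance
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)]
            for f in features]

def subsystem_score(vec, weights):
    """A toy linear scorer standing in for one trained classifier subsystem."""
    return sum(w * x for w, x in zip(weights, vec))

def fused_decision(vec, subsystems, threshold=0.0):
    """Late fusion: average the subsystem scores and threshold the result."""
    avg = sum(subsystem_score(vec, w) for w in subsystems) / len(subsystems)
    return "intoxicated" if avg > threshold else "sober"
```

In a real system each subsystem would be a trained model over a different feature stream (spectral, prosodic, voice quality), and the fusion weights would themselves be learned; the point of the sketch is only the normalize-score-fuse pipeline.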
The SAIL software correctly classified 70 percent of the unknown samples, the highest accuracy in the competition and significantly higher than the previous best rate of approximately 65 percent.
Details of this research, and other ongoing SAIL efforts in human-centered signal processing and behavioral informatics, can be found at the SAIL website along with a link to the prizewinning paper, “Intoxicated Speech Detection by Fusion of Speaker Normalized Hierarchical Features and GMM Supervectors.”
The lead author on the paper was Ph.D. student Daniel Bone. In addition to Black and Narayanan, the team included Ph.D. students Ming Li and Angeliki Metallinou, and research professor Sungbok Lee.
So, in the future will police ask drivers stopped on suspicion of DUI to speak a few words into a microphone instead of walking a straight line? “Not right away,” says Bone, “but it is possible that in-car alcohol detection systems may incorporate speech-based technology in combination with other techniques.”