Michele Masterson

ICSI Seeks To Unravel ASR Limitations

The International Computer Science Institute (ICSI) recently launched a research project that examines automatic speech recognition (ASR) to identify the limitations and challenges of current technologies. Sponsored by the Intelligence Advanced Research Projects Activity via the Air Force Research Lab, the project aims to use its findings to develop new methods for improving ASR technology.

“The core algorithms behind the statistical approach that is at the heart of modern recognizers were developed around 35 years ago,” says Nelson Morgan, leader of speech research activity at ICSI. “It’s also true that there have been many refinements, but the basic assumptions that were adopted at the start still remain in place.”

The research project includes two major parts. The first is an in-depth look at the assumptions behind acoustic modeling, which is a key component of ASR that creates statistical representations of each of the distinctive sounds that make up words. This will enable ICSI researchers to discover technical challenges that prevent ASR from being more accurate.

“Speech recognition systems, while working reasonably well in some applications, still fail under many circumstances,” says Morgan. “Our lab and others work hard to try to improve the technology, but we decided a few years ago that it would be worthwhile to have a parallel effort in which we figure out where known deviations of real speech data from these basic assumptions are actually hurting us. We believe that results from such a study may help to guide us and others in the future to come up with significant improvements.”

The second part is a broad survey of 100 experts and colleagues in the field, asking for perceptions on where ASR technology is effective, where it fails, and what its shortcomings are. The study will include interviews with practitioners and a review of recent literature to derive community consensus on what approaches don’t work, and to develop guidelines for future analysis.

Steven Wegmann is serving as co-principal investigator of the research, overseeing the in-depth acoustic modeling phase. Co-principal investigator Jordan Cohen is heading the broad field survey phase. Morgan is the principal investigator for the full research project. The one-year project is expected to be completed by March 2013.